Skip to content

Conversation

@eytan-avisror
Copy link
Collaborator

@eytan-avisror eytan-avisror commented May 19, 2021

Signed-off-by: Eytan Avisror [email protected]
Fixes #306

This PR uses the payload from DescribeCluster to feed CA & API endpoint directly to userdata in order to avoid making those calls and potentially getting throttled on user-data.

This also fixes the argument passing in the case of Windows so that we wrap the Kubelet arguments same as we do for Linux

TBD

  • Test Linux AMI
  • Test Windows AMI

Signed-off-by: Eytan Avisror <[email protected]>
@eytan-avisror eytan-avisror requested review from a team as code owners May 19, 2021 17:32
Signed-off-by: Eytan Avisror <[email protected]>
@codecov
Copy link

codecov bot commented May 19, 2021

Codecov Report

Merging #309 (0912888) into master (6a04c23) will increase coverage by 0.11%.
The diff coverage is 100.00%.

❗ Current head 0912888 differs from pull request most recent head 196896f. Consider uploading reports for the commit 196896f to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##           master     #309      +/-   ##
==========================================
+ Coverage   70.05%   70.17%   +0.11%     
==========================================
  Files          19       19              
  Lines        3156     3168      +12     
==========================================
+ Hits         2211     2223      +12     
  Misses        804      804              
  Partials      141      141              
Impacted Files Coverage Δ
controllers/provisioners/eks/helpers.go 91.63% <100.00%> (+0.14%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6a04c23...196896f. Read the comment docs.

sb.WriteString("--use-max-pods false ")
}
if state.Cluster != nil {
sb.WriteString(fmt.Sprintf("--b64-cluster-ca %v ", aws.StringValue(state.Cluster.CertificateAuthority.Data)))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's some tricky behavior in the EKS AL2 AMI that we might want to solve for before merging this change.
See https://github.com/awslabs/amazon-eks-ami/blob/master/files/bootstrap.sh#L332

Basically, if the cluster ca and api server endpoint are defined, the AMI bypasses calling EKS - which is what we want! Unfortunately, that call also grabs the serviceIpv4Cidr field, which is used later in the bootstrap script to calculate the default ip that should be used for DNS (the kube-dns/core-dns service ip basically). We probably want to replicate this logic and inject the dns ip address in case the user is relying on this behavior.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, will look into this

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@backjo take a look again, I added discovery of cluster ip in the case of AL2

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes look pretty good! If I might suggest one more change - looking up the IPv4 CIDR range of the node that IM is running on adds a somewhat implicit assumption that Instance Manager is running in the same VPC - which I think is generally a fair assumption, though it could make debugging with a remote cluster from an IDE a bit more difficult :P

I think a cleaner approach to determine the VPC CIDR range for the nodes might be to lookup the VPC attached to the EKS cluster - and use the CIDR range associated with it for the fallback logic.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah.. technically we can remove the fallback all together, since reconcile depends on getting cluster information.
I think it's safe to assume we will always have the ServiceCIDR and remove the metadata fallback logic. WDYT?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with it - the only open question I have is whether ServiceCIDR is always defined - I know that it was only added in October 2020 - and all of our clusters are newer than that - so could use some help verifying that even for older clusters, the value is defined.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can confirm I have clusters from 2019 which have this populated.
Makes sense when they add a new API to make sure it's populated across all objects otherwise this would be impossible to use

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've also removed the ec2metadata discovery fallback since we guarantee to always have the cluster payload or fail the reconcile.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool - sounds good!

@eytan-avisror eytan-avisror merged commit ea5c8ae into keikoproj:master Jun 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

API Retries on bootstrap.sh

2 participants