Conversation
| - g4dn.metal | ||
| - g5.48xlarge | ||
| - hpc6a.48xlarge | ||
| - hpc7g.16xlarge |
V - let's update this with all hpc7g types, since they all have EFA. Add hpc7g.8xlarge and hpc7g.4xlarge.
Can do! I didn’t want to add them before being able to test. Any updates on if it might be possible to have an ARM build for the plugin?
Done! And per our discussion, I can offer to do a PR to add these images here https://github.com/aws-samples/efa-device-plugin-helm/blob/main/aws-efa-k8s-device-plugin after we finish up here.
okay opened, albeit we should not merge it! aws-samples/efa-device-plugin-helm#1
|
This is working now and can be reviewed. I changed nothing, and I have no idea why it's working. I think it might be related to the aws metadata (describe-instances) that sets conditions for the efa / network devices. If it wasn't completed/ready on the first tries, maybe that could lead to this outcome? |
994a2d2 to 2adbe41
|
Why was this closed? |
@vsoch, apologies again, it was closed by the stale bot. I have reopened it now and applied a label that should prevent it from being automatically closed. Please give us some time to review it, the team is occupied with other deliverables. |
|
Thank you! |
|
Please re-open again, thank you! |
2adbe41 to 85112d8
|
Hi @cPu1 could you please give feedback on the CI errors? I'm seeing them show up in other PRs, and it looks like an incorrect function signature is being used, for example:
GetOutpostInstanceTypes(context.backgroundCtx,*outposts.GetOutpostInstanceTypesInput,func(*outposts.Options))
0: context.backgroundCtx{emptyCtx:context.emptyCtx{}}
1: &outposts.GetOutpostInstanceTypesInput{OutpostId:(*string)(0xc0000542c0), MaxResults:(*int32)(nil), NextToken:(*string)(nil), noSmithyDocumentSerde:document.NoSerde{}}
2: (func(*outposts.Options))(0x7ec320)
The closest call I have is:
GetOutpostInstanceTypes(string,string)
0: "mock.Anything"
1: "mock.Anything"
and notably we don't touch the relevant code here. Is there another PR that is fixing these CI issues we should watch? Thanks! |
|
@vsoch could you please rebase with |
Signed-off-by: sochat1 <[email protected]>
Signed-off-by: sochat1 <[email protected]>
Signed-off-by: vsoch <[email protected]>
85112d8 to f95076b
|
All set! |
Description
Problem: The new hpc7g instance types use the Graviton3E processor (ARM) but are not detected as such by eksctl. In addition, as we have been discussing in #6222, the daemon set for the EFA device driver does not work with runAsNonRoot set to true. I believe this tweak is close, however the final step (I think) also needs to be providing an ARM build for the driver itself. I believe this is proprietary code, so I wanted to ask here first about that. I was able to figure out the container entrypoint and output, in case that helps:
Note that the online examples for EFA (e.g., this repository) are not exactly what we want: the Dockerfile will build a container that can use EFA, but not one that has that particular executable.
Let me know how you would like to proceed!
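To make the detection problem above concrete, here is a minimal sketch of how an ARM check based on the instance type name could look. It is not eksctl's actual implementation: `looksLikeARM` and `familyPattern` are hypothetical names, and the rule used (a "g" among the attribute letters after the generation number, plus a special case for a1) is a heuristic based on the EC2 naming convention that may miss exceptions.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// familyPattern splits an instance family such as "hpc7g" into series
// letters ("hpc"), generation digits ("7"), and attribute letters ("g").
// Hypothetical helper for illustration only.
var familyPattern = regexp.MustCompile(`^([a-z]+)([0-9]+)([a-z-]*)$`)

// looksLikeARM reports whether an instance type name appears to be
// Graviton-based. Assumption: a "g" in the attribute letters marks
// Graviton; a1 predates that convention and is handled separately.
func looksLikeARM(instanceType string) bool {
	family, _, _ := strings.Cut(strings.ToLower(instanceType), ".")
	m := familyPattern.FindStringSubmatch(family)
	if m == nil {
		return false // unrecognized naming scheme: do not guess
	}
	series, generation, attrs := m[1], m[2], m[3]
	if series == "a" && generation == "1" {
		return true
	}
	return strings.Contains(attrs, "g")
}

func main() {
	types := []string{
		"g4dn.metal", "g5.48xlarge", "hpc6a.48xlarge",
		"hpc7g.16xlarge", "hpc7g.8xlarge", "hpc7g.4xlarge",
	}
	for _, t := range types {
		fmt.Printf("%-16s arm=%v\n", t, looksLikeARM(t))
	}
}
```

With this rule the "g" series in g4dn and g5 is not mistaken for the Graviton attribute, because only the letters after the generation number are inspected.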