Conversation
|
Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane) |
|
The These appear to be golang/go#75148, which should be fixable when golang/go#74630 is implemented. However, in order to upgrade to Go 1.25.1 now, we'll need to find a workaround. |
These errors are coming from Go downloading dependencies before executing the tests. The errors can be simulated like so: So we probably just need to download the dependencies explicitly, ensuring that |
I've implemented this approach in this PR and it has helped. However, now CI is failing with this odd error which seems unrelated to FIPS in any way. |
d8c500d to
0b69f33
Compare
Turns out this is a change in behavior in Go 1.25: https://tip.golang.org/doc/go1.25#change-to-unhandled-panic-output. Addressed in 46cc036. |
|
Looking at the latest build I see a couple of strange things (maybe some of those were already there and didn't notice until now)
|
|
Windows build steps are failing in CI on this PR. See a lot of |
4db5a4a to
1ceaae8
Compare
|
52d2f67 to
b6b1a81
Compare
|
This pull request is now in conflicts. Could you fix it? 🙏 |
b6b1a81 to
92a9716
Compare
a41a670 to
eaf3ce7
Compare
|
This pull request is now in conflicts. Could you fix it? 🙏 |
|
Windows builds are failing like so: |
|
Looking at most of the remaining CI failures, it's consistently Win 2025 and uninstall of Agent: Unclear at this time if this is still related to golang/go#77402. |
|
We're now getting |
684430e to
2da3431
Compare
2da3431 to
83ca6d4
Compare
|
buildkite test this |
|
I fixed the remaining issues stemming from the upgrade, albeit one of them is a revert to 1.24 behavior as we investigate alternatives:
|
|
@cmacknz @michel-laterman can you review the above changes? I'd like to get more eyes on this before we merge. |
michel-laterman
left a comment
There was a problem hiding this comment.
Thanks for figuring this out!
| // Windows use NtCreateFile with DELETE access — these are stricter about file | ||
| // state and fail on files that have been ADS-renamed. The simple path-based | ||
| // approach using os.Remove works correctly with the ADS rename trick that | ||
| // RemovePath uses to delete running executables on Windows. |
There was a problem hiding this comment.
Overall, what's the plan here, are we going to address the changes in file permissions or do something else? Do we have a follow-up issue?
There was a problem hiding this comment.
We don't have an issue, and I'm not sure what the actual root cause is. Just that the new RemoveAll implementation doesn't like the data stream renames and naive attempts to fix it don't really help. The errors aren't very informative either, you just get Access Denied.
I will file an issue describing the problem and what I tried to fix it. For now, I want to unblock this upgrade because it blocks a bunch of other updates.
There was a problem hiding this comment.
Yes at this point, let's fall back to the implementation we know works and work out how to get rid of this separately. This is blocking the collector and contrib dependency updates for example, in addition to use not getting CVE fixes in Go anymore.
Co-authored-by: Michel Laterman <82832767+michel-laterman@users.noreply.github.com>
| // We also set GODEBUG=tlsmlkem=0 to disable the X25519MLKEM768 TLS key | ||
| // exchange mechanism; without this setting and with the GODEBUG=fips140=only | ||
| // setting, we get errors in tests like so: | ||
| // Failed to connect: crypto/ecdh: use of X25519 is not allowed in FIPS 140-only mode |
There was a problem hiding this comment.
Where are we using this? The FIPS mode is supposed to default you to only using FIPS compatible parts of TLS without us having to do anything. Are we explicitly testing a non-FIPS mechanism somewhere?
There was a problem hiding this comment.
That, I don't really know, Shaunak is the FIPS expert and he handled that part. I say we merge as-is and he can revisit this after he's back.
There was a problem hiding this comment.
=== FAIL: internal/pkg/remote TestClientWithCertificate/fips_invalid_key_fips140only (0.00s)
client_fips_test.go:186:
Error Trace: /Users/cmackenzie/go/src/github.com/elastic/elastic-agent/internal/pkg/remote/client_fips_test.go:186
Error: "all hosts failed: requester 0/1 to host https://127.0.0.1:64269/ errored: Get \"https://127.0.0.1:64269/echo-hello?\": crypto/ecdh: use of X25519 is not allowed in FIPS 140-only mode" does not contain "use of keys smaller than 2048 bits is not allowed in FIPS 140-only mode"
Test: TestClientWithCertificate/fips_invalid_key_fips140only
=== FAIL: internal/pkg/remote TestClientWithCertificate/fips_valid_key_fips140only (0.00s)
client_fips_test.go:181:
Error Trace: /Users/cmackenzie/go/src/github.com/elastic/elastic-agent/internal/pkg/remote/client_fips_test.go:181
Error: Expected value not to be nil.
Test: TestClientWithCertificate/fips_valid_key_fips140only
=== FAIL: internal/pkg/remote TestClientWithCertificate (0.00s)
We are using this in a bunch of places, seeing it fail in fips specific tests is concerning. This particular test has a pre-generated certificate that appears to no longer be FIPS compliant.
It also comes up in the FakeInputSuite a lot for the gRPC connection which is more concerning.
blakerouse
left a comment
There was a problem hiding this comment.
This overall looks good. I am interested in the answers to @cmacknz questions as well. Holding off on approval until I read those.
This reverts commit 88ece11
⏳ Build in-progress, with failures
Failed CI Steps
History
cc @ycombinator |






This PR bumps up the Golang version to
1.25.8. It also:ms_tls13kdfGolang build tag when building in FIPS mode because this tag was only needed before Golang versions1.24.x.GODEBUG=tlsmlkem=0environment variable when running FIPS140-only unit tests. This prevents errors like so:Failed to connect: crypto/ecdh: use of X25519 is not allowed in FIPS 140-only mode.