Bump Go version to 1.25.8#10156

Open
ycombinator wants to merge 41 commits into elastic:main from ycombinator:bump-golang-1.25.1
Conversation

@ycombinator
Contributor

@ycombinator ycombinator commented Sep 25, 2025

This PR bumps up the Golang version to 1.25.8. It also:

  • removes the ms_tls13kdf Golang build tag when building in FIPS mode, because this tag was only needed for Go versions before 1.24.x.
  • sets the GODEBUG=tlsmlkem=0 environment variable when running FIPS140-only unit tests. This prevents errors like so: Failed to connect: crypto/ecdh: use of X25519 is not allowed in FIPS 140-only mode.
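
As a rough illustration of the second bullet, the sketch below builds an environment for a FIPS 140-only test run by appending the two GODEBUG settings named above. The function name fipsOnlyTestEnv is hypothetical and this is not the code in this PR; it only assumes that the test runner accepts a list of KEY=VALUE strings.

```go
package main

import (
	"fmt"
	"strings"
)

// fipsOnlyTestEnv is a hypothetical helper: it returns a copy of base in which
// GODEBUG carries fips140=only and tlsmlkem=0, keeping any settings that were
// already present in GODEBUG.
func fipsOnlyTestEnv(base []string) []string {
	const fipsSettings = "fips140=only,tlsmlkem=0"

	out := make([]string, 0, len(base)+1)
	existing := ""
	for _, kv := range base {
		if v, ok := strings.CutPrefix(kv, "GODEBUG="); ok {
			existing = v // remember the current value and re-add it below
			continue
		}
		out = append(out, kv)
	}
	if existing != "" {
		return append(out, "GODEBUG="+existing+","+fipsSettings)
	}
	return append(out, "GODEBUG="+fipsSettings)
}

func main() {
	env := fipsOnlyTestEnv([]string{"PATH=/usr/bin", "GODEBUG=netdns=go"})
	for _, kv := range env {
		fmt.Println(kv)
	}
}
```

The returned slice could be assigned to the Env field of an exec.Cmd that runs the unit tests.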

@ycombinator ycombinator requested review from a team as code owners September 25, 2025 17:46
@ycombinator ycombinator added Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team backport-active-all Automated backport with mergify to all the active branches labels Sep 25, 2025
@elasticmachine
Contributor

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@ycombinator
Contributor Author

ycombinator commented Sep 25, 2025

The fips140=only unit tests are failing like so:

crypto/ecdh: use of X25519 is not allowed in FIPS 140-only mode

These appear to be golang/go#75148, which should be fixable when golang/go#74630 is implemented. However, in order to upgrade to Go 1.25.1 now, we'll need to find a workaround.

@ycombinator
Contributor Author

ycombinator commented Sep 25, 2025

These errors are coming from Go downloading dependencies before executing the tests. The errors can be simulated like so:

GODEBUG=fips140=only go mod download -x
# get https://proxy.golang.org/github.com/opencontainers/image-spec/@v/v1.1.1.info
# get https://proxy.golang.org/github.com/opencontainers/image-spec/@v/v1.1.1.info: Get "https://proxy.golang.org/github.com/opencontainers/image-spec/@v/v1.1.1.info": crypto/ecdh: use of X25519 is not allowed in FIPS 140-only mode
...

So we probably just need to download the dependencies explicitly, ensuring that GODEBUG=fips140=only is not set for this step.
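
A minimal sketch of that idea, assuming the download step is launched from Go: strip any fips140 setting from GODEBUG before running go mod download, and leave every other variable untouched. The names envWithoutFIPSOnly and downloadModules are hypothetical and are not taken from this PR.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// envWithoutFIPSOnly returns a copy of env with any fips140=... setting
// removed from GODEBUG. Other GODEBUG settings and other variables are kept.
func envWithoutFIPSOnly(env []string) []string {
	out := make([]string, 0, len(env))
	for _, kv := range env {
		v, ok := strings.CutPrefix(kv, "GODEBUG=")
		if !ok {
			out = append(out, kv)
			continue
		}
		var kept []string
		for _, s := range strings.Split(v, ",") {
			if s != "" && !strings.HasPrefix(s, "fips140=") {
				kept = append(kept, s)
			}
		}
		// Drop GODEBUG entirely if nothing is left after filtering.
		if len(kept) > 0 {
			out = append(out, "GODEBUG="+strings.Join(kept, ","))
		}
	}
	return out
}

// downloadModules runs "go mod download" with the filtered environment.
// It is not called from main because it needs network access.
func downloadModules() error {
	cmd := exec.Command("go", "mod", "download")
	cmd.Env = envWithoutFIPSOnly(os.Environ())
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd.Run()
}

func main() {
	fmt.Println(envWithoutFIPSOnly([]string{"HOME=/root", "GODEBUG=fips140=only,tlsmlkem=0"}))
}
```

The tests themselves would then run in a separate step with GODEBUG=fips140=only set again.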

@ycombinator
Contributor Author

So we probably just need to download the dependencies explicitly, ensuring that GODEBUG=fips140=only is not set for this step.

I've implemented this approach in this PR and it has helped. However, now CI is failing with this odd error which seems unrelated to FIPS in any way.

https://buildkite.com/elastic/elastic-agent/builds/27575#0199842f-d9dd-4481-9d46-37baa5c789b1/155-822

=== FAIL: dev-tools/mage TestGoTest_CaptureOutput/capture_panic (1.61s)
--
  | >> go test: asserts Testing
  | >> ARGS: asserts Command: gotestsum --no-color --junitfile-hide-skipped-tests -f standard-quiet -- -test.run TestGoTest_Helper_WithPanic .
  | 2025/09/26 04:35:17 exec: gotestsum --no-color --junitfile-hide-skipped-tests -f standard-quiet -- -test.run TestGoTest_Helper_WithPanic .
  | exec: gotestsum --no-color --junitfile-hide-skipped-tests -f standard-quiet -- -test.run TestGoTest_Helper_WithPanic .
  | gotest_test.go:120: GoTest output mismatch:
  | want:
  | (?sm:
  | === FAIL: dev-tools/mage TestGoTest_Helper_WithPanic.*
  | panic: Kaputt. \[recovered\].*
  | panic: Kaputt.*
  | )
  |  
  | got:
  | FAIL	github.com/elastic/elastic-agent/dev-tools/mage	0.021s
  |  
  | === Failed
  | === FAIL: dev-tools/mage TestGoTest_Helper_WithPanic (0.00s)
  | panic: Kaputt. [recovered, repanicked]
  |  
  | goroutine 21 [running]:
  | testing.tRunner.func1.2({0xcd39e0, 0xf923f0})
  | /opt/buildkite-agent/.asdf/installs/golang/1.25.1/go/src/testing/testing.go:1872 +0x237
  | testing.tRunner.func1()
  | /opt/buildkite-agent/.asdf/installs/golang/1.25.1/go/src/testing/testing.go:1875 +0x35b
  | panic({0xcd39e0?, 0xf923f0?})
  | /opt/buildkite-agent/.asdf/installs/golang/1.25.1/go/src/runtime/panic.go:783 +0x132
  | github.com/elastic/elastic-agent/dev-tools/mage.TestGoTest_Helper_WithPanic(0xc000103880?)
  | /opt/buildkite-agent/builds/bk-agent-prod-gcp-1758859361500881345/elastic/elastic-agent/dev-tools/mage/gotest_test.go:329 +0x30
  | testing.tRunner(0xc000103880, 0xea4750)
  | /opt/buildkite-agent/.asdf/installs/golang/1.25.1/go/src/testing/testing.go:1934 +0xea
  | created by testing.(*T).Run in goroutine 1
  | /opt/buildkite-agent/.asdf/installs/golang/1.25.1/go/src/testing/testing.go:1997 +0x465
  |  
  | DONE 1 tests, 1 failure in 0.022s
  |  
  | === FAIL: dev-tools/mage TestGoTest_CaptureOutput (26.02s)

@ycombinator
Contributor Author

So we probably just need to download the dependencies explicitly, ensuring that GODEBUG=fips140=only is not set for this step.

I've implemented this approach in this PR and it has helped. However, now CI is failing with this odd error which seems unrelated to FIPS in any way.

https://buildkite.com/elastic/elastic-agent/builds/27575#0199842f-d9dd-4481-9d46-37baa5c789b1/155-822

Turns out this is a change in behavior in Go 1.25: https://tip.golang.org/doc/go1.25#change-to-unhandled-panic-output. Addressed in 46cc036.
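
For illustration only (this is not necessarily what 46cc036 does), a test expectation can be written to accept both forms seen in the log above: the older "[recovered]" line and the Go 1.25 "[recovered, repanicked]" line. The pattern and the helper name matchesPanicOutput are assumptions.

```go
package main

import (
	"fmt"
	"regexp"
)

// panicLine accepts both the pre-1.25 "[recovered]" suffix and the
// Go 1.25 "[recovered, repanicked]" suffix on the first panic line.
var panicLine = regexp.MustCompile(`(?m)^panic: Kaputt\. \[recovered(, repanicked)?\]`)

// matchesPanicOutput reports whether test output contains the expected
// panic line in either format.
func matchesPanicOutput(out string) bool {
	return panicLine.MatchString(out)
}

func main() {
	oldStyle := "panic: Kaputt. [recovered]\n\tpanic: Kaputt.\n"
	newStyle := "panic: Kaputt. [recovered, repanicked]\n\ngoroutine 21 [running]:\n"
	fmt.Println(matchesPanicOutput(oldStyle), matchesPanicOutput(newStyle))
}
```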

@pchila
Member

pchila commented Sep 29, 2025

Looking at the latest build, I see a couple of strange things (maybe some of them were already there and I didn't notice until now).

@ycombinator
Contributor Author

Windows build steps are failing in CI on this PR. I'm seeing a lot of "Access is denied" errors. 🤔

@ycombinator ycombinator force-pushed the bump-golang-1.25.1 branch 2 times, most recently from 52d2f67 to b6b1a81 Compare October 1, 2025 14:36
@mergify
Contributor

mergify bot commented Oct 3, 2025

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b bump-golang-1.25.1 upstream/bump-golang-1.25.1
git merge upstream/main
git push upstream bump-golang-1.25.1

@ycombinator ycombinator changed the title from "Bump Go version to 1.25.1" to "Bump Go version to 1.25.2" Oct 13, 2025
@ycombinator ycombinator force-pushed the bump-golang-1.25.1 branch 2 times, most recently from a41a670 to eaf3ce7 Compare October 15, 2025 00:21
@mergify
Contributor

mergify bot commented Oct 16, 2025

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b bump-golang-1.25.1 upstream/bump-golang-1.25.1
git merge upstream/main
git push upstream bump-golang-1.25.1

@ycombinator
Contributor Author

Windows builds are failing like so:

=== FAIL: internal/pkg/agent/application/upgrade TestWaitForWatcher/Timeout2:_state_doesn't_get_there_in_time (0.01s)
--
testing.go:1369: TempDir RemoveAll cleanup: unlinkat C:\Users\BUILDK~1\AppData\Local\Temp\TestWaitForWatcherTimeout2_state_doesnt_get_there_in_time3776368021: The directory is not empty.

@ebeahan
Member

ebeahan commented Mar 9, 2026

Looking at most of the remaining CI failures, it's consistently Win 2025 and uninstall of Agent:

fixture_install.go:758: >> running binary with: [C:\Program Files\Elastic\Agent\elastic-agent.exe uninstall --force]
--
fixture_install.go:363:
Error Trace:	C:/buildkite-agent/builds/bk-agent-prod-gcp-1773077791216426239/elastic/elastic-agent/pkg/testing/fixture_install.go:363
C:/Users/Buildkite/.go/go-1.25.8/src/testing/testing.go:1308
C:/Users/Buildkite/.go/go-1.25.8/src/testing/testing.go:1572
C:/Users/Buildkite/.go/go-1.25.8/src/testing/testing.go:1928
C:/Users/Buildkite/.go/go-1.25.8/src/runtime/asm_amd64.s:1693
Error:      	Received unexpected error:
error running uninstall command: exit status 1
Test:       	TestLongRunningAgentForLeaks
Messages:   	uninstalling agent failed. Output: "\r[    ] Stopping service  [0s] \r                              \r\r[=== ] Successfully stopped service  [5s] \r                                          \r\r[=== ] Stopping upgrade watcher; none found  [5s] \r                                                  \r\r[==  ] Removing service  [5s] \r                                                  \r\r[==  ] Successfully uninstalled service  [5s] \r                                                  \r\r[==  ] Removing install directory  [5s] \r                                                  \r\r[   =] Failed to remove install directory  [1m5s] \r                                                  \r\r[   =] Failed to uninstall agent  [1m5s] Error uninstalling. Printing logs\n2026-03-09T18:15:31.439Z\tDEBUG\t[uninstall.state_migration]\tnot attempting to migrate from action store: state store already exists\n2026-03-09T18:15:36.832Z\tDEBUG\t[uninstall.state_migration]\tnot attempting to migrate from action store: state store already exists\n2026-03-09T18:15:36.849Z\tDEBUG\t[uninstall.composable]\tStarting controller for composable inputs\n2026-03-09T18:15:36.849Z\tDEBUG\t[uninstall.composable]\tStarted controller for composable inputs\n2026-03-09T18:15:36.849Z\tDEBUG\t[uninstall.composable]\tComputing new variable state for composable inputs\n2026-03-09T18:15:36.850Z\tDEBUG\t[uninstall.composable]\tStopping controller for composable inputs\n2026-03-09T18:15:36.850Z\tDEBUG\t[uninstall.composable]\tStopped controller for composable inputs\nError: error uninstalling agent: failed to remove installation directory (C:\\Program Files\\Elastic\\Agent): timed out while removing \"C:\\\\Program Files\\\\Elastic\\\\Agent\". Last error: unlinkat C:\\Program Files\\Elastic\\Agent\\data\\elastic-agent-9.4.0-SNAPSHOT-59c6e5\\elastic-agent.exe: Access is denied.\nFor help, please see our troubleshooting guide at https://www.elastic.co/docs/troubleshoot/ingest/fleet/common-problems\n"

Unclear at this time if this is still related to golang/go#77402.

@ebeahan
Member

ebeahan commented Mar 9, 2026

Screenshot 2026-03-09 at 16 42 14

Was able to reproduce on a Windows 2025 server instance.

@swiatekm
Contributor

We're now getting

2026-03-10T17:56:54.272Z	WARN	[uninstall]	RemovePath attempt 1 failed after 119ms: unlinkat C:\Program Files\Elastic\Agent-Development\data\elastic-agent-9.4.0-SNAPSHOT-03cd7d\elastic-agent.exe: Access is denied.
        	            	2026-03-10T17:56:54.309Z	WARN	[uninstall]	file is blocked: C:\Program Files\Elastic\Agent-Development\data\elastic-agent-9.4.0-SNAPSHOT-03cd7d\elastic-agent.exe; locking processes: Elastic Agent - single, unified way to add monitoring for logs, metrics, and other types of data to a host. (PID 4960)
        	            	2026-03-10T17:56:54.310Z	WARN	[uninstall]	removeBlockingExe failed: failed to dispose handle for "C:\\Program Files\\Elastic\\Agent-Development\\data\\elastic-agent-9.4.0-SNAPSHOT-03cd7d\\elastic-agent.exe": Access is denied.

https://buildkite.com/elastic/elastic-agent/builds/35651/steps/canvas?jid=019cd8be-000e-42e3-87d6-78274e9ad672&tab=output.

@swiatekm swiatekm force-pushed the bump-golang-1.25.1 branch from 2da3431 to 83ca6d4 Compare March 11, 2026 16:12
@swiatekm
Contributor

buildkite test this

@swiatekm
Contributor

I fixed the remaining issues stemming from the upgrade, albeit one of them is a revert to 1.24 behavior as we investigate alternatives:

  • bf6e8b9 uses the 1.24 implementation of os.RemoveAll. The 1.25 implementation interacts poorly with the way we delete the agent binary during uninstall. This is a hotfix to unblock the upgrade; we should find a more robust way to do it.

  • 83ca6d4 ensures we stop the fsnotify watcher for the upgrade marker file before exiting. Leaving the watcher running can make test cleanup fail on Windows.
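
To make the first bullet more concrete, here is a simplified sketch of a path-based recursive removal built only on os.Remove. It is neither the code in bf6e8b9 nor the Go 1.24 standard library implementation; removeAllByPath and demo are hypothetical names, and the sketch ignores retries, long paths, and the Windows-specific rename handling.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// removeAllByPath deletes path and everything under it using only
// path-based calls (os.Remove, os.Lstat, os.ReadDir).
func removeAllByPath(path string) error {
	// Simple case first: a file, a symlink, or an empty directory.
	err := os.Remove(path)
	if err == nil || errors.Is(err, fs.ErrNotExist) {
		return nil
	}
	info, statErr := os.Lstat(path)
	if statErr != nil {
		if errors.Is(statErr, fs.ErrNotExist) {
			return nil
		}
		return statErr
	}
	if !info.IsDir() {
		return err // a non-directory that could not be removed
	}
	entries, readErr := os.ReadDir(path)
	if readErr != nil {
		return readErr
	}
	for _, e := range entries {
		if err := removeAllByPath(filepath.Join(path, e.Name())); err != nil {
			return err
		}
	}
	return os.Remove(path) // the directory should be empty now
}

// demo builds a small temporary tree, removes it, and reports whether
// the root is gone afterwards.
func demo() (bool, error) {
	root, err := os.MkdirTemp("", "remove-sketch-")
	if err != nil {
		return false, err
	}
	nested := filepath.Join(root, "data", "nested")
	if err := os.MkdirAll(nested, 0o755); err != nil {
		return false, err
	}
	if err := os.WriteFile(filepath.Join(nested, "file.txt"), []byte("x"), 0o644); err != nil {
		return false, err
	}
	if err := removeAllByPath(root); err != nil {
		return false, err
	}
	_, statErr := os.Lstat(root)
	return errors.Is(statErr, fs.ErrNotExist), nil
}

func main() {
	removed, err := demo()
	fmt.Println(removed, err)
}
```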

@swiatekm
Contributor

@cmacknz @michel-laterman can you review the above changes? I'd like to get more eyes on this before we merge.

@swiatekm swiatekm requested a review from cmacknz March 11, 2026 16:53
Contributor

@michel-laterman michel-laterman left a comment

Thanks for figuring this out!

// Windows use NtCreateFile with DELETE access — these are stricter about file
// state and fail on files that have been ADS-renamed. The simple path-based
// approach using os.Remove works correctly with the ADS rename trick that
// RemovePath uses to delete running executables on Windows.
Contributor

Overall, what's the plan here, are we going to address the changes in file permissions or do something else? Do we have a follow-up issue?

Contributor

@swiatekm swiatekm Mar 11, 2026

We don't have an issue, and I'm not sure what the actual root cause is. Just that the new RemoveAll implementation doesn't like the data stream renames and naive attempts to fix it don't really help. The errors aren't very informative either, you just get Access Denied.

I will file an issue describing the problem and what I tried to fix it. For now, I want to unblock this upgrade because it blocks a bunch of other updates.

Member

Yes, at this point let's fall back to the implementation we know works and work out how to get rid of this separately. This is blocking the collector and contrib dependency updates, for example, in addition to us not getting CVE fixes in Go anymore.

Co-authored-by: Michel Laterman <82832767+michel-laterman@users.noreply.github.com>
// We also set GODEBUG=tlsmlkem=0 to disable the X25519MLKEM768 TLS key
// exchange mechanism; without this setting and with the GODEBUG=fips140=only
// setting, we get errors in tests like so:
// Failed to connect: crypto/ecdh: use of X25519 is not allowed in FIPS 140-only mode
Member

Where are we using this? The FIPS mode is supposed to default you to only using FIPS compatible parts of TLS without us having to do anything. Are we explicitly testing a non-FIPS mechanism somewhere?

Contributor

That, I don't really know, Shaunak is the FIPS expert and he handled that part. I say we merge as-is and he can revisit this after he's back.

Member

=== FAIL: internal/pkg/remote TestClientWithCertificate/fips_invalid_key_fips140only (0.00s)
    client_fips_test.go:186:
        	Error Trace:	/Users/cmackenzie/go/src/github.com/elastic/elastic-agent/internal/pkg/remote/client_fips_test.go:186
        	Error:      	"all hosts failed: requester 0/1 to host https://127.0.0.1:64269/ errored: Get \"https://127.0.0.1:64269/echo-hello?\": crypto/ecdh: use of X25519 is not allowed in FIPS 140-only mode" does not contain "use of keys smaller than 2048 bits is not allowed in FIPS 140-only mode"
        	Test:       	TestClientWithCertificate/fips_invalid_key_fips140only

=== FAIL: internal/pkg/remote TestClientWithCertificate/fips_valid_key_fips140only (0.00s)
    client_fips_test.go:181:
        	Error Trace:	/Users/cmackenzie/go/src/github.com/elastic/elastic-agent/internal/pkg/remote/client_fips_test.go:181
        	Error:      	Expected value not to be nil.
        	Test:       	TestClientWithCertificate/fips_valid_key_fips140only

=== FAIL: internal/pkg/remote TestClientWithCertificate (0.00s)

We are using this in a bunch of places; seeing it fail in FIPS-specific tests is concerning. This particular test has a pre-generated certificate that appears to no longer be FIPS compliant.

It also comes up in the FakeInputSuite a lot for the gRPC connection which is more concerning.
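
One possible direction to explore, sketched under assumptions and untested against these failures: pin the TLS configuration used by a client or test server to NIST curves through tls.Config.CurvePreferences, so that X25519-based key exchanges are never offered regardless of GODEBUG. The helper names nistCurvesOnly and listsOnlyNISTCurves are hypothetical, and whether this resolves the gRPC and client_fips_test.go errors above is not known.

```go
package main

import (
	"crypto/tls"
	"fmt"
)

// nistCurvesOnly returns a copy of base (or a fresh TLS 1.2+ config when base
// is nil) whose key exchange is restricted to the P-256 and P-384 curves.
func nistCurvesOnly(base *tls.Config) *tls.Config {
	cfg := &tls.Config{MinVersion: tls.VersionTLS12}
	if base != nil {
		cfg = base.Clone()
	}
	cfg.CurvePreferences = []tls.CurveID{tls.CurveP256, tls.CurveP384}
	return cfg
}

// listsOnlyNISTCurves reports whether cfg explicitly lists curves and all of
// them are NIST P-curves. An empty list means "use Go's defaults", so it is
// reported as false here.
func listsOnlyNISTCurves(cfg *tls.Config) bool {
	if len(cfg.CurvePreferences) == 0 {
		return false
	}
	for _, c := range cfg.CurvePreferences {
		switch c {
		case tls.CurveP256, tls.CurveP384, tls.CurveP521:
		default:
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(listsOnlyNISTCurves(&tls.Config{}), listsOnlyNISTCurves(nistCurvesOnly(nil)))
}
```

Unlike the GODEBUG setting, this would have to be applied at every place a tls.Config is constructed, which may be why an environment-level switch was chosen for the tests.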

Contributor

@blakerouse blakerouse left a comment

This overall looks good. I am interested in the answers to @cmacknz's questions as well. Holding off on approval until I read those.

@mergify mergify bot mentioned this pull request Mar 12, 2026
8 tasks
@elasticmachine
Contributor

elasticmachine commented Mar 12, 2026

Labels

backport-active-all Automated backport with mergify to all the active branches Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team

9 participants