Skip to content

Conversation

@vvoland
Copy link
Collaborator

@vvoland vvoland commented Mar 14, 2025

  • Add runtime.Gosched() calls to encourage goroutine scheduling
  • Increase the timeout from 10ms to 500ms
  • Use poll.WaitOn with appropriate delays to ensure the goroutine has
    spawned before checking

- How to verify it

go test -race -run=TestConnectAndWait/connect_goroutine_exits_after_EOF -count=1000  -failfast ./cli-plugins/socket/...
ok  	github.com/docker/cli/cli-plugins/socket	1.985s

Signed-off-by: Paweł Gronowski [email protected]

@codecov-commenter
Copy link

codecov-commenter commented Mar 14, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 59.28%. Comparing base (2d74733) to head (0ce8989).
Report is 11 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5932      +/-   ##
==========================================
- Coverage   59.33%   59.28%   -0.05%     
==========================================
  Files         358      358              
  Lines       29783    29830      +47     
==========================================
+ Hits        17672    17685      +13     
- Misses      11142    11173      +31     
- Partials      969      972       +3     
🚀 New features to boost your workflow:
  • Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Member

@Benehiko Benehiko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test might still flake now and again. It's impossible to know exactly how the Go runtime will start/cleanup the goroutine.

go test -race -run=TestConnectAndWait/connect_goroutine_exits_after_EOF -count=10000  -failfast -v ./cli-plugins/socket/...
...
=== RUN   TestConnectAndWait/connect_goroutine_exits_after_EOF
    socket_test.go:192: timeout hit after 500ms: waiting for connect goroutine to spawn
--- FAIL: TestConnectAndWait (0.50s)
    --- FAIL: TestConnectAndWait/connect_goroutine_exits_after_EOF (0.50s)
FAIL
FAIL    github.com/docker/cli/cli-plugins/socket        7.490s
FAIL

@vvoland vvoland force-pushed the TestConnectAndWait-flaky branch from 88fbbcc to 36a6b7e Compare March 17, 2025 13:39
@vvoland vvoland changed the title test/cli-plugins: Attempt to fix TestConnectAndWait flakiness test/cli-plugins: Attempt to make TestConnectAndWait less flaky Mar 17, 2025
@vvoland vvoland force-pushed the TestConnectAndWait-flaky branch from 36a6b7e to 413bb2b Compare March 17, 2025 13:40
@vvoland
Copy link
Collaborator Author

vvoland commented Mar 17, 2025

Hmm right, I think we can still make it a little bit stable by locking the test goroutine to its own thread.

$ go test  -run=TestConnectAndWait/connect_goroutine_exits_after_EOF -count=1000000  ./cli-plugins/socket/...                                                                                     
--- FAIL: TestConnectAndWait (0.50s)
    --- FAIL: TestConnectAndWait/connect_goroutine_exits_after_EOF (0.50s)
        socket_test.go:196: timeout hit after 500ms: waiting for connect goroutine to spawn
--- FAIL: TestConnectAndWait (0.51s)
    --- FAIL: TestConnectAndWait/connect_goroutine_exits_after_EOF (0.51s)
        socket_test.go:196: timeout hit after 500ms: waiting for connect goroutine to spawn
FAIL
FAIL	github.com/docker/cli/cli-plugins/socket	461.944s
FAIL

2 flakes per million runs still seems much better, while still keeping the test.

// relying on goroutine numbers to ensure correct behaviour
t.Run("connect goroutine exits after EOF", func(t *testing.T) {
runtime.LockOSThread()
defer runtime.LockOSThread()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing this was meant to be

Suggested change
defer runtime.LockOSThread()
defer runtime.UnlockOSThread()

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doh!

- Add runtime.Gosched() calls to encourage goroutine scheduling
- Increase the timeout from 10ms to 500ms
- Use poll.WaitOn with appropriate delays to ensure the goroutine has
  spawned before checking
- Lock the test goroutines to its own thread

Signed-off-by: Paweł Gronowski <[email protected]>
@vvoland vvoland force-pushed the TestConnectAndWait-flaky branch from 413bb2b to 0ce8989 Compare March 19, 2025 10:52
@vvoland
Copy link
Collaborator Author

vvoland commented Mar 19, 2025

With UnlockOSThread fixed, I ran the test 2 times per 1M and no flake so far:

go test  -run=TestConnectAndWait/connect_goroutine_exits_after_EOF -count=1000000  ./cli-plugins/socket/...
ok  	github.com/docker/cli/cli-plugins/socket	418.590s

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!

LGTM

@thaJeztah thaJeztah added this to the 28.0.2 milestone Mar 19, 2025
@vvoland vvoland modified the milestones: 28.0.2, 28.0.3 Mar 19, 2025
@vvoland vvoland merged commit 888716a into docker:master Mar 19, 2025
93 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Flaky test: TestConnectAndWait / TestConnectAndWait/connect_goroutine_exits_after_EOF

4 participants