@Benehiko
Member

@Benehiko Benehiko commented Mar 12, 2025

Closes #5837

- What I did
The test currently checks for a drop in the number of goroutines, which is an odd thing to measure given that ConnectAndWait already calls a callback function once io.EOF is reached.

- How I did it

- How to verify it

go test -race -run=TestConnectAndWait/connect_goroutine_exits_after_EOF -count=1000 -v -failfast ./cli-plugins/socket/...

- Human readable description for the release notes

- A picture of a cute animal (not mandatory but encouraged)

@codecov-commenter

codecov-commenter commented Mar 12, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 59.32%. Comparing base (2d74733) to head (ce9a64c).
Report is 16 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5927      +/-   ##
==========================================
- Coverage   59.33%   59.32%   -0.02%     
==========================================
  Files         358      358              
  Lines       29783    29783              
==========================================
- Hits        17672    17669       -3     
- Misses      11142    11145       +3     
  Partials      969      969              

@Benehiko Benehiko requested a review from a team March 12, 2025 14:24
@Benehiko Benehiko force-pushed the fix-socket-flaky-test branch from df722e9 to ce9a64c Compare March 12, 2025 14:24
@Benehiko Benehiko marked this pull request as ready for review March 12, 2025 14:43
@Benehiko Benehiko added area/testing kind/bugfix PR's that fix bugs labels Mar 13, 2025
@Benehiko Benehiko self-assigned this Mar 13, 2025
		return poll.Continue("waiting for connect goroutine to exit")
	}
	return poll.Success()
}, poll.WithDelay(1*time.Millisecond), poll.WithTimeout(10*time.Millisecond))
Collaborator

IIRC we specifically wanted to assert that goroutine is stopped.

Wondering if just increasing the timeout would help with the flakiness?
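To make the suggestion concrete, here is a minimal standard-library sketch of polling the goroutine count with a more generous timeout. It is not the gotest.tools `poll` API and not the code in this PR: `waitFor`, `goroutineExits`, and the one-second `generousTimeout` are hypothetical names and values chosen for illustration.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// generousTimeout is an illustrative value, not the one used in the real test.
const generousTimeout = time.Second

// waitFor is a hypothetical helper: it calls check every delay until
// check returns true or the timeout elapses.
func waitFor(check func() bool, delay, timeout time.Duration) bool {
	deadline := time.Now().Add(timeout)
	for {
		if check() {
			return true
		}
		if time.Now().After(deadline) {
			return false
		}
		time.Sleep(delay)
	}
}

// goroutineExits starts a short-lived goroutine and polls until the
// goroutine count is back at (or below) the baseline taken beforehand.
func goroutineExits() bool {
	before := runtime.NumGoroutine()
	release := make(chan struct{})
	go func() { <-release }()
	close(release)
	return waitFor(func() bool {
		return runtime.NumGoroutine() <= before
	}, time.Millisecond, generousTimeout)
}

func main() {
	fmt.Println("goroutine exited:", goroutineExits())
}
```

A longer timeout only costs time when the goroutine really does fail to exit; in the passing case the poll returns as soon as the count drops.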

Member Author

I honestly don't know why we would want to do that. The goroutine will stop once the callback function returns.

Collaborator

Yeah, in this case I think we should also use poll.WaitOn instead of a straight assert (because the goroutine may not be spawned immediately).

Collaborator

> The goroutine will stop once the callback function returns.

Unless there's a bug in ConnectAndWait 🙈

Member Author

If the code is changed and the new implementation is bad, there is no guarantee that a test would catch the regression anyway. Focusing on a hypothetical case won't change the fact that this test fails multiple times per day.

Either we redesign ConnectAndWait to be more resilient, or we change the test as I've done here. We cannot predict when a goroutine will run or when it will exit. The changes I've made here are more robust from the test's perspective.

Collaborator

But it no longer tests what the test claims to: connect goroutine exits after EOF.

Maybe it's not possible to reliably test that, okay, but let's try to fix the actual test before changing it into a completely different test.

Collaborator

Opened: #5932

If it's still flaky after that, I 100% agree with changing this test according to your proposal :)

Member Author

> But it no longer tests what the test claims to: connect goroutine exits after EOF.

That's not true, it still tests that the goroutine exits on EOF.

The callback is only called once we receive an EOF here:

if errors.Is(err, io.EOF) {
	cb()
	return
}

After the callback is called the goroutine will exit due to the return.
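As a rough illustration of the callback-based approach, the sketch below waits on a channel that the callback closes rather than counting goroutines. `readUntilEOF` is a hypothetical stand-in for the read loop, not the actual ConnectAndWait implementation, and `net.Pipe` is used here only to produce an EOF without a real socket.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net"
	"time"
)

// readUntilEOF is a hypothetical stand-in for the read loop discussed
// above: it reads from conn in a goroutine and calls cb once on io.EOF.
func readUntilEOF(conn net.Conn, cb func()) {
	go func() {
		buf := make([]byte, 1)
		for {
			_, err := conn.Read(buf)
			if errors.Is(err, io.EOF) {
				cb()
				return
			}
			if err != nil {
				return // any other error: exit without calling cb
			}
		}
	}()
}

// callbackFiresOnEOF reports whether the callback ran within one second
// of the peer closing its end of the pipe.
func callbackFiresOnEOF() bool {
	client, server := net.Pipe()
	defer client.Close()

	called := make(chan struct{})
	readUntilEOF(client, func() { close(called) })

	server.Close() // the reader's next Read returns io.EOF

	select {
	case <-called:
		return true
	case <-time.After(time.Second):
		return false
	}
}

func main() {
	fmt.Println("callback fired:", callbackFiresOnEOF())
}
```

Waiting on the channel removes the dependency on scheduler timing, but it only observes that the callback ran, not that the goroutine returned afterwards.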

Collaborator

No, it tests that the callback gets called after an EOF.

> After the callback is called the goroutine will exit due to the return.

Yes, that's true with the current code, but tests are also there to detect unwanted changes in the future. If this function is later changed in a way that introduces a bug where the callback gets called but the goroutine never returns, this test would be able to detect it.
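The scenario described here can be sketched with a deliberately broken variant. Everything below is hypothetical and unrelated to the real ConnectAndWait code: `run` calls the callback and then either returns or, when `leak` is set, blocks to simulate a missing return. A callback-only check passes in both cases, while a goroutine-count check tells them apart.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// run is a hypothetical function: it starts a goroutine that calls cb
// and then, if leak is true, blocks on stop instead of returning.
func run(cb func(), leak bool, stop <-chan struct{}) {
	go func() {
		cb()
		if leak {
			<-stop // simulated bug: no return after the callback
		}
	}()
}

// check reports whether the callback was called and whether the
// goroutine count returned to its baseline within 200ms.
func check(leak bool) (called, exited bool) {
	before := runtime.NumGoroutine()
	stop := make(chan struct{})
	defer close(stop) // release the leaked goroutine when done

	done := make(chan struct{})
	run(func() { close(done) }, leak, stop)

	select {
	case <-done:
		called = true
	case <-time.After(time.Second):
	}

	// Poll until the goroutine count drops back to the baseline.
	deadline := time.Now().Add(200 * time.Millisecond)
	for time.Now().Before(deadline) {
		if runtime.NumGoroutine() <= before {
			exited = true
			break
		}
		time.Sleep(time.Millisecond)
	}
	return called, exited
}

func main() {
	for _, leak := range []bool{false, true} {
		called, exited := check(leak)
		fmt.Printf("leak=%v called=%v exited=%v\n", leak, called, exited)
	}
}
```

The trade-off debated in this thread is visible here: the count-based check catches the leak, but it depends on a timeout and on no unrelated goroutines starting or stopping during the window.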

@Benehiko Benehiko closed this Mar 19, 2025
@Benehiko
Member Author

In favor of #5932

Labels

area/testing kind/bugfix PR's that fix bugs

Development

Successfully merging this pull request may close these issues.

Flaky test: TestConnectAndWait / TestConnectAndWait/connect_goroutine_exits_after_EOF

4 participants