[Tooling] Flaky tests - Issue #192#198
Conversation
…laky-tests Signed-off-by: Alessandro De Blasis <[email protected]>
Olshansk
left a comment
There was a problem hiding this comment.
General: I did a first pass at the review and think this is super interesting. It has the foundation for important infrastructure we'll really need in the future when we do more extensive testing. 🧑🔬 🤔 🧪
Will do a second pass after we tend to the first set of comments.
Random Ideas: Thoughts on creating PocketClock module. Not creating an issue for it (yet) since it's just an idea, but wanted to see if you think it'd be overkill.
Primary concern: I did some googling and it seems that Setters and Getters in go are idiomatic, but I'm concerned that:
- We need a way to enforce/encourage this because of the mutex requirements (you can imagine the issues as we have more contributors)
- Grep & update all existing sets/gets
This should really not happen because this part is set up for sure and usually works, but then sometimes docker hub does not authenticate and throws an error. I'll check why that might happen. Thank you for calling this out. |
This reverts commit bb8613c.
Olshansk
left a comment
There was a problem hiding this comment.
I think we're almost on the home stretch!!
Also, for future reference, you should be able to create branches directly from the main repo; deblasis:issue/192-tooling-flaky-tests.
…laky-tests Signed-off-by: Alessandro De Blasis <[email protected]>
## Description Consolidate & update techdebt in the consensus module (excluding the pacemaker module which is being modified by #198) in preparation for an end-to-end state hash flow explored in #237. ## Issue Fixes #249 ## Type of change - [ ] New feature, functionality or library - [ ] Bug fix (non-breaking change which fixes an issue) - [X] Code health or cleanup - [ ] Major breaking change - [X] Documentation - [ ] Other: NA ## List of changes **Consensus logic** - Pass in a list of messages to `findHighQC` instead of a hotstuff step - Made `CreateProposeMessage` and `CreateVotemessage` accept explicit values, identifying some bugs along the way - Made sure to call `applyBlock` when using `highQC` from previous round - Moved business logic for `prepareAndApplyBlock` into `hotstuff_leader.go` - Removed `MaxBlockBytes` and storing the consensus genesis type locally as is **Consensus cleanup** - Using appropriate getters for protocol types in the hotstuff lifecycle - Replaced `proto.Marshal` with `codec.GetCodec().Marshal` - Reorganized and cleaned up the code in `consensus/block.go` - Consolidated & removed a few `TODO`s throughout the consensus module - Added TECHDEBT and TODOs that will be require for a real block lifecycle - Fixed typo in `hotstuff_types.proto` - Moved the hotstuff handler interface to `consensus/hotstuff_handler.go` **Consensus testing** - Improved mock module initialization in `consensus/consensus_tests/utils_test.go` **General** - Added a diagram for `AppHash` related `ContextInitialization` - Added `Makefile` keywords for `TODO` ## Testing - [x] `make test_all` - [x] [LocalNet](https://github.com/pokt-network/pocket/blob/main/docs/development/README.md) w/ all of the steps outlined in the `README` ## Checklist - [x] I have performed a self-review of my own code - [x] I have commented my code, particularly in hard-to-understand areas - [x] I have tested my changes using the available tooling - [ ] If applicable, I have made corresponding changes to related local or global README - [x] If applicable, I have added new diagrams using [mermaid.js](https://mermaid-js.github.io) - [ ] If applicable, I have added tests that prove my fix is effective or that my feature works
## Description The original motivation for this PR was to fix the `block is nil` on LocalNet. While testing (using unit tests), debugging and investigating (using LocalNet), several issues were uncovered in the consensus testing framework introduced by #198. The goal of this PR changed to fix those issues and improve the quality of the consensus codebase. ## Issue Tends to "quality of life" improvements, which are somewhat related to #361. ## Type of change Please mark the relevant option(s): - [ ] New feature, functionality or library - [x] Bug fix - [ ] Code health or cleanup - [ ] Major breaking change - [x] Documentation - [ ] Other <!-- add details here if it a different type of change --> ## List of changes ### Consensus - Core - Force consensus to use a "star-like" broadcast instead of "RainTree" broadcast - Improve logging throughout through the use of emojis and rewording certain statements - Slightly improve the block verification flow (renaming, minor fixes, etc…) to stabilize LocalNet ### Consensus - Tests - Rename the `consensus_tests` package to `e2e_tests` - Internalize configuration related to `fail_on_extra_msgs` from the `Makefile` to the `consensus` module - Forced all tests to fail if we receive extra unexpected messages and modify tests appropriately - After #198, we made tests deterministic but there was a hidden bug that modified how the test utility functions because the clock would not move while we were waiting for messages. This prevented logs from streaming, tests from failing, and other issues. Tend to all related changes. ### Consensus - Pacemaker - Rename `ValidateMessage` to `ShouldHandleMessage` and return a boolean - Pass a `reason` to `InterruptRoudn` - Improve readability of some parts of the code ### P2P - Add a lock to the mempool to avoid parallel messages which has caused the node to crash in the past ### Configs - Reorder private keys so addresses (retrieved by transforming private keys) to reflect the numbering in LocalNet appropriately. The address for val1, based on config1, will have the lexicographically first address. This makes debugging easier. ## Testing - [x] `make test_consensus` - [x] `make develop_test` - [x] [LocalNet](https://github.com/pokt-network/pocket/blob/main/docs/development/README.md) w/ all of the steps outlined in the `README` ## Required Checklist - [x] I have performed a self-review of my own code - [x] I have commented my code, particularly in hard-to-understand areas - [x] I have tested my changes using the available tooling - [ ] I have updated the corresponding CHANGELOG ### If Applicable Checklist - [ ] I have updated the corresponding README(s); local and/or global - [ ] I have added tests that prove my fix is effective or that my feature works - [ ] I have added, or updated, [mermaid.js](https://mermaid-js.github.io) diagrams in the corresponding README(s) - [ ] I have added, or updated, documentation and [mermaid.js](https://mermaid-js.github.io) diagrams in `shared/docs/*` if I updated `shared/*`README(s) Co-authored-by: goku <[email protected]>


Description
This PR addresses test flakiness due to:
Tests are still flaky
TestHotstuff4Nodes1BlockHappyPath
TestTinyPacemakerTimeouts
Time mocking
Following the input in this comment I proceeded implementing time mocking.
The idea is that the nodes will have a reference that can proxy calls to either the real time package or a mocked version.
I am using github.com/benbjohnson/clock and I am starting to build utility functions around it that could become a wrapper at some point but it's early to say.
Something like this:

I believe this functionality is going to be quite useful in general especially since I believe we'll have to deal with timeouts and time-bound stuff quite a bit.
Race conditions
Mostly due to the fact that structs are accessed by multiple goroutines at the same time.
Addressed using
mutexesand wrapping the getters/setters into methods that also handle lock/unlock.Issue
Fixes issue/192
Type of change
Please mark the options that are relevant.
List of changes
ConsensusmoduleTesting
make test_allREADMEChecklist