[Consensus] Improve Consensus Testing Framework Determinism #462

Objective

Remove non-determinism from the consensus testing framework.

Origin Document

In #198, we introduced a major improvement to the consensus testing framework to remove flaky tests via a mocked clock. In #404, we iterated on that solution to make sure that tests continue to fail if the expected messages do not arrive within a certain amount of time.

Pros of the test framework at the time of writing:

  • Tests the consensus module E2E
  • Simulates an asynchronous distributed cluster of nodes driven by a mocked clock

Cons of the test framework at the time of writing:

  • Depends on the maxWaitTimeMillis parameter in waitForEventsInternal
  • Depends on a ticker in waitForEventsInternal that increments the mocked clock using real time
  • Users with slower machines may see a test fail because messages take longer to arrive, leading to the well-known "works on my machine" situation
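
To make the second con concrete, here is a minimal sketch of a mocked clock that only moves when the test advances it explicitly, so timer firing order never depends on real elapsed time. The `fakeClock` type, its methods, and `firingOrder` are hypothetical names written for this illustration; they are not the framework's actual clock or the API of any mocking library.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// fakeTimer is a callback scheduled at a point in mocked time.
type fakeTimer struct {
	at   time.Time
	fire func()
}

// fakeClock is a hypothetical, hand-rolled mock clock for illustration only.
// Time moves only when Advance is called; no real-time ticker is involved.
type fakeClock struct {
	now    time.Time
	timers []*fakeTimer
}

func newFakeClock() *fakeClock {
	return &fakeClock{now: time.Unix(0, 0)}
}

// AfterFunc schedules f to run once mocked time has advanced by d.
func (c *fakeClock) AfterFunc(d time.Duration, f func()) {
	c.timers = append(c.timers, &fakeTimer{at: c.now.Add(d), fire: f})
}

// Advance moves mocked time forward by d and fires every timer that is due,
// in timestamp order. Callbacks may schedule new timers, so the list is
// re-sorted on every iteration.
func (c *fakeClock) Advance(d time.Duration) {
	target := c.now.Add(d)
	for {
		sort.SliceStable(c.timers, func(i, j int) bool {
			return c.timers[i].at.Before(c.timers[j].at)
		})
		if len(c.timers) == 0 || c.timers[0].at.After(target) {
			break
		}
		t := c.timers[0]
		c.timers = c.timers[1:]
		c.now = t.at
		t.fire()
	}
	c.now = target
}

// firingOrder schedules two timers out of order and advances the clock in
// two explicit steps, recording the order in which the callbacks run.
func firingOrder() []string {
	c := newFakeClock()
	var order []string
	c.AfterFunc(200*time.Millisecond, func() { order = append(order, "timeout") })
	c.AfterFunc(50*time.Millisecond, func() { order = append(order, "message") })
	c.Advance(100 * time.Millisecond) // only the 50ms timer is due
	c.Advance(100 * time.Millisecond) // now the 200ms timer is due
	return order
}

func main() {
	fmt.Println(firingOrder())
}
```

Because nothing in this sketch reads the wall clock, the recorded order is the same on a fast or a slow machine. A ticker that advances the mocked clock from real time reintroduces exactly the dependency the mock was meant to remove.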

The video below captures the following:

  1. A successful test
  2. A failing test that timed out because we were waiting on messages that should not be sent (expected behaviour because the test had been made incorrect)
  3. A hanging test that timed out because we were waiting on messages that should not be sent (expected behaviour, but the test should fail)
Screen.Recording.2023-01-25.at.11.27.14.AM.mov

Goals

  • Remove non-determinism from the consensus testing framework
  • Prevent the performance of the developer's machine from impacting the success/failure of a test
  • Prevent developers from wasting time on issues that are specific to gaps in the flakiness/determinism of the test framework

Deliverable

  • A PR that updates the utils in consensus/e2e_tests and optionally the tests (if needed) to achieve the goals above
  • Documentation that explains the consensus testing framework (components, example usage, etc.)

Note that the implementation is left to the creativity of the implementor. It may or may not involve removing maxWaitTimeMillis, and it does require detailed knowledge of, and research into, the framework.
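
One possible direction, shown only as a sketch and not as the intended solution: bound the wait by a budget of logical ticks instead of by wall-clock milliseconds. Everything below is hypothetical and written for this illustration, including `simNetwork`, `waitForMessages`, the tick budget, and the message kinds `"NewRound"` and `"Proposal"`; none of it is taken from consensus/e2e_tests.

```go
package main

import (
	"errors"
	"fmt"
)

// message is a hypothetical stand-in for a consensus message.
type message struct {
	kind      string
	deliverAt int // logical tick at which the message becomes visible
}

// simNetwork is a hypothetical single-threaded network simulation that is
// driven by logical ticks rather than by real time.
type simNetwork struct {
	tick    int
	pending []message
}

// send queues a message for delivery delay ticks from now.
func (n *simNetwork) send(kind string, delay int) {
	n.pending = append(n.pending, message{kind: kind, deliverAt: n.tick + delay})
}

// step advances logical time by one tick and returns the messages now due.
func (n *simNetwork) step() []message {
	n.tick++
	var due, rest []message
	for _, m := range n.pending {
		if m.deliverAt <= n.tick {
			due = append(due, m)
		} else {
			rest = append(rest, m)
		}
	}
	n.pending = rest
	return due
}

var errNotEnough = errors.New("expected messages did not arrive within the tick budget")

// waitForMessages steps the network until want messages of the given kind
// have arrived or maxTicks logical ticks have been consumed. Messages of
// other kinds that become due in the meantime are discarded in this sketch.
func waitForMessages(n *simNetwork, kind string, want, maxTicks int) ([]message, error) {
	var got []message
	for i := 0; i < maxTicks && len(got) < want; i++ {
		for _, m := range n.step() {
			if m.kind == kind {
				got = append(got, m)
			}
		}
	}
	if len(got) < want {
		return got, errNotEnough
	}
	return got, nil
}

func main() {
	n := &simNetwork{}
	n.send("NewRound", 1)
	n.send("NewRound", 3)

	msgs, err := waitForMessages(n, "NewRound", 2, 5)
	fmt.Println(len(msgs), err)

	// No "Proposal" was ever sent, so this fails after exactly 5 ticks
	// regardless of how fast the machine running the test is.
	_, err = waitForMessages(n, "Proposal", 1, 5)
	fmt.Println(err)
}
```

With a tick budget, a test that waits on messages that are never sent fails after a fixed number of simulation steps on every machine, instead of hanging or depending on how much real time has passed.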

Non-goals / Non-deliverables

  • Add/modify consensus module business logic
  • Increase test coverage

General issue deliverables

  • Update the appropriate CHANGELOG(s)
  • Update any relevant local/global README(s)

Testing Methodology

  • All tests: make test_all
  • LocalNet: verify a LocalNet is still functioning correctly by following the instructions at docs/development/README.md

Creator: @Olshansk
Co-Owners: @deblasis

Metadata

  • Assignees: No one assigned
  • Labels: consensus (Consensus specific changes), testing (Defining, adding, automating or modifying tests), tooling (tooling to support development, testing et al)
  • Status: Done