Context and Motivation
The cuML test infrastructure has evolved organically as the library has grown, leading to several challenges including test flakiness, coverage gaps, and inconsistent fixture usage. A systematic overhaul of the test infrastructure will help address these issues by improving reliability, clarifying test organization, optimizing execution time, and reducing maintenance burden - ultimately providing a better development experience and more robust codebase.
It is assumed that the plan outlined here will have to be implemented incrementally. This issue serves to track overarching aims and progress.
Overall Aims
- Reduce test flakiness
- Increase test coverage
- Improve test maintainability and organization
- Enhance test performance and execution time
General Approach
- Better delineate between functional tests that check the plumbing and correctness tests that check whether our algorithmic implementations are correct.
- For correctness checks, thresholds should ideally be rooted in an analytical expectation based on the conditioning of the problem; realistically, we should be more aggressive about reducing thresholds to levels that clearly indicate a regression, rather than levels barely below the usually observed value.
- Ensure that correctness checks in particular are as deterministic as possible, while keeping in mind that in many cases full determinism simply isn't achievable due to the nature of parallel computation on GPU devices.
- Use per-test retries where appropriate, i.e., for correctness tests where a single failure among many successes does not immediately indicate a regression.
- Use automated retries for all components that rely on external resources, e.g., dataset downloads.
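The last point could be sketched as a small wrapper around external-resource access. The helper name and defaults below are assumptions for illustration, not existing cuML utilities:

```python
import time


def with_retries(fn, attempts=3, base_delay=1.0):
    """Call fn(), retrying on any exception up to `attempts` times.

    Hypothetical helper for wrapping access to external resources,
    e.g. dataset downloads; delays back off exponentially.
    """
    last_exc = None
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as exc:  # deliberately broad: network errors vary
            last_exc = exc
            if attempt < attempts - 1:
                time.sleep(base_delay * 2**attempt)
    raise last_exc
```

A fixture or download helper would then call, e.g., `with_retries(download_dataset)` (where `download_dataset` stands in for whatever fetch function the test uses).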
Specific Approach
1. Test Organization and Structure
Test Categories
- Consider all tests as functional by default (API, input validation, basic behavior)
- Explicitly mark or identify:
- Correctness tests (algorithmic implementation verification)
- Integration tests (end-to-end workflows)
- Keep performance tests separate in dedicated benchmarking suite
Organization Alternatives
Test Type Identification Options
a. Via pytest markers:
@pytest.mark.correctness
@pytest.mark.integration
- Pros: Flexible filtering, no file reorganization needed
- Cons: Requires discipline in marker usage
b. Via filename conventions:
test_kmeans.py # functional tests (default)
test_kmeans_correct.py # correctness tests
test_kmeans_integ.py # integration tests
- Pros: Clear visual separation
- Cons: Could lead to code duplication
c. Via test name conventions:
def test_kmeans_fit() # functional test
def test_correct_kmeans_conv() # correctness test
def test_integ_kmeans_pipe() # integration test
- Pros: Easy to implement
- Cons: Less structured than other options
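As a sketch of how options (a) and (c) could be combined, a conftest.py hook could register the markers and derive them from test-name prefixes automatically. The prefixes and marker names below are illustrative, not an existing cuML convention:

```python
# conftest.py (sketch): register markers and derive them from test names.


def pytest_configure(config):
    config.addinivalue_line("markers", "correctness: algorithmic correctness test")
    config.addinivalue_line("markers", "integration: end-to-end workflow test")


def marker_for_name(test_name):
    """Map a test-name prefix to a marker name (None = functional default)."""
    if test_name.startswith("test_correct_"):
        return "correctness"
    if test_name.startswith("test_integ_"):
        return "integration"
    return None


def pytest_collection_modifyitems(items):
    import pytest  # imported lazily to keep the helper easy to unit-test

    for item in items:
        marker = marker_for_name(item.name)
        if marker is not None:
            item.add_marker(getattr(pytest.mark, marker))
```

Functional-only runs could then use `pytest -m "not correctness and not integration"` without any per-file annotations.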
Directory-Based Structure
tests/
functional/ # default location
correctness/
integration/
- Pros: Clear separation of concerns
- Cons: May require significant reorganization
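If the directory layout were adopted, categorization would need no per-test annotations at all: a per-directory conftest.py could tag everything beneath it. A minimal sketch, assuming a `correctness` marker has been registered:

```python
# tests/correctness/conftest.py (sketch): mark every test collected
# from this directory with the `correctness` marker.


def pytest_collection_modifyitems(items):
    import pytest  # imported lazily to keep the hook easy to unit-test

    for item in items:
        item.add_marker(pytest.mark.correctness)
```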
Recommendations for Review
- Evaluate current test organization pain points
- Consider implementing pytest markers as a non-invasive first step
- Review potential benefits of filename conventions
- Assess need for directory restructuring based on maintenance experience
2. Test Quality Improvements
For functional tests:
- Ensure comprehensive input validation (use hypothesis strategies where possible)
- Test edge cases and error conditions
- Verify API contract compliance
- Avoid thresholded checks or use very conservative thresholds
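The hypothesis point can be illustrated with a property-style validation test. `validate_n_clusters` and its rule are invented for illustration (they are not cuML's actual validation logic), and the sketch assumes the `hypothesis` package:

```python
from hypothesis import given, strategies as st


def validate_n_clusters(n_clusters, n_samples):
    """Illustrative input check: n_clusters must lie in [1, n_samples]."""
    return isinstance(n_clusters, int) and 1 <= n_clusters <= n_samples


@given(
    n_clusters=st.integers(min_value=-10, max_value=100),
    n_samples=st.integers(min_value=1, max_value=50),
)
def test_validate_n_clusters(n_clusters, n_samples):
    # The property: validation accepts exactly the valid range.
    # No thresholds involved -- a purely functional check.
    assert validate_n_clusters(n_clusters, n_samples) == (
        1 <= n_clusters <= n_samples
    )
```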
For correctness checks:
- Root thresholds in analytical expectations where possible
- Set aggressive thresholds that clearly indicate regressions
- Document the rationale behind threshold choices and expected variance
- Implement tests as deterministically as possible
- Use automatic retries where appropriate*
*) If we opt to use a "correctness" marker, retry logic could be implemented as part of the marker.
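The footnote could look roughly like the following conftest.py hook. It assumes the pytest-rerunfailures plugin (which provides the `flaky` marker), and the rerun count is an arbitrary choice:

```python
CORRECTNESS_RERUNS = 2  # arbitrary: retry correctness tests at most twice


def wants_retries(marker_names):
    """Pure helper: retry correctness tests not already marked flaky."""
    return "correctness" in marker_names and "flaky" not in marker_names


def pytest_collection_modifyitems(items):
    import pytest  # `flaky` requires the pytest-rerunfailures plugin

    for item in items:
        names = {m.name for m in item.iter_markers()}
        if wants_retries(names):
            item.add_marker(pytest.mark.flaky(reruns=CORRECTNESS_RERUNS))
```

Centralizing the retry policy this way avoids sprinkling rerun counts across individual test files.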
3. Test Infrastructure Enhancements
- More consistent use of common fixtures (deduplicate existing ones)
- More consistent use of hypothesis strategies for input validation
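Fixture deduplication could converge on a handful of shared, seeded fixtures in a common conftest.py. The fixture below is a sketch with invented names and shapes (assumes numpy and pytest):

```python
import numpy as np
import pytest


def make_blobs_dataset(seed=42, n_per_cluster=100, n_features=5):
    """Deterministic synthetic clustering data (three Gaussian blobs)."""
    rng = np.random.default_rng(seed)  # fixed seed for reproducibility
    centers = rng.normal(scale=10.0, size=(3, n_features))
    X = np.concatenate(
        [c + rng.normal(size=(n_per_cluster, n_features)) for c in centers]
    )
    y = np.repeat(np.arange(3), n_per_cluster)
    return X, y


@pytest.fixture(scope="session")
def blobs_dataset():
    # Session scope: built once, shared by every test that requests it.
    return make_blobs_dataset()
```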
Related issues
- pytest-randomly to run tests in a random order #6375