[Epic] Improve sklearn compatibility #7061

@csadorf

Description

Improve cuML's sklearn compatibility through systematic testing and validation of estimators.

Impact

  • Better integration with sklearn ecosystem and meta-estimators
  • Reduced limitations for cuml.accel
  • Prevention of compatibility regressions
  • Improved documentation of compatibility status

Objectives

O1: Establish Testing Framework

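Run scikit-learn's check_estimator against every cuML estimator, including those known to fail; known failures are recorded as expected failures rather than excluding the estimator from testing.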
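A minimal sketch of how one estimator's result could be classified when failures are expected. The helper name `run_compat_check`, its `expected_to_fail` flag, and the injectable `check` argument are illustrative assumptions, not part of cuML or scikit-learn; only `sklearn.utils.estimator_checks.check_estimator` is a real scikit-learn function, and it is imported lazily here.

```python
from typing import Callable, Optional


def run_compat_check(
    estimator,
    expected_to_fail: bool = False,
    check: Optional[Callable[[object], object]] = None,
) -> str:
    """Classify one estimator as 'pass', 'fail', 'xfail', or 'xpass'.

    `run_compat_check` is a hypothetical helper for illustration only.
    """
    if check is None:
        # Default to scikit-learn's real checker; imported lazily so the
        # helper can also be exercised with a stand-in `check` callable.
        from sklearn.utils.estimator_checks import check_estimator as check
    try:
        # With its default settings, check_estimator raises on the first
        # failing check.
        check(estimator)
    except Exception:
        # A failure that was anticipated is reported separately from a
        # genuine regression.
        return "xfail" if expected_to_fail else "fail"
    # An expected failure that now passes is worth surfacing too, since the
    # expectation can then be removed.
    return "xpass" if expected_to_fail else "pass"
```

In a pytest suite, scikit-learn's `parametrize_with_checks` decorator would likely be a better fit than a hand-rolled wrapper, since it reports each individual check as its own test case.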

O2: Address High-Impact Issues

  • Identify and prioritize prevalent compatibility issues
  • Resolve high-impact issues

O3: Prevent Regressions

  • Make check_estimator tests opt-out rather than opt-in, so that new estimators are covered by default and any exclusion must be explicit
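One way opt-out coverage could work is to discover every estimator class automatically and subtract an explicit exclusion set, so a newly added estimator is tested unless someone deliberately lists it. The sketch below assumes estimators share a common base class; the function names and the `opt_out` set are hypothetical, and how cuML actually discovers its estimators is not stated in this issue.

```python
from typing import Iterator, List, Set


def iter_subclasses(base: type) -> Iterator[type]:
    """Yield every direct and indirect subclass of `base` exactly once."""
    seen = set()
    stack = list(base.__subclasses__())
    while stack:
        cls = stack.pop()
        if cls in seen:
            continue
        seen.add(cls)
        yield cls
        stack.extend(cls.__subclasses__())


def estimators_under_test(base: type, opt_out: Set[str]) -> List[type]:
    """Return all public subclasses of `base` except those named in `opt_out`.

    Hypothetical helper: classes are tested by default, and skipping one
    requires adding its name to the explicit `opt_out` set.
    """
    selected = [
        cls
        for cls in iter_subclasses(base)
        if not cls.__name__.startswith("_") and cls.__name__ not in opt_out
    ]
    # Sort by name so the generated test IDs are stable between runs.
    return sorted(selected, key=lambda cls: cls.__name__)
```

The returned list could then be fed into a parametrized test, which keeps the exclusion set as the single place where a gap in coverage is visible and reviewable.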

Non-Goals

  • Perfect compatibility: Only address high-impact issues
  • Immediate resolution of all gaps: Defer low-impact issues
  • Testing all sklearn versions: Single-version testing is sufficient
  • Exhaustive documentation: Issue-based documentation is sufficient for most cases

Timeline

  • 25.10: Establish testing infrastructure and address high-impact issues
  • Ongoing: Address high- and medium-impact issues

Risks and Mitigations

  • Scope creep: Prioritize issues rather than fixing everything
  • Maintenance burden: Keep test infrastructure lean
  • False positives: Evaluate impact before adjusting API/implementation

Success Criteria

  • All cuML estimators tested with check_estimator
  • High-impact compatibility issues identified and resolved

Metadata

Labels

sklearn-api-compat: Issues around cuml matching sklearn API conventions/standards
