Skip to content

Run the upstream HDBSCAN tests in CI#6995

Merged
rapids-bot[bot] merged 4 commits intorapidsai:branch-25.08from
jcrist:hdbscan-tests
Jul 14, 2025
Merged

Run the upstream HDBSCAN tests in CI#6995
rapids-bot[bot] merged 4 commits intorapidsai:branch-25.08from
jcrist:hdbscan-tests

Conversation

@jcrist
Copy link
Copy Markdown
Member

@jcrist jcrist commented Jul 9, 2025

This adds the HDBSCAN upstream test suite to CI.

A few complications:

  • The upstream hdbscan library has seen a lapse in maintenance lately. Thus, the test suite won't run cleanly (or even import) without sklearn < 1.6. Rather than pinning the sklearn version across the upstream test jobs (which would require a new entry in dependencies.yaml and changes to the test_python_common.sh script), for now we just skip the HDBSCAN tests if that condition isn't met. The oldest deps run will run the HDBSCAN tests, but not the newest deps run.
  • There are many failures in the HDBSCAN tests, most of which are memory errors. The xfail list here is stable (I've ran it 10s of times to be sure). That said, if you use pytest-randomly to shuffle the test order, you'll see a different set of failures. Not great. I figure it's still worth getting the test infra in place though before taking a look at the underlying memory issues.

Currently on top of #6994.

Fixes #6983.

jcrist added 2 commits July 9, 2025 10:00
Without a `pytest.ini`, the upstream tests will use the pytest config
for cuml itself (which turns Deprecation/FutureWarning into errors).
This can lead to test failures due to things outside our control.

This PR adds a small `pytest.ini` to the upstream tests, and changes the
pytest invocation so our upstream tests use it. It currently contains
the following warning config:

- `UnmatchedXfailTests`: error if there's an issue with the xfail list
- `FutureWarning` raised by cuml: error if our code warns of a
  deprecation.
- `PytestUnknownMarkWarning`: ignored, our xfail-list plugin creates a
  bunch of unknown marks, no need to warn about these.

Also fixes some annoying UX in the umap/run-tests.sh script where it
would `cd` into the directory before executing, meaning any forwarded
args (like `--junit-xml`) would have the incorrect relative path.
This adds the HDBSCAN upstream test suite to CI.

A few complications:

- The upstream `hdbscan` library has seen a lapse in maintenance lately.
  Thus, the test suite won't run cleanly (or even import) without
  sklearn < 1.6. Rather than pinning the sklearn version across the
  upstream test jobs (which would require a new entry in
  `dependencies.yaml` and changes to the `test_python_common.sh`
  script), for now we just skip the HDBSCAN tests if that condition
  isn't met. The oldest deps run will run the HDBSCAN tests, but not the
  newest deps run.
- There are many failures in the HDBSCAN tests, most of which are memory
  errors. The xfail list here is stable (I've ran it 10s of times to be
  sure). That said, if you use `pytest-randomly` to shuffle the test
  order, you'll see a different set of failures. Not great.
@jcrist jcrist self-assigned this Jul 9, 2025
@jcrist jcrist requested review from a team as code owners July 9, 2025 17:56
@jcrist jcrist requested a review from jameslamb July 9, 2025 17:56
@jcrist jcrist added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Jul 9, 2025
@jcrist jcrist requested review from cjnolet and dantegd July 9, 2025 17:56
@jcrist jcrist added the cuml-accel Issues related to cuml.accel label Jul 9, 2025
@github-actions github-actions Bot added Cython / Python Cython or Python issue ci labels Jul 9, 2025
@jcrist jcrist requested review from csadorf and removed request for cjnolet July 9, 2025 18:08
@csadorf
Copy link
Copy Markdown
Contributor

csadorf commented Jul 9, 2025

Merge after #6845 and #6994.

@jcrist
Copy link
Copy Markdown
Member Author

jcrist commented Jul 9, 2025

Hmmm, looks like the memory failures (while consistent on my machine) are different in CI. Two tasks then:

@jcrist
Copy link
Copy Markdown
Member Author

jcrist commented Jul 10, 2025

Looked into this more this morning. There are 3 categories of failures:

  • Simple python-level failures (most of these should be quick fixes)
  • Tests that always fail, and break downstream tests (there's 5 of these, at least on my machine)
  • Tests that sometimes fail, and break downstream tests (this list seems impossible to nail down, it might be every test)

Given this, I think we won't be able to run these tests (at least in a required manner) until the memory bugs are fixed. A few options I see for now:

  • Don't merge this PR, only try to split out the always failing tests into new issues that can be resolved without the upstream hdbscan tests or cuml.accel. Try again once those are fixed.
  • Merge this PR, but remove the part adding this to CI. Gives devs the infra setup so they can run the test suite to help resolve memory issues. I'd still break out issues, but we'd have more stuff already merged in.
  • Merge this PR, but ignore failures. This could be done by very very liberally marking tests as flaky, or through ignoring the pytest failure in the ci bash script.

Clearly we have some bugs in our hdbscan implementation, these we should prioritize resolving. Likewise, the python-level failures are worth resolving since they're possibly easy fixes and might affect cuml.accel users running hdbscan. The actual hdbscan upstream test suite is of pretty low quality for us though, it's not clear to me that we'd gain a ton of value running it in CI beyond hdbscan smoketests (which we could as easily write ourselves in the cuml_accel_tests/integration/ directory). Given this, I'd lean towards option 1 or 2 above.

@csadorf
Copy link
Copy Markdown
Contributor

csadorf commented Jul 10, 2025

I very much prefer option 2. Let's get the infrastructure set up and start working on addressing those issues. I don't think running tests in CI that fail beyond xfails is advantageous as it just creates confusion and noise. We should add some smoke tests to cuml_accel_tests/integration/ either way.

We can't run these consistently, holding off adding them for now.
@github-actions github-actions Bot removed the ci label Jul 14, 2025
@jcrist
Copy link
Copy Markdown
Member Author

jcrist commented Jul 14, 2025

/merge

@rapids-bot rapids-bot Bot merged commit f594e08 into rapidsai:branch-25.08 Jul 14, 2025
73 checks passed
@jcrist jcrist deleted the hdbscan-tests branch July 14, 2025 22:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cuml-accel Issues related to cuml.accel Cython / Python Cython or Python issue improvement Improvement / enhancement to an existing function non-breaking Non-breaking change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Run hdbscan tests with cuml.accel enabled in CI

3 participants