Fix UMAP inverse_transform OOH error #7863

Merged
rapids-bot[bot] merged 5 commits into rapidsai:main from viclafargue:fix-umap-inverse-transform-ooh-error on Mar 11, 2026

@viclafargue
Contributor

Closes #7861.

scipy.spatial.Delaunay.find_simplex has a tol parameter that could in theory help avoid this issue. However, a different tol value would not guarantee that every point is assigned a simplex and, more importantly, could lower the quality of the simplex assignment.

Instead of raising an out-of-hull (OOH) exception, out-of-hull points are now matched to their nearest embedding point with a nearest-neighbor search (L2 norm + argmin). For now, everything is handled with SciPy and NumPy. Once the CuPy version that cuML ships with offers a GPU-accelerated Delaunay, we will switch this step to CuPy.
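The fallback described above can be sketched with SciPy and NumPy alone. This is a minimal illustration, not the code in umap.pyx: the function name `compute_start_vertices` and the toy data are assumptions made for the example, while the variable names mirror those mentioned in the review below.

```python
import numpy as np
from scipy.spatial import Delaunay


def compute_start_vertices(embedding, query_points):
    """Pick one starting embedding vertex per query point (illustrative only)."""
    tri = Delaunay(embedding)
    # find_simplex returns -1 for points that lie outside the triangulation.
    simplex_idx = tri.find_simplex(query_points)
    out_of_hull_mask = simplex_idx == -1

    start_vertices = np.empty(len(query_points), dtype=np.int64)
    # In-hull points: first vertex of the containing simplex.
    in_hull = ~out_of_hull_mask
    start_vertices[in_hull] = tri.simplices[simplex_idx[in_hull], 0]

    if out_of_hull_mask.any():
        ooh_points = query_points[out_of_hull_mask]
        # Pairwise L2 distances, shape (n_ooh_points, n_embedding).
        dists = np.linalg.norm(
            embedding[np.newaxis, :, :] - ooh_points[:, np.newaxis, :], axis=-1
        )
        # Out-of-hull points: index of the nearest embedding vertex.
        start_vertices[out_of_hull_mask] = np.argmin(dists, axis=1)
    return start_vertices, out_of_hull_mask


# Toy 2D embedding: four corners of the unit square plus its center.
embedding = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
# One query inside the hull, one far outside it.
queries = np.array([[0.25, 0.5], [10.0, 10.0]])
start_vertices, out_of_hull_mask = compute_start_vertices(embedding, queries)
```

Every query point ends up with a valid vertex index, so nothing downstream has to special-case points that Qhull places outside the hull.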

@viclafargue viclafargue requested a review from a team as a code owner March 6, 2026 16:26
@viclafargue viclafargue requested a review from jcrist March 6, 2026 16:26
@github-actions github-actions Bot added the "Cython / Python" (Cython or Python issue) label Mar 6, 2026
@viclafargue viclafargue added the "bug" (Something isn't working) and "non-breaking" (Non-breaking change) labels Mar 6, 2026
@coderabbitai

coderabbitai Bot commented Mar 6, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a3b53b30-439c-4aa2-b53d-1b8f1cdd9dd7

📥 Commits

Reviewing files that changed from the base of the PR and between c28c2a0 and 8c68606.

📒 Files selected for processing (1)
  • python/cuml/cuml/manifold/umap/umap.pyx

📝 Walkthrough

Summary by CodeRabbit

  • Bug Fixes
    • Improved robustness of manifold/UMAP computations by handling points outside the expected convex hull: such points are now assigned a nearest-embedding fallback instead of causing errors, yielding stable embeddings.
    • No public interfaces or signatures were changed.

Walkthrough

_compute_inverse_neighborhoods now computes a single start_vertices array for all input points: in-hull points use the first simplex vertex as before; points outside the convex hull are assigned the nearest embedding vertex by Euclidean distance instead of raising an error.

Changes

Cohort / File(s) | Summary
UMAP Inverse Neighborhood Computation (python/cuml/cuml/manifold/umap/umap.pyx) | Compute start_vertices once for all points. Keep the first simplex vertex for in-hull points. For out-of-hull points, remove the ValueError path and assign the nearest embedding vertex via Euclidean distance. Minor related adjustments to out-of-hull handling.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name | Status | Explanation
Title check | ✅ Passed | The title clearly describes the main change: fixing UMAP's inverse_transform to handle out-of-hull (OOH) errors, which directly matches the changeset's purpose.
Description check | ✅ Passed | The description is related to the changeset, explaining the issue and the solution of using KNN search instead of raising an exception for out-of-hull points.
Linked Issues check | ✅ Passed | The code changes address the objective from #7861 by replacing the out-of-hull exception with a KNN-based fallback that handles points outside the convex hull gracefully.
Out of Scope Changes check | ✅ Passed | The changes are scoped to the specific issue: modifying _compute_inverse_neighborhoods to handle out-of-hull points via KNN search, which directly addresses #7861.
Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
python/cuml/cuml/manifold/umap/umap.pyx (1)

171-180: Add a deterministic regression test for the out-of-hull branch.

This fix only matters when the fallback path is taken, but the current test_inverse_transform coverage appears to rely on environment-specific hull classification of training points. Please add a test that explicitly inverse-transforms a point outside the embedding hull and asserts that it returns finite output without raising, so this doesn’t regress only on specific SciPy/Qhull/platform combinations.
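One way to build such a deterministic out-of-hull input is sketched below. It is a SciPy-only stand-in, not a cuML test: the helper name `make_out_of_hull_query` and the fixed embedding are hypothetical. It relies on the fact that a convex hull lies inside its bounding box, so a point whose coordinates all exceed the per-axis maxima cannot be in the hull on any platform.

```python
import numpy as np
from scipy.spatial import Delaunay


def make_out_of_hull_query(embedding, margin=10.0):
    """Return one point beyond the per-axis maxima of the embedding."""
    # The convex hull lies inside the bounding box, so this point is outside it.
    return embedding.max(axis=0, keepdims=True) + margin


# Small fixed embedding, so the result does not depend on random state.
embedding = np.array(
    [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0], [1.0, 0.9]]
)
query = make_out_of_hull_query(embedding)

# find_simplex reports -1 for points outside the triangulation.
simplex = Delaunay(embedding).find_simplex(query)
assert simplex[0] == -1
```

In an actual regression test, a query built this way would be passed to inverse_transform on a fitted model, followed by a check with np.isfinite on the result; that part is omitted here because it requires cuML and a GPU.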

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/cuml/cuml/manifold/umap/umap.pyx` around lines 171 - 180, Add a
deterministic regression test that exercises the out-of-hull fallback in
inverse-transform: create a small, fixed embedding array and an explicit query
point guaranteed to lie outside the convex hull (e.g., far outside the min/max
of embedding coordinates or a point beyond convex combination bounds), call the
UMAP inverse_transform method (or the test helper used in
test_inverse_transform) with a fixed RNG/seed, and assert that the call does not
raise and returns finite numeric values (use np.isfinite on the result). Ensure
the test triggers the out_of_hull_mask path by verifying that at least one input
point is recognized as out-of-hull and that start_vertices is assigned via
argmin on distances so the fallback code path (out_of_hull_mask, ooh_points,
dists, start_vertices assignment) is covered deterministically.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@python/cuml/cuml/manifold/umap/umap.pyx`:
- Around line 175-180: The current fallback in inverse_transform that computes
dists = np.linalg.norm(embedding_np[np.newaxis,...] -
ooh_points[:,np.newaxis,...]) materializes a (num_ooh_points, n_embedding)
matrix and can OOM; replace this with a memory-bounded nearest-neighbor lookup
to fill start_vertices[out_of_hull_mask] by querying the embedding for the
nearest neighbor of each ooh_point (e.g., build a KD-tree on embedding_np via
scipy.spatial.cKDTree and call tree.query(ooh_points) which accepts a 2D array
of queries and returns one index per row, or implement a chunked argmin loop
that processes ooh_points in batches and computes np.argmin over the small
(batch_size, n_embedding) distances), keeping references to ooh_points,
embedding_np, start_vertices, and the inverse_transform/find_simplex fallback
logic unchanged.
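
The two memory-bounded alternatives named in this comment could look roughly like the sketch below. The function names and the batch_size default are assumptions for illustration; scipy.spatial.cKDTree and its query method are existing SciPy APIs. Neither function is taken from the merged change.

```python
import numpy as np
from scipy.spatial import cKDTree


def nearest_vertices_kdtree(embedding, ooh_points):
    """Nearest embedding index per out-of-hull point using a KD-tree."""
    tree = cKDTree(embedding)
    # query returns (distances, indices); the default k=1 gives one index per row.
    _, idx = tree.query(ooh_points)
    return idx


def nearest_vertices_chunked(embedding, ooh_points, batch_size=1024):
    """Nearest embedding index per out-of-hull point, computed in batches."""
    out = np.empty(len(ooh_points), dtype=np.int64)
    for start in range(0, len(ooh_points), batch_size):
        batch = ooh_points[start:start + batch_size]
        # Only a (len(batch), n_embedding) distance matrix exists at a time.
        dists = np.linalg.norm(
            embedding[np.newaxis, :, :] - batch[:, np.newaxis, :], axis=-1
        )
        out[start:start + batch_size] = np.argmin(dists, axis=1)
    return out
```

The KD-tree variant avoids the dense distance matrix entirely, while the chunked variant keeps the original norm + argmin logic and caps peak memory through batch_size. Note that the broadcasted difference in the chunked version still allocates a temporary of shape (batch, n_embedding, n_components), so batch_size should be chosen with that in mind.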


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 4c3400fb-3947-4a52-9c6c-9eeee1216be4

📥 Commits

Reviewing files that changed from the base of the PR and between d28fd9d and bdbe689.

📒 Files selected for processing (1)
  • python/cuml/cuml/manifold/umap/umap.pyx

Comment thread python/cuml/cuml/manifold/umap/umap.pyx
Contributor

@jinsolp jinsolp left a comment


LGTM. I think it's a good fallback 👍 (as long as we are confident that this is a precision-related issue)

@csadorf
Contributor

csadorf commented Mar 11, 2026

/merge

@rapids-bot rapids-bot Bot merged commit 8c06f91 into rapidsai:main Mar 11, 2026
93 checks passed

Labels

bug: Something isn't working
Cython / Python: Cython or Python issue
non-breaking: Non-breaking change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[CI] test_umap.py::test_inverse_transform fails with ValueError: point outside convex hull

4 participants