Fix UMAP inverse_transform OOH error#7863
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughSummary by CodeRabbit
Walkthrough_compute_inverse_neighborhoods now computes a single Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
python/cuml/cuml/manifold/umap/umap.pyx (1)
171-180: Add a deterministic regression test for the out-of-hull branch.This fix only matters when the fallback path is taken, but the current
test_inverse_transformcoverage appears to rely on environment-specific hull classification of training points. Please add a test that explicitly inverse-transforms a point outside the embedding hull and asserts that it returns finite output without raising, so this doesn’t regress only on specific SciPy/Qhull/platform combinations.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@python/cuml/cuml/manifold/umap/umap.pyx` around lines 171 - 180, Add a deterministic regression test that exercises the out-of-hull fallback in inverse-transform: create a small, fixed embedding array and an explicit query point guaranteed to lie outside the convex hull (e.g., far outside the min/max of embedding coordinates or a point beyond convex combination bounds), call the UMAP inverse_transform method (or the test helper used in test_inverse_transform) with a fixed RNG/seed, and assert that the call does not raise and returns finite numeric values (use np.isfinite on the result). Ensure the test triggers the out_of_hull_mask path by verifying that at least one input point is recognized as out-of-hull and that start_vertices is assigned via argmin on distances so the fallback code path (out_of_hull_mask, ooh_points, dists, start_vertices assignment) is covered deterministically.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@python/cuml/cuml/manifold/umap/umap.pyx`:
- Around line 175-180: The current fallback in inverse_transform that computes
dists = np.linalg.norm(embedding_np[np.newaxis,...] -
ooh_points[:,np.newaxis,...]) materializes a (num_ooh_points, n_embedding)
matrix and can OOM; replace this with a memory-bounded nearest-neighbor lookup
to fill start_vertices[out_of_hull_mask] by querying the embedding for the
nearest neighbor of each ooh_point (e.g., build a KD-tree on embedding_np via
scipy.spatial.cKDTree and call tree.query(ooh_points) which accepts a 2D array
of queries and returns one index per row, or implement a chunked argmin loop
that processes ooh_points in batches and computes np.argmin over the small
(batch_size, n_embedding) distances), keeping references to ooh_points,
embedding_np, start_vertices, and the inverse_transform/find_simplex fallback
logic unchanged.
---
Nitpick comments:
In `@python/cuml/cuml/manifold/umap/umap.pyx`:
- Around line 171-180: Add a deterministic regression test that exercises the
out-of-hull fallback in inverse-transform: create a small, fixed embedding array
and an explicit query point guaranteed to lie outside the convex hull (e.g., far
outside the min/max of embedding coordinates or a point beyond convex
combination bounds), call the UMAP inverse_transform method (or the test helper
used in test_inverse_transform) with a fixed RNG/seed, and assert that the call
does not raise and returns finite numeric values (use np.isfinite on the
result). Ensure the test triggers the out_of_hull_mask path by verifying that at
least one input point is recognized as out-of-hull and that start_vertices is
assigned via argmin on distances so the fallback code path (out_of_hull_mask,
ooh_points, dists, start_vertices assignment) is covered deterministically.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 4c3400fb-3947-4a52-9c6c-9eeee1216be4
📒 Files selected for processing (1)
python/cuml/cuml/manifold/umap/umap.pyx
|
/merge |
Closes #7861.
scipy.spatial.Delaunay.find_simplexhas atolparameter that could in theory help avert this issue. However, setting a differenttolvalue would not necessarily guarantee that every point is assigned a simplex and more importantly this could lower the quality of simplex assignment.Instead of raising an out of hull exception, the "out of hull" points are instead KNN searched (L2 norm + argmin). At the moment, everything is handled with Scipy and Numpy. Once the CuPy version that cuML is shipped with will offer a GPU accelerated Delaunay, we will switch this to a CuPy processing.