Improved UMAP testing and debugging#7073
rapids-bot[bot] merged 28 commits into rapidsai:branch-25.10
@viclafargue Please make sure to target branch-25.10.
I started taking a look at this and tried running the command from the top comment. You need to install a few extra libraries (plotly, cuvs, ripser) to make it work. It may be worth adding a short README to the directory that explains what is needed and includes the two example commands from the top comment, so that people in the future have an easy time figuring out how to use this.
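One quick way to see which of the extra libraries mentioned above are missing is to probe for them before running the script. This is a minimal sketch and not part of the PR: the helper name `missing_extras` is hypothetical, and it assumes the import names match the package names (`plotly`, `cuvs`, `ripser`).

```python
import importlib.util

# Extra packages named in the comment above. Assumption: each package's
# import name is the same as the name used here.
REQUIRED_EXTRAS = ("plotly", "cuvs", "ripser")


def missing_extras(names=REQUIRED_EXTRAS):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_extras()
    if missing:
        print("Missing optional packages: " + ", ".join(missing))
    else:
        print("All optional packages are available.")
```

A README could list the same packages so that the check and the documentation stay in sync.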
I won't get around to thinking about this in time for merging it today. However, I think this is a debugging tool for developers, so it is low-risk to merge in my opinion.
csadorf left a comment
I think it's a good addition to improve our test quality, but I have a few comments that I'd like to see addressed.
I am not sure that python/cuml/cuml/testing/manifold/ specifically, and this repository in general, is really the right place to maintain this framework unless we make it accessible as an extra entrypoint or add it to the cuML Python API. Its dependencies are also not captured in dependencies.yaml. I think maintaining this in a separate repository is likely more advisable for now.
#### Running Tests

```bash
DATASET_DIR=datasets pytest python/cuml/cuml/testing/manifold/test_umap.py -v
```
I ran this and almost all tests failed, because the datasets were not available. Is this expected? Am I supposed to download those separately? If so, how or where?
Yes, the datasets should be downloaded separately. When the datasets are missing, the tests should alert the user that they have to download them. I just added this to the README as well. The commands are now given both in the test and in the README.
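The behavior described above (alerting the user when the datasets are missing) could look roughly like the following. This is a sketch under stated assumptions, not the PR's actual code: the helper name `resolve_dataset_dir` and the message wording are hypothetical, and only the `DATASET_DIR` environment variable comes from the command shown in the review.

```python
import os
from pathlib import Path


def resolve_dataset_dir(env_var="DATASET_DIR"):
    """Return the dataset directory named by `env_var`, or raise with a clear hint.

    Hypothetical helper: the real tests may locate datasets differently.
    """
    value = os.environ.get(env_var)
    if not value:
        raise FileNotFoundError(
            f"{env_var} is not set. Download the datasets first, "
            f"then run the tests with {env_var}=<path>."
        )
    path = Path(value)
    if not path.is_dir():
        raise FileNotFoundError(
            f"{env_var} points to '{path}', which is not a directory. "
            "Download the datasets first (see the README)."
        )
    return path
```

Inside a pytest fixture, the `FileNotFoundError` could be caught and passed to `pytest.skip(str(exc))`, so that missing datasets show up as skipped tests with an explanation rather than as failures.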
/merge
Answers #7072.
Working on a single PR for now as some metrics are used both for debugging and testing.
This PR adds the following:
- A debugging stack for UMAP
- Improved testing:
  - ✅ KNN testing
  - ✅ Fuzzy simplicial set testing
  - ✅ Spectral initialization testing
  - ✅ Embedding optimization testing
  - 🚧 cuml-accel specific testing (will give an update on #6974)

Usage:
$ python python/cuml/cuml/testing/manifold/run_umap_debug.py --dataset "Swiss Roll" --web-report
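To illustrate what the KNN testing item listed above might measure, one common metric is recall against an exact brute-force result. The following is only a sketch with NumPy on the CPU. The function names are hypothetical, and the PR's actual metrics and thresholds may differ.

```python
import numpy as np


def brute_force_knn(X, k):
    """Exact k nearest neighbors (excluding each point itself), Euclidean distance."""
    X = np.asarray(X, dtype=float)
    sq = np.sum(X**2, axis=1)
    # Squared pairwise distances via ||a||^2 + ||b||^2 - 2 a.b
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)  # never return a point as its own neighbor
    return np.argsort(d2, axis=1)[:, :k]


def knn_recall(candidate, reference):
    """Mean fraction of reference neighbors that also appear in candidate, per row."""
    candidate = np.asarray(candidate)
    reference = np.asarray(reference)
    hits = [len(set(c) & set(r)) for c, r in zip(candidate, reference)]
    return float(np.mean(hits)) / reference.shape[1]
```

A test could then assert that the recall of an approximate KNN graph against `brute_force_knn` stays above a chosen threshold. The threshold itself would be a project decision and is not given in this PR description.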