Documentation Updates for cuML Python Developer Guide by csadorf · Pull Request #6843 · rapidsai/cuml

csadorf · 2025-06-03T21:52:38Z

This PR addresses several documentation improvements outlined in the issue, focusing on the Python Developer Guide and Estimator Guide.

Core Documentation Updates

- Completed logging section with usage guidance, best practices and examples
- Completed "Device and Host Memory" section with RMM integration details
- Completed "Multi GPU" section with usage patterns and limitations
- Expanded API documentation guidelines with docstring formatting standards and examples
- Improved section order and overall flow
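For reference, here is a minimal NumPy-style docstring on a hypothetical `clip_values` function (not part of cuML). It shows the `Parameters`, `Returns`, `Raises`, and `Examples` sections with underlined headings. The exact set of sections the guide requires is not restated here, so treat this as an illustration of the format only.

```python
def clip_values(values, *, lower=0.0, upper=1.0):
    """Clip each value to the closed interval [lower, upper].

    Parameters
    ----------
    values : list of float
        Input values to clip.
    lower : float, default=0.0
        Lower bound of the interval.
    upper : float, default=1.0
        Upper bound of the interval. Must be greater than or equal to `lower`.

    Returns
    -------
    list of float
        The clipped values, in the same order as `values`.

    Raises
    ------
    ValueError
        If `lower` is greater than `upper`.

    Examples
    --------
    >>> clip_values([-1.0, 0.5, 2.0])
    [0.0, 0.5, 1.0]
    """
    if lower > upper:
        raise ValueError("lower must be <= upper")
    # Clamp each element from below, then from above.
    return [min(max(v, lower), upper) for v in values]
```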

Estimator Guide Revisions

Updated guide to clarify API matching policy:
- Match scikit-learn/umap-learn APIs where possible and reasonable
- API deviations must be well-justified and documented
- Unused parameters or arguments should generally not be matched
- Exact API matching is not required; consumers should use cuml.accel if that is needed
Content Organization:
- Deleted all sections referencing cuml-cpu functionality
- Added explicit requirement for keyword-only arguments in constructors
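A minimal sketch of the keyword-only requirement, using a hypothetical `ToyEstimator` (not a cuML class). The bare `*` in the signature makes every constructor parameter keyword-only, so `ToyEstimator(3)` raises `TypeError` while `ToyEstimator(n_clusters=3)` works. Storing the parameters unmodified in `__init__` follows the scikit-learn convention.

```python
class ToyEstimator:
    """Hypothetical estimator illustrating a keyword-only constructor."""

    def __init__(self, *, n_clusters=8, max_iter=300, tol=1e-4):
        # The bare `*` above forbids positional arguments. Parameters are
        # stored as given; validation would happen later, e.g. in fit().
        self.n_clusters = n_clusters
        self.max_iter = max_iter
        self.tol = tol


est = ToyEstimator(n_clusters=3)  # OK
# ToyEstimator(3)                 # TypeError: takes 1 positional argument
```

One motivation for this design: because callers must name every argument, parameters can be reordered or new ones inserted later without silently changing the meaning of existing calls.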

Memory Management Terminology

- Revised documentation to use "accessibility" terminology instead of physical location references
- Updated memory management sections to acknowledge that memory can be host or device accessible
- Removed references to cuml-cpu related components
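To illustrate the "accessibility" framing, here is a minimal sketch of a hypothetical helper (not part of cuML) that classifies an object only by the array-interchange protocols it exposes: `__cuda_array_interface__` for device-accessible memory and `__array_interface__` for host-accessible memory. An object may expose both, for example a wrapper around memory that the CPU and GPU can both access, which is exactly why a single "physical location" label is too narrow. The array classes below are hypothetical stand-ins.

```python
def memory_accessibility(obj):
    """Return the memory spaces from which `obj` declares its data accessible.

    Hypothetical helper: it only inspects which array-interchange protocols
    the object exposes and says nothing about where the memory physically
    resides.
    """
    spaces = set()
    # The CUDA Array Interface describes memory that CUDA kernels can access.
    if hasattr(obj, "__cuda_array_interface__"):
        spaces.add("device")
    # The NumPy array interface describes memory that the CPU can access.
    if hasattr(obj, "__array_interface__"):
        spaces.add("host")
    return spaces


# Hypothetical array types used only for illustration.
class HostOnlyArray:
    __array_interface__ = {}


class DeviceOnlyArray:
    __cuda_array_interface__ = {}


class DualAccessArray:
    # E.g. a wrapper around memory that both the CPU and the GPU can access.
    __array_interface__ = {}
    __cuda_array_interface__ = {}
```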

Closes #6850.

- Added guidelines on using the rapids-logger library for logging, including usage examples and best practices.
- Updated the documentation style guide for consistency with NumPy docstring standards.
- Expanded the multi-GPU support section with Dask integration details and best practices for effective usage.
- General formatting improvements.

copy-pr-bot · 2025-06-03T21:52:41Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.


csadorf · 2025-06-04T21:27:51Z

+# These operations will run concurrently
+kmeans1.fit(X1)
+kmeans2.fit(X2)


I'm honestly not sure how true that is, considering that the two Python calls do not actually run asynchronously. I've mostly corrected the previously defunct example and made sure that this code runs.

This isn't true, kmeans1.fit will fully run before kmeans2.fit starts. It would be true if they were running in separate threads/processes though, but in that case there's no real reason to create your own handles anyway.
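To make the distinction concrete, here is a minimal CPU-only sketch in which a hypothetical `fake_fit` function sleeps in place of a real `fit()` call. Calling it twice in a row takes the sum of both durations, while submitting both calls to a thread pool lets them overlap. It does not show whether real cuML estimators overlap on the GPU when called from separate threads; that depends on details (GIL release, stream usage) this sketch does not model.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_fit(duration):
    """Hypothetical stand-in for a blocking estimator.fit() call."""
    time.sleep(duration)  # blocks the calling thread but releases the GIL
    return duration


def run_sequential(durations):
    start = time.perf_counter()
    for d in durations:
        fake_fit(d)  # each call finishes before the next one starts
    return time.perf_counter() - start


def run_threaded(durations):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(durations)) as pool:
        list(pool.map(fake_fit, durations))  # calls overlap in time
    return time.perf_counter() - start


seq = run_sequential([0.2, 0.2])  # roughly 0.4 s
thr = run_threaded([0.2, 0.2])    # roughly 0.2 s
```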

I'm not sure about this specific case but when you talk about benchmarking you often get advice like https://docs.cupy.dev/en/stable/user_guide/performance.html#benchmarking or the very end of section 3 https://docs.pytorch.org/tutorials/recipes/recipes/benchmark.html#benchmarking-with-torch-utils-benchmark-timer

The CuPy guide suggests to me that a Python function (my_func in that example) can return before all of its work is done: all that happens before it returns is that the kernel is launched. What exactly happens when the function returns the result of the kernel (but you don't look at it) probably depends on the library (I'm not sure whether PyTorch and CuPy do the same thing).

But my takeaway from all this is that you have to be very precise about the example and what you are measuring, to avoid creating more confusion. So maybe we can link to a guide that is very precise about when what happens? For example, in the k-means example here, what happens if I access kmeans1.cluster_centers_ without calling s1.sync()?
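A minimal sketch of the measurement pitfall, using a hypothetical `FakeAsyncKernel` whose `launch()` returns immediately while the work continues on a background thread, the way a GPU kernel launch returns before the kernel finishes. Timing only the launch underestimates the cost; synchronizing before reading the clock does not. With CuPy, the synchronization step would be something like `cupy.cuda.get_current_stream().synchronize()`, and `cupyx.profiler.benchmark` takes care of this; neither is used here so that the sketch runs without a GPU.

```python
import threading
import time


class FakeAsyncKernel:
    """Hypothetical stand-in for an asynchronous GPU kernel launch.

    `launch` returns immediately while the work continues in the background;
    `sync` blocks until it has finished, like a stream or device synchronize.
    """

    def __init__(self, work_seconds):
        self.work_seconds = work_seconds
        self._thread = None

    def launch(self):
        self._thread = threading.Thread(target=time.sleep, args=(self.work_seconds,))
        self._thread.start()

    def sync(self):
        if self._thread is not None:
            self._thread.join()


def time_call(launch, sync=None):
    """Wall-clock time of `launch()`, optionally followed by `sync()`."""
    start = time.perf_counter()
    launch()
    if sync is not None:
        sync()  # wait for the queued work before reading the clock
    return time.perf_counter() - start


kernel = FakeAsyncKernel(0.2)
t_launch_only = time_call(kernel.launch)  # misleadingly small
kernel.sync()                             # drain before the next measurement
t_synced = time_call(kernel.launch, kernel.sync)  # about 0.2 s
```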

Indeed, the kernel functions could in fact run asynchronously until we block on the sync calls, but I just ran some tests and did some profiling, and they certainly do not for this example. I additionally tested with all sync calls removed (both within the user code and within our code) and it made no difference.

This is certainly worth exploring, but not within the scope of this PR.

Given that the statement here isn't true (you can't concurrently run estimators like this), should we rip this section out then? Sorry, I should have been clearer in my review - I approved since it was strictly better than what we had before, but this section is inaccurate and should be amended IMO.

Yes, we probably should and happy to do so in an immediate follow-up. This PR constitutes a general improvement so no need to bike-shed on this.

csadorf · 2025-06-04T21:30:50Z

I think there is a lot of redundancy and verbosity in this guide, specifically w.r.t. array and output type handling, that should be improved, but I would prefer to punt on that and roll any necessary revisions into #6463 and #5022.

jcrist

One comment, otherwise LGTM.

jcrist · 2025-06-04T22:01:12Z

+# These operations will run concurrently
+kmeans1.fit(X1)
+kmeans2.fit(X2)


This isn't true, kmeans1.fit will fully run before kmeans2.fit starts. It would be true if they were running in separate threads/processes though, but in that case there's no real reason to create your own handles anyway.

csadorf · 2025-06-05T15:37:34Z

/merge

github-actions[bot] assigned csadorf Jun 3, 2025

csadorf added the doc (Documentation) and non-breaking (Non-breaking change) labels Jun 4, 2025

csadorf changed the title ~~Improve python/DEVELOPER_GUIDE.md~~ Improve Python Developer Documentation Jun 4, 2025

csadorf added 7 commits June 4, 2025 14:03

Revise section on memory management

11c96d7

Revise section on Multi-GPU algorithms

63197fe

Revise section on async operations and stream ordering

509b74a

Remove Benchmarking Best Practices

bcec16c

Add section on API Matching policy to estimator guide

21091ff

Revise documentation on memory management in ESTIMATOR_GUIDE.md

8a35d47

- Use "accessibility" terminology to acknowledge that memory can be host or device accessible rather than relying on the actual physical location. - Remove reference to cuml-cpu related components.

Clarify policy on keyword-only arguments.

a1428ef

csadorf force-pushed the docs/improve-dev-docs branch from 2ec87bb to a1428ef June 4, 2025 20:19

csadorf changed the title ~~Improve Python Developer Documentation~~ Documentation Updates for cuML Python Developer Guide Jun 4, 2025

csadorf added 4 commits June 4, 2025 15:54

Revise the intro sections of the Python dev guide

7cfc195

Remove outdated section on error handling

b13ea4f

Add ToC and reorder sections for more logical flow

a5bc38e

Merge branch 'branch-25.08' into docs/improve-dev-docs

ffca58b

csadorf commented Jun 4, 2025


csadorf marked this pull request as ready for review June 4, 2025 21:34

csadorf requested a review from a team as a code owner June 4, 2025 21:34

csadorf requested review from cjnolet and jcrist June 4, 2025 21:34

csadorf added 2 commits June 4, 2025 16:36

minor fixup

2543506

Fix section on log level configuration

c9604de

jcrist approved these changes Jun 4, 2025


rapids-bot[bot] merged commit b603d65 into rapidsai:branch-25.08 Jun 5, 2025
70 checks passed

csadorf deleted the docs/improve-dev-docs branch June 5, 2025 15:37

csadorf mentioned this pull request Jun 5, 2025

Improve documentation on asynchronous execution #6859

Closed

rapids-bot[bot] merged 14 commits into rapidsai:branch-25.08 from csadorf:docs/improve-dev-docs

4 participants