Revise the cuML docs for 25.10 #7228

Merged
rapids-bot[bot] merged 56 commits into rapidsai:branch-25.10 from
csadorf:docs/issue-7096
Sep 30, 2025

@csadorf
Contributor

@csadorf csadorf commented Sep 16, 2025

Overview

Major revision of cuML introduction and user guide documentation as well as the cuml.accel example notebooks

Key Changes

Documentation Overhaul

  • Complete revision of main pages:

    • index.rst: Complete revision with improved structure, mention of key performance metrics, quick start guide, and feature highlights
    • cuml_intro.rst: Major restructuring around three core principles with detailed explanations and code examples
    • user_guide.rst: Add reference to cuml.accel zero-code-change acceleration to avoid confusion on the overview page
    • estimator_intro.ipynb: Major revision of the estimator introduction user guide
    • pickling_cuml_models.ipynb: Major revision of the serialization user guide, including documentation of as_sklearn/from_sklearn
    • FIL.rst: Major revision of the FIL documentation page
  • Expanded cuml.accel example notebooks:

    • getting_started.ipynb (481 lines): Added comprehensive guide covering classification, clustering, and dimensionality reduction with real-world datasets based on the Kaggle notebook
    • profiling.ipynb (384 lines): Detailed profiling and debugging guide with function and line profiler examples
    • plot_kmeans_digits.ipynb: Updated title for consistency

Code Changes

  • Profiler styling support: Added CUML_ACCEL_PROFILER_STYLE environment variable to control profiler appearance in different environments (essential for dark mode documentation rendering)
  • Configuration updates: Updated conf.py to override default cuml.accel profiler style
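
A minimal sketch of the environment-variable lookup described above, using only the standard library. The variable name comes from this PR; the helper `profiler_style_spec` and the example value `"white on black"` are hypothetical. In the diff under review the string is handed to `Style.parse` rather than returned directly, so treat this as an illustration of the lookup and its empty-string default only.

```python
import os

# Variable name as given in the PR description.
ENV_VAR = "CUML_ACCEL_PROFILER_STYLE"


def profiler_style_spec(environ=None):
    """Return the raw style string, or an empty string when unset.

    Hypothetical helper for illustration; not part of cuML.
    """
    environ = os.environ if environ is None else environ
    return environ.get(ENV_VAR, "")


# Assumed example value; any string the style parser accepts would do.
dark_docs_env = {ENV_VAR: "white on black"}
print(repr(profiler_style_spec(dark_docs_env)))
print(repr(profiler_style_spec({})))
```

Passing the mapping explicitly keeps the sketch independent of the real process environment; a docs build would instead set the variable before the profiler module is imported.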

@copy-pr-bot

copy-pr-bot Bot commented Sep 16, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@csadorf csadorf added the doc (Documentation) and non-breaking (Non-breaking change) labels Sep 16, 2025
@csadorf
Contributor Author

csadorf commented Sep 16, 2025

/ok to test 26e948f

@csadorf csadorf changed the title from "Expand the cuML docs landing page." to "[DO NOT MERGE] Improve the cuML docs" Sep 16, 2025
@csadorf
Contributor Author

csadorf commented Sep 17, 2025

/ok to test 2ca5ab2

@csadorf
Contributor Author

csadorf commented Sep 17, 2025

/ok to test 1f11be6

@csadorf
Contributor Author

csadorf commented Sep 17, 2025

/ok to test b3dcbf7

@csadorf
Contributor Author

csadorf commented Sep 17, 2025

/ok to test db63511

@review-notebook-app

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

@csadorf
Contributor Author

csadorf commented Sep 19, 2025

/ok to test 94315c3

Comment thread docs/source/cuml_intro.rst
Comment thread docs/source/cuml_intro.rst
@csadorf csadorf changed the title from "[DO NOT MERGE] Improve the cuML docs" to "Improve the cuML docs" Sep 24, 2025
Member

@betatim betatim left a comment

First pass. Generally looks good.

Did you mean to change the profiler stuff, or is that leftover code?

Comment thread docs/source/index.rst Outdated
Comment thread docs/source/cuml_intro.rst Outdated
Comment thread docs/source/cuml_intro.rst Outdated
Comment thread docs/source/cuml_intro.rst Outdated
Comment thread docs/source/cuml_intro.rst
Comment thread docs/source/estimator_intro.ipynb
Comment thread docs/source/pickling_cuml_models.ipynb
Comment thread docs/source/pickling_cuml_models.ipynb
Comment thread docs/source/pickling_cuml_models.ipynb
Comment thread docs/source/pickling_cuml_models.ipynb
@csadorf
Contributor Author

csadorf commented Sep 26, 2025

Did you mean to change the profiler stuff, or is that leftover code?

Yes, I'm motivating this in the PR description.

Member

@trivialfis trivialfis left a comment

The new doc looks very nice!

A few comments:

  • You can use syntax like ":py:meth:`xgboost.ForestInference.apply`" to add a link to the API reference.
  • Not sure why ForestInference.load needs an is_classifier for XGB models; it should be possible to infer this from the objective function in use.
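
For reference, a Sphinx Python-domain cross-reference is a role name followed by a backtick-quoted target. The sketch below places one in a docstring. The wrapper `load_forest` is hypothetical, and the target path `cuml.fil.ForestInference.load` is an assumption based on the method named in the comment above; it has not been checked against the built API reference.

```python
def load_forest(path):
    """Load a tree model for GPU inference (hypothetical wrapper).

    Sphinx renders the role below as a link to the API reference,
    provided the target resolves in the built docs:
    :py:meth:`cuml.fil.ForestInference.load`.
    """
    raise NotImplementedError("illustration only")


# The role is plain reStructuredText inside the docstring.
print(load_forest.__doc__)
```

The same role works in `.rst` pages and notebook Markdown cells processed by Sphinx, as long as the Python domain can resolve the dotted path.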

Contributor

@viclafargue viclafargue left a comment

Great work! Just have a few suggestions for the getting started cuml-accel notebook.

Comment thread docs/source/cuml-accel/examples/getting_started.ipynb Outdated
Comment thread docs/source/cuml-accel/examples/getting_started.ipynb Outdated
Comment thread docs/source/cuml-accel/examples/getting_started.ipynb
Comment on lines +434 to +437
"source": [
"kde = KernelDensity(kernel='gaussian', bandwidth=0.5)\n",
"kde.fit(X)\n"
]
Contributor

I don't know if the example proves anything to the user here, since the heavy lifting is done in the background. Maybe showing that GPU inputs (like CuPy arrays) are correctly ingested would be better.

Contributor Author

True, I've adjusted the example in ae8804a5a014834096a37bf2edc4e361a42ee185.

@betatim
Member

betatim commented Sep 29, 2025

Thanks for all the fixes Simon!

Member

@jcrist jcrist left a comment

A few comments, but mostly LGTM!

@@ -0,0 +1,384 @@
{
Member

Can you comment on why you think we need an example notebook for this? From my read, this seems fairly duplicative of the existing docs on profiling and logging (and much of this notebook feels directly lifted from there).

I'd rather avoid duplicating content in multiple places. Having multiple pages on the same thing means updates need to happen in multiple places, and also doesn't leave a canonically clear page to refer users to. IMO we should drop this notebook entirely.

Contributor Author

We can certainly work on de-duplication, but there is value in having the same content presented in different ways. And I think it's perfectly fine if that means that there is some duplication. The notebooks are not only rendered as part of our docs, but can also be downloaded and directly executed. Having fully-functional examples (i.e. tutorial-style content) is different from an overview guide or reference documentation, because they serve different purposes.

To be clear, I am not claiming that the current duplication or distribution of content is optimal, but I do not consider full deduplication a critical factor in documentation.

Comment thread python/cuml/cuml/accel/profilers.py Outdated

console = Console()

base_style = Style.parse(os.getenv("CUML_ACCEL_PROFILER_STYLE", ""))
Member

In my local testing the profiler renders fine in a dark and light theme as is (at least using the default themes that ship with Jupyter). That said, if our docs dark theme doesn't work for it, I'd rather adjust our default theme to work better across all environments than special-case this.

If we decide to keep the profiling notebook, mind if I push up a fix that adjusts the style handling here?

Contributor Author

@csadorf csadorf Sep 29, 2025

Yes, I'd prefer to keep the notebook. Feel free to create a PR into mine with an adjustment to the default theme, but I'd prefer if you did not directly push to this branch.

Edit: I changed my mind on this, feel free to push directly to this branch.

Member

I've pushed an update. This required two changes:

  • We now use a simpler color scheme for highlighting code in the line profiler. This palette works much better across light and dark themes, and generally mirrors the one used by Jupyter.
  • We squash a CSS tweak added by pydata-sphinx-theme that sets a background on HTML output cells in dark mode notebooks. This doesn't seem beneficial in any of our example notebooks, and was the main source of things not rendering nicely. Both the sklearn rich reprs and our profiler output cells now look much nicer in the rendered notebooks.

I've inspected the outputs of these changes in both light and dark terminals, light and dark notebooks, and all rendered notebooks in our docs in both light and dark mode. I think this change is strictly beneficial.

Contributor Author

Looks perfect in the preview. Awesome!

Contributor Author

@csadorf csadorf left a comment

@viclafargue Thanks for the feedback. I've addressed your comments in 1ed88ad.

Comment thread docs/source/cuml-accel/examples/getting_started.ipynb Outdated
Comment thread docs/source/cuml-accel/examples/getting_started.ipynb Outdated
@csadorf
Contributor Author

csadorf commented Sep 30, 2025

The new doc looks very nice!

Thanks! :)

A few comments:

  • You can use syntax like ":py:meth:`xgboost.ForestInference.apply`" to add a link to the API reference.

Are you referring to a specific instance where we are not doing that or is that just a general suggestion?

  • Not sure why ForestInference.load needs an is_classifier for XGB models; it should be possible to infer this from the objective function in use.

Good question, but as of right now it is needed. Maybe something that we can improve in a future API revision? CC @hcho3

The new theme works well in dark and light environments in both consoles
and notebooks.
This overrides an override set by pydata-sphinx-theme to avoid adding a
background to html outputs in dark theme.
Member

@jcrist jcrist left a comment

:shipit:

@csadorf
Contributor Author

csadorf commented Sep 30, 2025

/merge

@rapids-bot rapids-bot Bot merged commit e5adc43 into rapidsai:branch-25.10 Sep 30, 2025
101 checks passed
@csadorf csadorf deleted the docs/issue-7096 branch October 1, 2025 00:48
Labels

  • Cython / Python: Cython or Python issue
  • doc: Documentation
  • non-breaking: Non-breaking change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

  • Add docs page on sklearn interop
  • Provided dedicated notebooks for cuml.accel
  • Improve the cuml.accel documentation

7 participants