Revise the cuML docs for 25.10#7228
Conversation
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
|
/ok to test 26e948f |
26e948f to
2ca5ab2
Compare
|
/ok to test 2ca5ab2 |
|
/ok to test 1f11be6 |
1f11be6 to
b3dcbf7
Compare
|
/ok to test b3dcbf7 |
b3dcbf7 to
db63511
Compare
|
/ok to test db63511 |
|
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
6b054f8 to
2568644
Compare
|
/ok to test 94315c3 |
betatim
left a comment
There was a problem hiding this comment.
First pass. Generally looks good.
Did you mean to change the profiler stuff or left over code?
Yes, I'm motivating this in the PR description. |
Improve clarity and consistency, replacing variable assignments with direct values and standardizing prediction variable names.
trivialfis
left a comment
There was a problem hiding this comment.
The new doc looks very nice!
A few comments:
- You can use syntax like ":py:meth
xgboost.ForestInference.apply" to add a link to the API reference. - Not sure why
ForestInference.loadneeds ais_classifierfor XGB models, it should be possible to infer this from the use of objective function.
| "source": [ | ||
| "kde = KernelDensity(kernel='gaussian', bandwidth=0.5)\n", | ||
| "kde.fit(X)\n" | ||
| ] |
There was a problem hiding this comment.
I don't know if the example proves anything to the user here since the heavy lifting is done in the background. Maybe showing that GPU inputs (like cuPY arrays) are correctly ingested might be better.
There was a problem hiding this comment.
True, I've adjusted the example in ae8804a5a014834096a37bf2edc4e361a42ee185 .
|
Thanks for all the fixes Simon! |
jcrist
left a comment
There was a problem hiding this comment.
A few comments, but mostly LGTM!
| @@ -0,0 +1,384 @@ | |||
| { | |||
There was a problem hiding this comment.
Can you comment on why you think we need an example notebook for this? From my read of this this seems fairly duplicative of the existing docs on profiling and logging (and much of this notebook feels directly lifted from there).
I'd rather avoid duplicating content in multiple places. Having multiple pages on the same thing means updates need to happen in multiple places, and also doesn't leave a canonically clear page to refer users to. IMO we should drop this notebook entirely.
There was a problem hiding this comment.
We can certainly work on de-duplication, but there is value in having the same content presented in different ways. And I think it's perfectly fine if that means that there is some duplication. The notebooks are not only rendered as part of our docs, but can also be downloaded and directly executed. Having fully-functional examples (i.e. tutorial-style content) is different from an overview guide or reference documentation, because they serve different purposes.
To be clear, I am not claiming that the current duplication or distribution of content is optimal, but I do not consider full deduplication a critical factor in documentation.
|
|
||
| console = Console() | ||
|
|
||
| base_style = Style.parse(os.getenv("CUML_ACCEL_PROFILER_STYLE", "")) |
There was a problem hiding this comment.
In my local testing the profiler renders fine in a dark and light theme as is (at least using the default themes that ship with jupyter). That said, if our docs dark theme doesn't work for it I'd rather adjust our default theme to work better across all environments than special case this.
If we decide to keep the profiling notebook, mind if I push up a fix that adjusts the style handling here?
There was a problem hiding this comment.
Yes, I'd prefer to keep the notebook. Feel free to create a PR into mine with an adjustment to the default theme, but I'd prefer if you did not directly push to this branch.
Edit: I changed my mind on this, feel free to push directly to this branch.
There was a problem hiding this comment.
I've pushed an update. This required two changes:
- We now use a simpler color scheme for highlighting code in the line profiler. This palette works much better across light and dark themes, and generally mirrors the one used by jupyter.
- We squash a css tweak added by
pydata-sphinx-themethat sets a background to html output cells in dark mode notebooks. This doesn't seem beneficial in any of our example notebooks, and was the main source of things not rendering nicely. Both the sklearn rich reprs and our profiler output cells now look much nicer in the rendered notebooks.
I've inspected the outputs of these changes in both light and dark terminals, light and dark notebooks, and all rendered notebooks in our docs in both light and dark mode. I think this change is strictly beneficial.
There was a problem hiding this comment.
Looks perfect in the preview. Awesome!
csadorf
left a comment
There was a problem hiding this comment.
@viclafargue Thanks for the feedback. I've addressed your comments in 1ed88ad .
| "source": [ | ||
| "kde = KernelDensity(kernel='gaussian', bandwidth=0.5)\n", | ||
| "kde.fit(X)\n" | ||
| ] |
There was a problem hiding this comment.
True, I've adjusted the example in ae8804a5a014834096a37bf2edc4e361a42ee185 .
Thanks! :)
Are you referring to a specific instance where we are not doing that or is that just a general suggestion?
Good question, but as of right now it is needed. Maybe something that we can improve in a future API revision? CC @hcho3 |
The new theme works well in dark and light environments in both consoles and notebooks.
|
/merge |
Overview
Major revision of cuML introduction and user guide documentation as well as the cuml.accel example notebooks
Key Changes
Documentation Overhaul
Complete revision of main pages:
index.rst: Complete revision with improved structure, mention of key performance metrics, quick start guide, and feature highlightscuml_intro.rst: Major restructuring around three core principles with detailed explanations and code examplesuser_guide.rst: Add reference tocuml.accelzero-code-change acceleration to avoid confusion on overview pageestimator_intro.ipynb: Major revision of the estimator introduction user guidepickling_cuml_models.ipynb: Major revision of the serialization user guide including documenation ofas_sklearn/from_sklearnFIL.rst: Major revision of the FIL documentation pageExpanded cuml.accel example notebooks:
getting_started.ipynb(481 lines): Added comprehensive guide covering classification, clustering, and dimensionality reduction with real-world datasets based on the Kaggle notebookprofiling.ipynb(384 lines): Detailed profiling and debugging guide with function and line profiler examplesplot_kmeans_digits.ipynb: Updated title for consistencyCode Changes
CUML_ACCEL_PROFILER_STYLEenvironment variable to control profiler appearance in different environments (essential for dark mode documentation rendering)conf.pyto override default cuml.accel profiler style