## Overview
Major revision of cuML introduction and user guide documentation as well as the cuml.accel example notebooks
## Key Changes
### Documentation Overhaul
- **Complete revision of main pages**:
  - `index.rst`: Complete revision with improved structure, mention of key performance metrics, quick start guide, and feature highlights
  - `cuml_intro.rst`: Major restructuring around three core principles with detailed explanations and code examples
  - `user_guide.rst`: Add reference to `cuml.accel` zero-code-change acceleration to avoid confusion on overview page
  - `estimator_intro.ipynb`: Major revision of the estimator introduction user guide
  - `pickling_cuml_models.ipynb`: Major revision of the serialization user guide, including documentation of `as_sklearn`/`from_sklearn`
  - `FIL.rst`: Major revision of the FIL documentation page
- **Expanded cuml.accel example notebooks**:
  - `getting_started.ipynb` (481 lines): Added comprehensive guide covering classification, clustering, and dimensionality reduction with real-world datasets based on the Kaggle notebook
  - `profiling.ipynb` (384 lines): Detailed profiling and debugging guide with function and line profiler examples
  - `plot_kmeans_digits.ipynb`: Updated title for consistency
### Code Changes
- **Profiler styling support**: Added `CUML_ACCEL_PROFILER_STYLE` environment variable to control profiler appearance in different environments (essential for dark mode documentation rendering)
- **Configuration updates**: Updated `conf.py` to override default cuml.accel profiler style
Authors:
- Simon Adorf (https://github.com/csadorf)
- Jim Crist-Harif (https://github.com/jcrist)
Approvers:
- Jim Crist-Harif (https://github.com/jcrist)
URL: #7228
The Forest Inference Library is a subset of cuML designed to accelerate inference for tree-based models regardless of what framework they are trained on. FIL can accelerate XGBoost models, Scikit-Learn/cuML ``RandomForest`` models, LightGBM models, and any other model that can be converted to Treelite. An example invocation is shown below:
+The Forest Inference Library (FIL) is a component of cuML, providing a
+high-performance inference engine designed to accelerate tree-based machine
+learning models on both GPU and CPU. FIL delivers significant speedups over
+traditional CPU-based inference while maintaining compatibility with models
+trained in popular frameworks.
+
+**Key Benefits:**
+
+- FIL typically offers a speedup of 80x or more over scikit-learn native execution
+- Support for XGBoost, Scikit-Learn, LightGBM, and Treelite-compatible models
+- Seamless GPU/CPU execution switching
+- Built-in auto-optimization for maximum performance
+- Advanced inference APIs for granular tree analysis
FIL includes built-in auto-optimization that automatically tunes performance hyperparameters for your specific model and batch size, eliminating the need for manual tuning in most cases:
FIL typically offers speedups of 80x or more relative to native inference with e.g. a Scikit-Learn ``RandomForest`` model on CPU.
+The optimization process tests different memory layouts and chunk sizes to find the optimal configuration for your specific use case.
+
+**Key Hyperparameters:**
+
+- ``layout``: Determines the order in which tree nodes are arranged in memory (depth_first, layered, breadth_first)
+- ``default_chunk_size``: Controls the granularity of parallelization during inference
+- ``align_bytes``: Cache line alignment for optimal memory access patterns
+
+**Manual Tuning:**
+
+For advanced users, you can experiment with the ``align_bytes`` parameter. Its default value is typically close enough to optimal that it is not automatically searched during auto-optimization, but to squeeze the most performance possible out of FIL, try either 0 or 128 on GPU and 0 or 64 on CPU.
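
As a rough illustration of what a search over layouts and chunk sizes involves, the sketch below picks the cheapest combination from a grid. It is not how FIL's `.optimize` is implemented: the helper names `time_call` and `pick_fastest_config`, the candidate chunk sizes, and the `fake_measure` cost function are all assumptions made for illustration.

```python
import itertools
import time

LAYOUTS = ("depth_first", "layered", "breadth_first")  # layout names from the docs above
CHUNK_SIZES = (1, 2, 4, 8, 16, 32)  # assumed candidate grid: powers of 2


def time_call(fn, repeats=3):
    """Return the best-of-N wall-clock time of fn() in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def pick_fastest_config(measure, layouts=LAYOUTS, chunk_sizes=CHUNK_SIZES):
    """Return the (layout, chunk_size) pair with the lowest measured cost."""
    candidates = itertools.product(layouts, chunk_sizes)
    return min(candidates, key=lambda config: measure(*config))


# Deterministic stand-in for a real timing function, used only to show the call shape.
# With a real model, measure could wrap time_call around a prediction call instead.
def fake_measure(layout, chunk_size):
    return abs(chunk_size - 8) + (0.0 if layout == "layered" else 1.0)


best = pick_fastest_config(fake_measure)
print(best)
```

In practice the built-in `.optimize` method should be preferred; a hand-rolled loop like this is mainly useful for parameters that are not searched automatically, such as `align_bytes`.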

 Optional CPU Execution
 ----------------------
 While FIL offers the most benefit for large models and batch sizes by taking advantage of the speed and parallelism of NVIDIA GPUs, it can also be used to speed up inference on CPUs. This can be convenient for testing in environments without access to GPUs. It can also be useful for deployments which experience dramatic shifts in traffic. When the number of incoming inference requests is low, CPU execution can be used. When traffic spikes, the deployment can seamlessly scale up onto GPUs in order to handle the additional load as cheaply as possible without significantly increasing latency.

-Optimizing Hyperparameters
---------------------------
-FIL has a number of performance hyperparameters which can be used to get the maximum performance for a specific model and batch size. These can be tuned manually, but the built-in ``.optimize`` method makes it easy to quickly set those hyperparameters to the optimal value for a specific use case:
+You can use FIL in CPU mode with a context manager:

 .. code-block:: python

-    fil_model.optimize(batch_size=1_000_000)
-    output = fil_model.predict(input_data)
+    from cuml.fil import ForestInference, set_fil_device_type

-This method will optimize the ``layout`` hyperparameter, which determines the order in which tree nodes are arranged in memory as well as ``default_chunk_size``, which determines the granularity of parallelization during inference.
Additionally, you may wish to experiment with the ``align_bytes`` parameter. Its default value is typically close enough to optimal that it is not automatically searched during auto-optimization, but to squeeze the most performance possible out of FIL, try either 0 or 128 on GPU and 0 or 64 on CPU.
+Advanced Prediction APIs
+-------------------------
+FIL includes advanced prediction methods that provide granular information about individual trees in the ensemble, enabling novel ensembling techniques and analysis:

-Deprecated ``load`` Parameters
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-As of RAPIDS 25.04, the following hyperparameters accepted by the ``.load`` method of previous versions of FIL have been deprecated.
+**Per-Tree Predictions**
+The ``.predict_per_tree`` method returns the output of every single tree individually:

-- ``threshold`` (will trigger a deprecation warning if used; pass to ``.predict`` instead)
-- ``algo`` (ignored, but a warning will be logged)
-- ``storage_type`` (ignored, but a warning will be logged)
-- ``blocks_per_sm`` (ignored, but a warning will be logged)
-- ``threads_per_tree`` (ignored, but a warning will be logged)
-- ``n_items`` (ignored, but a warning will be logged)
-- ``compute_shape_str`` (ignored, but a warning will be logged)
+.. code-block:: python

-New ``load`` Parameters
-^^^^^^^^^^^^^^^^^^^^^^^
-As of RAPIDS 25.04, the following new hyperparameters can be passed to the ``.load`` method
+    per_tree = fil_model.predict_per_tree(X)
+    mean = per_tree.mean(axis=1)
+    lower = np.percentile(per_tree, 10, axis=1)
+    upper = np.percentile(per_tree, 90, axis=1)

-- ``layout``: Replaces the functionality of ``algo`` and specifies the in-memory layout of nodes in FIL forests. One of ``'depth_first'`` (default), ``'layered'`` or ``'breadth_first'``.
-- ``align_bytes``: If specified, trees will be padded such that their in-memory size is a multiple of this value. This can sometimes improve performance by guaranteeing that memory reads from trees begin on a cache line boundary.
+This enables advanced techniques like:

-New Prediction Parameters
-^^^^^^^^^^^^^^^^^^^^^^^^^
-As of RAPIDS 25.04, all prediction methods accept a ``chunk_size`` parameter, which determines how batches are further subdivided for parallel processing. The optimal value depends on hardware, model, and batch size, and it is difficult to predict in advance. Typically, it is best to use the ``.optimize`` method to determine the best chunk size for a given batch size. If ``chunk_size`` must be set manually, the only general rule of thumb is that larger batch sizes generally benefit from larger chunk sizes. On GPU, ``chunk_size`` can be any power of 2 from 1 to 32. On CPU, ``chunk_size`` can be any power of 2, but values above 512 rarely offer any benefit.
+- Weighted voting based on tree age, out-of-bag AUC, or data-drift scores
+- Prediction intervals without bootstrapping
+- Novel ensembling techniques with no retraining required
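
The weighted-voting and prediction-interval ideas can be tried out without a GPU. The following is a minimal sketch using a synthetic array in place of real `.predict_per_tree` output. It assumes an averaging ensemble (such as a random forest regressor) whose per-tree output has shape `(n_rows, n_trees)`; the weights are hypothetical placeholders for something like out-of-bag scores.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for fil_model.predict_per_tree(X); assumed shape (n_rows, n_trees)
per_tree = rng.normal(loc=5.0, scale=1.0, size=(4, 100))

# Hypothetical per-tree weights (e.g. derived from out-of-bag scores), normalized to sum to 1
weights = rng.uniform(0.5, 1.5, size=per_tree.shape[1])
weights /= weights.sum()

# Weighted vote: one prediction per row, shape (n_rows,)
weighted_pred = per_tree @ weights

# Spread of the individual trees as a rough 10th to 90th percentile band per row
lower = np.percentile(per_tree, 10, axis=1)
upper = np.percentile(per_tree, 90, axis=1)

for row, (lo, mid, hi) in enumerate(zip(lower, weighted_pred, upper)):
    print(f"row {row}: {mid:.2f} (band {lo:.2f} to {hi:.2f})")
```

Because the weights are non-negative and sum to 1, each weighted prediction stays between the smallest and largest tree output for that row. The percentile band describes disagreement between trees; it is not a calibrated confidence interval.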

-Additionally, ``threshold`` has been converted from a ``.load`` parameter to a ``.predict`` parameter.
+**Leaf Node Analysis**
+The ``.apply`` method returns the leaf node ID for every tree, enabling similarity analysis:

-Extra Prediction Modes
-----------------------
-To gain additional insight on how models arrive at their inference decision, FIL now includes the ``.predict_per_tree`` and ``.apply`` methods. The first returns the output for every single tree in the ensemble individually. The second returns the ID of the leaf node obtained for every tree in the ensemble.
+.. code-block:: python
+
+    leaf = fil_model.apply(X)
+    sim = (leaf[i] == leaf[j]).mean()  # fraction of matching leaves
+    print(f"{sim:.0%} of trees agree on rows {i} & {j}")
+
+This opens forest models to novel uses beyond straightforward regression or classification, such as measuring data similarity and understanding model behavior.
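
The leaf-matching similarity can be extended from one pair of rows to all pairs at once. The sketch below uses a small hand-written array in place of real `.apply` output and assumes that output has shape `(n_rows, n_trees)`; the values themselves are made up.

```python
import numpy as np

# Synthetic stand-in for fil_model.apply(X); assumed shape (n_rows, n_trees) of leaf IDs
leaf = np.array([
    [3, 7, 1, 4],
    [3, 7, 2, 4],
    [5, 0, 1, 9],
])

# similarity[i, j] = fraction of trees in which rows i and j land in the same leaf
similarity = (leaf[:, None, :] == leaf[None, :, :]).mean(axis=2)

i, j = 0, 1
print(f"{similarity[i, j]:.0%} of trees agree on rows {i} & {j}")
```

The broadcasted comparison builds an `(n_rows, n_rows, n_trees)` boolean array, so for large batches it is worth computing the matrix in blocks of rows.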
+
+Use Cases
+---------
+FIL is ideal for many scenarios:
+
+**High-Performance Applications:**
+
+- User-facing APIs where every millisecond counts
+- High-volume batch jobs (ad-click scoring, IoT analytics)
+- Real-time inference with sub-10ms latency requirements
+
+**Flexible Deployment:**
+
+- Hybrid deployments - same model file, choose CPU or GPU at runtime
+- Prototype locally and deploy to GPU-accelerated production servers
+- Scale down to CPU-only machines during light traffic, scale up with GPUs during peak loads

-Upcoming Changes
-----------------
-In RAPIDS 25.06, the shape of output arrays will change slightly for some models. Binary classifiers will return an array of solely the probabilities of the positive class for ``predict_proba`` calls. This both reduces memory requirements and improves performance. To convert to the old format, the following snippet can be used:
+**Cost Optimization:**
+
+- One GPU can replace CPUs with 50+ cores
+- Significant cost reduction for high-throughput inference workloads
+- Efficient resource utilization across different traffic patterns
+
+**Advanced Analytics:**
+
+- Novel ensembling techniques with per-tree analysis
+- Data similarity measurement and model interpretability
+- Prediction intervals and uncertainty quantification
+
+API Reference
+=============
+
+See the :doc:`API reference <api>` for the API documentation.
+
+Migration Guide
+===============
+
+FIL Redesign in RAPIDS 25.04
+-----------------------------
+FIL was completely redesigned in RAPIDS 25.04 with a new C++ implementation that provides significant performance improvements and new features:
+
+**Key Changes in 25.04:**
+
+- New C++ implementation for batched inference on GPU and CPU
+- Built-in auto-optimization with ``.optimize()`` method
In RAPIDS 25.06, the shape of output arrays changed for some models. Binary classifiers now return an array of solely the probabilities of the positive class for ``predict_proba`` calls. This both reduces memory requirements and improves performance. To convert to the old format, the following snippet can be used:

 .. code-block:: python

@@ -70,7 +169,7 @@ In RAPIDS 25.06, the shape of output arrays will change slightly for some models
     # Starting in RAPIDS 25.06, the following can be used to obtain the old output shape
     out = np.stack([1- out, out], axis=1)

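
The conversion in the snippet above can be checked on a toy array. This sketch assumes the 25.06-style `predict_proba` output for a binary classifier is a one-dimensional array of positive-class probabilities; if the real output carries a trailing axis of length 1, it would need to be flattened first. The values are made up.

```python
import numpy as np

# Assumed 25.06-style output: positive-class probability for each row
out = np.array([0.9, 0.2, 0.5])

# Rebuild the older two-column layout: column 0 = negative class, column 1 = positive class
two_col = np.stack([1 - out, out], axis=1)

# Going the other way: keep only the positive-class column
positive_only = two_col[:, 1]

print(two_col.shape)
```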
-Additionally, ``.predict`` calls will output two-dimensional arrays beginning in 25.06. This is in preparation for supporting multi-target regression and classification models. The old shape can be obtained via the following snippet:
+Additionally, ``.predict`` calls now output two-dimensional arrays beginning in 25.06. This is in preparation for supporting multi-target regression and classification models. The old shape can be obtained via the following snippet:

 .. code-block:: python

@@ -85,3 +184,55 @@ To use these new behaviors immediately, the ``ForestInference`` estimator can be
 .. code-block:: python

     from cuml.experimental.fil import ForestInference
+
+Migration from RAPIDS 24.12 to 25.04
+------------------------------------
+
+**Before (RAPIDS 24.12):**
+
+.. code-block:: python
+
+    fil_model = ForestInference.load(
+        "./model.ubj",
+        is_classifier=True,
+        algo='TREE_REORG',  # Deprecated
+        threshold=0.5,  # Now moved to predict()
+        storage_type='DENSE'  # Deprecated
+    )
+    predictions = fil_model.predict(data)
+
+**After (RAPIDS 25.04):**
+
+.. code-block:: python
+
+    fil_model = ForestInference.load(
+        "./model.ubj",
+        is_classifier=True,
+        layout='depth_first'  # New parameter
+    )
+    predictions = fil_model.predict(data, threshold=0.5)  # threshold moved here
+
+Deprecated ``load`` Parameters
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+As of RAPIDS 25.04, the following hyperparameters accepted by the ``.load`` method of previous versions of FIL have been deprecated.
+
+- ``threshold`` (will trigger a deprecation warning if used; pass to ``.predict`` instead)
+- ``algo`` (ignored, but a warning will be logged)
+- ``storage_type`` (ignored, but a warning will be logged)
+- ``blocks_per_sm`` (ignored, but a warning will be logged)
+- ``threads_per_tree`` (ignored, but a warning will be logged)
+- ``n_items`` (ignored, but a warning will be logged)
+- ``compute_shape_str`` (ignored, but a warning will be logged)
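
When migrating call sites that build their `load` arguments as a dictionary, the deprecation list above can be applied mechanically. The helper below is a hypothetical sketch, not part of cuML: `split_legacy_load_kwargs` is an invented name, and it only sorts keyword arguments into those still passed to `.load`, those that now belong to `.predict`, and those that are dropped.

```python
import warnings

# Parameters listed above as ignored by ``.load`` since RAPIDS 25.04
IGNORED_LOAD_KWARGS = {
    "algo",
    "storage_type",
    "blocks_per_sm",
    "threads_per_tree",
    "n_items",
    "compute_shape_str",
}


def split_legacy_load_kwargs(legacy_kwargs):
    """Split pre-25.04 ``load`` kwargs into (load_kwargs, predict_kwargs)."""
    load_kwargs, predict_kwargs = {}, {}
    for name, value in legacy_kwargs.items():
        if name == "threshold":
            predict_kwargs[name] = value  # now a ``.predict`` parameter
        elif name in IGNORED_LOAD_KWARGS:
            warnings.warn(
                f"{name!r} is ignored by FIL since RAPIDS 25.04; dropping it",
                DeprecationWarning,
                stacklevel=2,
            )
        else:
            load_kwargs[name] = value
    return load_kwargs, predict_kwargs


load_kwargs, predict_kwargs = split_legacy_load_kwargs(
    {"is_classifier": True, "algo": "TREE_REORG", "threshold": 0.5, "storage_type": "DENSE"}
)
print(load_kwargs, predict_kwargs)
```

The helper deliberately does not translate `algo` values into a `layout`, since the text gives no mapping between the two.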
+
+New ``load`` Parameters
+^^^^^^^^^^^^^^^^^^^^^^^
+As of RAPIDS 25.04, the following new hyperparameters can be passed to the ``.load`` method
+
+- ``layout``: Replaces the functionality of ``algo`` and specifies the in-memory layout of nodes in FIL forests. One of ``'depth_first'`` (default), ``'layered'`` or ``'breadth_first'``.
+- ``align_bytes``: If specified, trees will be padded such that their in-memory size is a multiple of this value. This can sometimes improve performance by guaranteeing that memory reads from trees begin on a cache line boundary.
+
+New Prediction Parameters
+^^^^^^^^^^^^^^^^^^^^^^^^^
+As of RAPIDS 25.04, all prediction methods accept a ``chunk_size`` parameter, which determines how batches are further subdivided for parallel processing. The optimal value depends on hardware, model, and batch size, and it is difficult to predict in advance. Typically, it is best to use the ``.optimize`` method to determine the best chunk size for a given batch size. If ``chunk_size`` must be set manually, the only general rule of thumb is that larger batch sizes generally benefit from larger chunk sizes. On GPU, ``chunk_size`` can be any power of 2 from 1 to 32. On CPU, ``chunk_size`` can be any power of 2, but values above 512 rarely offer any benefit.
+
+Additionally, ``threshold`` has been converted from a ``.load`` parameter to a ``.predict`` parameter.
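
The `chunk_size` rule stated above (a power of 2, capped at 32 on GPU) is easy to encode as a pre-flight check. The function below is a hypothetical helper written for illustration, not a cuML API, and the `device` strings `"gpu"` and `"cpu"` are assumed labels.

```python
def is_valid_chunk_size(chunk_size, device="gpu"):
    """Check chunk_size against the documented rule: a power of 2, at most 32 on GPU."""
    if not isinstance(chunk_size, int) or chunk_size < 1:
        return False
    # A positive integer is a power of 2 exactly when it has a single bit set
    is_power_of_two = (chunk_size & (chunk_size - 1)) == 0
    if device == "gpu":
        return is_power_of_two and chunk_size <= 32
    return is_power_of_two


# Candidate values a manual search on GPU could try: 1, 2, 4, 8, 16, 32
gpu_candidates = [2**k for k in range(6)]
print(gpu_candidates, all(is_valid_chunk_size(c) for c in gpu_candidates))
```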