Merged

Commits (56)
dddf841
Expand the cuML docs landing page.
csadorf Sep 16, 2025
326197a
revert this: reduce PR CI to what's needed for docs build
csadorf Sep 17, 2025
43294ef
remove open-source note
csadorf Sep 17, 2025
fdd6355
expand on system reqs
csadorf Sep 17, 2025
07f5103
minor fixups on the cuml intro page
csadorf Sep 17, 2025
1fef01b
improve the intro page
csadorf Sep 17, 2025
59f5f90
minor fixups to the notebooks
csadorf Sep 17, 2025
d3bfac7
use more consistent naming scheme and imports in the notebooks
csadorf Sep 17, 2025
3d8e2e3
comment out line that is supposed to be commented out
csadorf Sep 17, 2025
9bed046
apply black formatting to notebooks where sensible
csadorf Sep 17, 2025
1891952
do not compute accuracies and scores two times
csadorf Sep 17, 2025
284ab05
Improve code formatting and comments in notebooks.
csadorf Sep 17, 2025
89e03fe
revert this: temp disable kmeans due intermittent hangs
csadorf Sep 19, 2025
0dae41c
add joblib section
csadorf Sep 19, 2025
69a9022
recommend pickle protocol 5
csadorf Sep 19, 2025
44f6c50
slightly revise the joblib section
csadorf Sep 19, 2025
a9fbbb6
improve the intro on distributed model pickling
csadorf Sep 19, 2025
4cdb3a1
move the kmeans cell and update rf intro
csadorf Sep 19, 2025
272907b
Add docs on as_sklearn/from_sklearn.
csadorf Sep 19, 2025
1128620
fix a linter warning
csadorf Sep 22, 2025
e4cbca5
Re-enable distributed Kmeans
csadorf Sep 23, 2025
b00c413
minor polishing
csadorf Sep 23, 2025
7809da7
More coherent estimator variable naming in pickling notebook
csadorf Sep 23, 2025
da62db3
Do not filter FutureWarning in pickle notebook.
csadorf Sep 23, 2025
e1dae3f
Improve clarity on the design principles.
csadorf Sep 23, 2025
a20e12e
Hide ToC on index page
csadorf Sep 24, 2025
6b28ada
Add note on cuml.accel support for lists and tuples
csadorf Sep 24, 2025
1f3762d
Fix links on the intro page.
csadorf Sep 24, 2025
fdace6f
Provide short intro paragraph to User Guide section
csadorf Sep 24, 2025
e64bb04
Fix the Classification section header
csadorf Sep 24, 2025
fe0fb53
Fix link in pickling notebook
csadorf Sep 24, 2025
090961e
Add note to user guide about using cuml.accel for zero code change ac…
csadorf Sep 24, 2025
966a300
add draft for the cuml accel notebook
csadorf Sep 24, 2025
86b7117
Allow to override cuml.accel default profiler style.
csadorf Sep 25, 2025
0eddc20
Expand on cuml.accel example notebooks.
csadorf Sep 25, 2025
768f7a2
Rename cuml.accel example notebooks.
csadorf Sep 25, 2025
b929c10
Improve cuml.accel example titles
csadorf Sep 25, 2025
ec57bc0
Merge remote-tracking branch 'origin/branch-25.10' into docs/issue-7096
csadorf Sep 25, 2025
e513db1
Restore full CI.
csadorf Sep 25, 2025
a596369
Revise intro paragraphs on landing page.
csadorf Sep 26, 2025
69ef1b9
Improve generic code comment on intro page
csadorf Sep 26, 2025
817c28b
Revise the intro api example code.
csadorf Sep 26, 2025
ef77e33
Revise note on api compatibility on intro page.
csadorf Sep 26, 2025
61b3176
Improve the "be fast" intro section.
csadorf Sep 26, 2025
10a20b2
Revise the estimator intro notebook.
csadorf Sep 26, 2025
cb1e540
Revise the pickling notebook.
csadorf Sep 26, 2025
0189847
update pickling notebook metadata and make executable
csadorf Sep 26, 2025
0b8ba7c
Further revise the pickling notebook
csadorf Sep 26, 2025
0184eaa
Refactor code examples in estimator intro notebook.
csadorf Sep 26, 2025
c455296
explain n_parts choice
csadorf Sep 26, 2025
6169f17
do not persist dask input data
csadorf Sep 26, 2025
e885a51
Revise the FIL landing page.
csadorf Sep 26, 2025
2717132
Merge branch 'branch-25.10' into docs/issue-7096
csadorf Sep 29, 2025
1ed88ad
Improve the cuml.accel getting started notebook.
csadorf Sep 29, 2025
9e8c458
Improve default profiler theme
jcrist Sep 30, 2025
5e390ab
Tweak css
jcrist Sep 30, 2025
227 changes: 189 additions & 38 deletions docs/source/FIL.rst
@@ -1,66 +1,165 @@
FIL - RAPIDS Forest Inference Library
=====================================

The Forest Inference Library (FIL) is a component of cuML, providing a
high-performance inference engine designed to accelerate tree-based machine
learning models on both GPU and CPU. FIL delivers significant speedups over
traditional CPU-based inference while maintaining compatibility with models
trained in popular frameworks.

**Key Benefits:**

- FIL typically offers a speedup of 80x or more over scikit-learn native execution
- Support for XGBoost, Scikit-Learn, LightGBM, and Treelite-compatible models
- Seamless GPU/CPU execution switching
- Built-in auto-optimization for maximum performance
- Advanced inference APIs for granular tree analysis

**Quick Start:**

.. code-block:: python

    import xgboost as xgb
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split

    from cuml.fil import ForestInference

    # Illustrative data; any numeric feature matrix works
    X, y = make_classification(n_samples=10_000, n_features=32, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Train your model as usual and save it
    xgb_model = xgb.XGBClassifier()
    xgb_model.fit(X_train, y_train)
    xgb_model.save_model("xgb_model.ubj")

    # Load into FIL and auto-tune for your batch size
    fil_model = ForestInference.load("xgb_model.ubj", is_classifier=True)
    fil_model.optimize(batch_size=1024)

    # Now you can predict with FIL directly
    predictions = fil_model.predict(X_test)
    probabilities = fil_model.predict_proba(X_test)

Performance Optimization
-------------------------
FIL includes built-in auto-optimization that automatically tunes performance hyperparameters for your specific model and batch size, eliminating the need for manual tuning in most cases:

.. code-block:: python

    from cuml.fil import ForestInference

    fil_model = ForestInference.load("model.ubj", is_classifier=True)
    fil_model.optimize(batch_size=1_000_000)

    # Check which hyperparameters were selected
    print(f"Layout: {fil_model.layout}")
    print(f"Chunk size: {fil_model.default_chunk_size}")

fil_model = ForestInference.load("./my_xgboost_classifier.ubj", is_classifier=True)
class_predictions = fil_model.predict(input_data)
result = fil_model.predict(data)

FIL typically offers speedups of 80x or more relative to native inference with e.g. a Scikit-Learn ``RandomForest`` model on CPU.
The optimization process tests different memory layouts and chunk sizes to find the optimal configuration for your specific use case.

**Key Hyperparameters:**

- ``layout``: Determines the order in which tree nodes are arranged in memory (depth_first, layered, breadth_first)
- ``default_chunk_size``: Controls the granularity of parallelization during inference
- ``align_bytes``: Cache line alignment for optimal memory access patterns

**Manual Tuning:**
Advanced users can experiment with the ``align_bytes`` parameter. Its default value is typically close enough to optimal that it is not searched automatically during auto-optimization, but to squeeze the most performance out of FIL, try either 0 or 128 on GPU and 0 or 64 on CPU, as sketched below.
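
A minimal sketch of such manual tuning, assuming an existing ``model.ubj`` file and a representative batch ``X_test`` (both are illustrative names):

.. code-block:: python

    from cuml.fil import ForestInference

    # Benchmark the recommended alignments (0 or 128 on GPU; use
    # 0 or 64 on CPU) with your own timing harness around predict().
    for align in (0, 128):
        fil_model = ForestInference.load(
            "model.ubj", is_classifier=True, align_bytes=align
        )
        fil_model.optimize(batch_size=1024)
        _ = fil_model.predict(X_test)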

Optional CPU Execution
----------------------
While FIL offers the most benefit for large models and batch sizes by taking advantage of the speed and parallelism of NVIDIA GPUs, it can also be used to speed up inference on CPUs. This can be convenient for testing in environments without access to GPUs. It can also be useful for deployments which experience dramatic shifts in traffic. When the number of incoming inference requests is low, CPU execution can be used. When traffic spikes, the deployment can seamlessly scale up onto GPUs in order to handle the additional load as cheaply as possible without significantly increasing latency.

You can use FIL in CPU mode with a context manager:

.. code-block:: python

    from cuml.fil import ForestInference, set_fil_device_type

    with set_fil_device_type("cpu"):
        fil_model = ForestInference.load("xgboost_model.ubj")
        result = fil_model.predict(data)

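The same model file can also be routed to either device at runtime. A hedged sketch, assuming ``set_fil_device_type`` also accepts ``"gpu"`` and that ``FIL_DEVICE`` is a deployment-specific environment variable:

.. code-block:: python

    import os

    from cuml.fil import ForestInference, set_fil_device_type

    # Choose the execution device per process, e.g. via a deployment toggle
    device = os.environ.get("FIL_DEVICE", "gpu")

    with set_fil_device_type(device):
        fil_model = ForestInference.load("xgboost_model.ubj")
        result = fil_model.predict(data)
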
Advanced Prediction APIs
-------------------------
FIL includes advanced prediction methods that provide granular information about individual trees in the ensemble, enabling novel ensembling techniques and analysis:

**Per-Tree Predictions**
The ``.predict_per_tree`` method returns the output of every single tree individually:

.. code-block:: python

    import numpy as np

    per_tree = fil_model.predict_per_tree(X)
    mean = per_tree.mean(axis=1)
    lower = np.percentile(per_tree, 10, axis=1)
    upper = np.percentile(per_tree, 90, axis=1)

This enables advanced techniques like:

- Weighted voting based on tree age, out-of-bag AUC, or data-drift scores
- Prediction intervals without bootstrapping
- Novel ensembling techniques with no retraining required
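
For example, a minimal sketch of weighted voting over per-tree outputs; the ``weights`` below are placeholders standing in for real quality scores such as out-of-bag AUC:

.. code-block:: python

    import numpy as np

    per_tree = fil_model.predict_per_tree(X)  # shape: (n_rows, n_trees)

    # Placeholder weights; substitute per-tree quality scores here
    weights = np.linspace(0.5, 1.0, per_tree.shape[1])
    weighted_vote = (per_tree * weights).sum(axis=1) / weights.sum()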

**Leaf Node Analysis**
The ``.apply`` method returns the leaf node ID for every tree, enabling similarity analysis:

.. code-block:: python

    # i and j are row indices into X
    leaf = fil_model.apply(X)
    sim = (leaf[i] == leaf[j]).mean()  # fraction of matching leaves
    print(f"{sim:.0%} of trees agree on rows {i} & {j}")

This opens forest models to novel uses beyond straightforward regression or classification, such as measuring data similarity and understanding model behavior.
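
Extending the snippet above, a sketch of a full pairwise similarity matrix for a small batch (``X`` and ``fil_model`` as before):

.. code-block:: python

    leaf = fil_model.apply(X[:10])  # shape: (10, n_trees)

    # Fraction of trees in which each pair of rows shares a leaf
    similarity = (leaf[:, None, :] == leaf[None, :, :]).mean(axis=2)
    print(similarity.round(2))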

Use Cases
---------
FIL is ideal for many scenarios:

**High-Performance Applications:**

- User-facing APIs where every millisecond counts
- High-volume batch jobs (ad-click scoring, IoT analytics)
- Real-time inference with sub-10ms latency requirements

**Flexible Deployment:**

- Hybrid deployments - same model file, choose CPU or GPU at runtime
- Prototype locally and deploy to GPU-accelerated production servers
- Scale down to CPU-only machines during light traffic, scale up with GPUs during peak loads

**Cost Optimization:**

- One GPU can replace CPUs with 50+ cores
- Significant cost reduction for high-throughput inference workloads
- Efficient resource utilization across different traffic patterns

**Advanced Analytics:**

- Novel ensembling techniques with per-tree analysis
- Data similarity measurement and model interpretability
- Prediction intervals and uncertainty quantification

API Reference
=============

See the :doc:`API reference <api>` for complete API documentation.

Migration Guide
===============

FIL Redesign in RAPIDS 25.04
-----------------------------
FIL was completely redesigned in RAPIDS 25.04 with a new C++ implementation that provides significant performance improvements and new features:

**Key Changes in 25.04:**

- New C++ implementation for batched inference on GPU and CPU
- Built-in auto-optimization with ``.optimize()`` method
- Advanced inference APIs (``.predict_per_tree``, ``.apply``)
- Up to 4x faster GPU throughput than previous versions
- Enhanced memory layouts and cache optimization
- New parameter structure (``layout``, ``align_bytes``)
- Moved ``threshold`` from ``.load()`` to ``.predict()``

Migration from RAPIDS 25.04 to 25.06 (Output Shape Changes)
-----------------------------------------------------------
In RAPIDS 25.06, the shape of output arrays changed for some models. Binary classifiers now return an array of solely the probabilities of the positive class for ``predict_proba`` calls. This both reduces memory requirements and improves performance. To convert to the old format, the following snippet can be used:

.. code-block:: python

    import numpy as np

    # Assumed setup: ``out`` holds the one-dimensional positive-class
    # probabilities returned by ``predict_proba`` in 25.06+
    out = fil_model.predict_proba(data)

    # Starting in RAPIDS 25.06, the following can be used to obtain the old output shape
    out = np.stack([1 - out, out], axis=1)

Additionally, ``.predict`` calls now output two-dimensional arrays beginning in 25.06. This is in preparation for supporting multi-target regression and classification models. The old shape can be obtained via the following snippet:

.. code-block:: python

    # Assumed sketch: collapse the new two-dimensional ``predict``
    # output back to the old one-dimensional shape
    out = fil_model.predict(data)
    out = out.flatten()

To use these new behaviors immediately, the ``ForestInference`` estimator can be imported from the experimental module:

.. code-block:: python

    from cuml.experimental.fil import ForestInference

Migration from RAPIDS 24.12 to 25.04
------------------------------------

**Before (RAPIDS 24.12):**

.. code-block:: python

    fil_model = ForestInference.load(
        "./model.ubj",
        is_classifier=True,
        algo='TREE_REORG',    # Deprecated
        threshold=0.5,        # Now moved to predict()
        storage_type='DENSE'  # Deprecated
    )
    predictions = fil_model.predict(data)

**After (RAPIDS 25.04):**

.. code-block:: python

    fil_model = ForestInference.load(
        "./model.ubj",
        is_classifier=True,
        layout='depth_first'  # New parameter
    )
    predictions = fil_model.predict(data, threshold=0.5)  # threshold moved here

Deprecated ``load`` Parameters
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
As of RAPIDS 25.04, the following hyperparameters accepted by the ``.load`` method in previous versions of FIL have been deprecated:

- ``threshold`` (will trigger a deprecation warning if used; pass to ``.predict`` instead)
- ``algo`` (ignored, but a warning will be logged)
- ``storage_type`` (ignored, but a warning will be logged)
- ``blocks_per_sm`` (ignored, but a warning will be logged)
- ``threads_per_tree`` (ignored, but a warning will be logged)
- ``n_items`` (ignored, but a warning will be logged)
- ``compute_shape_str`` (ignored, but a warning will be logged)

New ``load`` Parameters
^^^^^^^^^^^^^^^^^^^^^^^
As of RAPIDS 25.04, the following new hyperparameters can be passed to the ``.load`` method:

- ``layout``: Replaces the functionality of ``algo`` and specifies the in-memory layout of nodes in FIL forests. One of ``'depth_first'`` (default), ``'layered'`` or ``'breadth_first'``.
- ``align_bytes``: If specified, trees will be padded such that their in-memory size is a multiple of this value. This can sometimes improve performance by guaranteeing that memory reads from trees begin on a cache line boundary.
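
A short sketch combining both parameters (``model.ubj`` is an illustrative file name; the values shown are examples, not tuned recommendations):

.. code-block:: python

    from cuml.fil import ForestInference

    fil_model = ForestInference.load(
        "model.ubj",
        is_classifier=True,
        layout="breadth_first",  # or 'depth_first' (default) / 'layered'
        align_bytes=128,         # pad trees to cache-line boundaries
    )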

New Prediction Parameters
^^^^^^^^^^^^^^^^^^^^^^^^^
As of RAPIDS 25.04, all prediction methods accept a ``chunk_size`` parameter, which determines how batches are further subdivided for parallel processing. The optimal value depends on hardware, model, and batch size, and it is difficult to predict in advance. Typically, it is best to use the ``.optimize`` method to determine the best chunk size for a given batch size. If ``chunk_size`` must be set manually, the only general rule of thumb is that larger batch sizes generally benefit from larger chunk sizes. On GPU, ``chunk_size`` can be any power of 2 from 1 to 32. On CPU, ``chunk_size`` can be any power of 2, but values above 512 rarely offer any benefit.
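
A minimal sketch of setting ``chunk_size`` manually, assuming a loaded ``fil_model`` and a large batch ``X_batch``:

.. code-block:: python

    # On GPU, chunk_size may be any power of 2 from 1 to 32; larger
    # batches generally benefit from larger chunk sizes.
    preds = fil_model.predict(X_batch, chunk_size=32)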

Additionally, ``threshold`` has been converted from a ``.load`` parameter to a ``.predict`` parameter.
4 changes: 4 additions & 0 deletions docs/source/conf.py
@@ -231,6 +231,10 @@ def setup_redirects(app, docname):


def setup(app):
    # Override the default cuml.accel profiler style to support both
    # dark and light mode in rendered jupyter notebooks.
    os.environ["CUML_ACCEL_PROFILER_STYLE"] = "#333333 on #ffffff"

    app.add_css_file("references.css")
    app.add_css_file("https://docs.rapids.ai/assets/css/custom.css")
    app.add_js_file("https://docs.rapids.ai/assets/js/custom.js", loading_method="defer")