# Experimental FIL - RAPIDS Forest Inference Library

This experimental feature offers a new implementation of cuML's existing
Forest Inference Library. The primary advantages of this new
implementation are:

1. Models can now be evaluated on CPU in addition to GPU.
2. Faster GPU execution on some models and hardware.
3. Support for a wider range of Treelite's available model parameters.

This implementation also has a few limitations:

1. Models with shallow trees (depth 2-4) typically execute more slowly than
   with the existing FIL.
2. This implementation has not been tested as exhaustively as the existing
   FIL.

If you absolutely need to maximize runtime performance, we recommend
testing both the new and existing FIL implementations with realistic batch
sizes on your target hardware to determine which is optimal for your
specific model. In general, however, performance should be quite comparable
between the two implementations.

**NOTE:** Because this implementation is relatively recent, we recommend
using the existing FIL implementation for use cases where stability is
paramount.

## Usage
With one exception, experimental FIL should be fully compatible with the
existing FIL API. Experimental FIL no longer allows a `threshold` to be
specified at the time a model is loaded for binary classifiers. Instead, the
threshold must be passed as a keyword argument to the `predict` method.

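For a binary classifier, the thresholding that `predict(X, threshold=...)` performs amounts to comparing positive-class probabilities against the cutoff. A minimal illustration with stand-in numpy probabilities (not FIL itself; the `proba` array here simply plays the role of `predict_proba` output):

```python
import numpy as np

# Stand-in for the positive-class probabilities a binary classifier
# (e.g. the second column of predict_proba output) might return.
proba = np.array([0.1, 0.45, 0.55, 0.9])

# Applying a decision threshold of 0.5 yields hard class labels.
labels = (proba >= 0.5).astype(np.int32)
print(labels)  # [0 0 1 1]
```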
Besides this, all existing FIL calls should be compatible with experimental
FIL. There are, however, several performance parameters which have been
deprecated (they will now emit a warning) and a few new ones which have been
added.

The most basic usage remains the same:
```python
from cuml.experimental import ForestInference

fm = ForestInference.load(filename=model_path,
                          output_class=True,
                          model_type='xgboost')

X = ...  # load test samples as a numpy or cupy array

y_out = fm.predict(X)
```

In order to optimize performance, however, we introduce a new optional
parameter to the `predict` method called `chunk_size`:

```python
y_out = fm.predict(X, chunk_size=4)
```

The API docs cover `chunk_size` in more detail, but this parameter controls
how many rows within a batch are simultaneously evaluated during a single
iteration of FIL's inference algorithm. The optimal value for this
parameter depends on both the model and available hardware, and it is
difficult to predict _a priori_. In general, however, larger batches benefit
from larger `chunk_size` values, and smaller batches benefit from smaller
`chunk_size` values.

For GPU execution, `chunk_size` can be any power of 2 from 1 to 32. For CPU
execution, `chunk_size` can be any power of 2, but there is generally no
benefit in testing values over 512. On both CPU and GPU, there is never any
benefit from a chunk size that exceeds the batch size. Tuning the chunk size
can substantially improve performance, so it is often worthwhile to perform
a search over chunk sizes with sample data when deploying a model with FIL.

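The chunk-size search described above can be sketched as a small timing harness. `best_chunk_size` is a hypothetical helper (not part of the FIL API), and `fake_predict` stands in for `fm.predict` so the sketch runs on its own:

```python
import time

def best_chunk_size(predict, X, candidates=(1, 2, 4, 8, 16, 32), repeats=3):
    """Time predict(X, chunk_size=c) for each candidate and return the
    fastest. `predict` is assumed to accept a chunk_size keyword, as
    fm.predict does."""
    timings = {}
    for c in candidates:
        start = time.perf_counter()
        for _ in range(repeats):
            predict(X, chunk_size=c)
        timings[c] = (time.perf_counter() - start) / repeats
    return min(timings, key=timings.get)

# Stand-in predict so this sketch is self-contained; in practice, pass
# fm.predict and a realistic sample batch instead.
def fake_predict(X, chunk_size=1):
    return [x > 0.0 for x in X]

best = best_chunk_size(fake_predict, [0.5, -0.5, 1.0])
print(best)  # one of the candidate chunk sizes
```

With a real model, run this once per target batch size at deployment time, since the optimal chunk size varies with batch size and hardware.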
### Loading Parameters
In addition to the `chunk_size` parameter for the `predict` and
`predict_proba` methods, FIL offers some parameters for optimizing
performance when the model is loaded. This implementation also
deprecates some existing parameters.

#### Deprecated `load` Parameters

- `threshold` (will raise a `DeprecationWarning` if used)
- `algo` (ignored, but a warning will be logged)
- `storage_type` (ignored, but a warning will be logged)
- `blocks_per_sm` (ignored, but a warning will be logged)
- `threads_per_tree` (ignored, but a warning will be logged)
- `n_items` (ignored, but a warning will be logged)
- `compute_shape_str` (ignored, but a warning will be logged)

#### New `load` Parameters
- `layout`: Replaces the functionality of `algo` and specifies the in-memory
  layout of nodes in FIL forests. One of `'depth_first'` (default) or
  `'breadth_first'`. Except in cases where absolutely optimal
  performance is critical, the default should be acceptable.
- `align_bytes`: If specified, trees will be padded such that their in-memory
  size is a multiple of this value. Theoretically, this can improve
  performance by guaranteeing that memory reads from trees begin on a cache
  line boundary. Empirically, little benefit has been observed for this
  parameter, and it may be deprecated before this version of FIL moves out of
  experimental status.

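The padding that `align_bytes` requests amounts to rounding each tree's in-memory size up to the next multiple of that value. A sketch of the arithmetic (the helper name and the 128-byte value are illustrative choices, not FIL defaults):

```python
def padded_size(size_bytes, align_bytes=128):
    # Round size up to the next multiple of align_bytes so that each
    # tree starts on an aligned boundary. 128 is an illustrative value.
    return ((size_bytes + align_bytes - 1) // align_bytes) * align_bytes

print(padded_size(1000))  # 1024
print(padded_size(128))   # 128 (already aligned, no padding added)
```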
#### Optimizing `load` parameters
While these two new parameters are provided for cases in which it is
necessary to eke out every possible performance gain for a model, in general
their performance benefit will be tiny relative to the benefit of optimizing
`chunk_size` for `predict` calls.

## Future Development
Once experimental FIL has been thoroughly tested and evaluated in real-world
deployments, it will be moved out of experimental status and replace the
existing FIL implementation. Before this happens, RAPIDS developers will
also address the current underperformance of experimental FIL on shallow
trees to ensure performance parity.

While this version of FIL remains in experimental status, feedback is very
welcome. Please consider [submitting an
issue](https://github.com/rapidsai/cuml/issues/new/choose) if you notice
any performance regression when transitioning from the current FIL, have
thoughts on how to make the API more useful, or have features you would
like to see in the new version of FIL before it transitions out of
experimental status.

**NOTE:** As of RAPIDS 25.04, experimental FIL has been promoted to stable.
It is retained here temporarily to support users who have already migrated
to experimental FIL, but it will be moved exclusively to `cuml.fil` in
RAPIDS 25.06.