Fix integer overflow in FIL #7727
Conversation
Do we have before-and-after benchmarks for regular …
@coderabbitai full review
✅ Actions performed: Full review triggered.
📝 Walkthrough: Adds runtime overflow/size guards to FIL CPU and GPU inference paths to prevent per-row output index overflow before parallel/kernel launches, tightens Treelite importer assertion messages by casting indices to int for clearer diagnostics, and updates SPDX years to 2026.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks: ✅ 4 passed, ❌ 1 failed (warning)
@hcho3 Did you run a basic benchmark to test the performance regression? If the performance hit is significant, we should aim to mitigate that. Depending on the severity, we can do that either in a follow-up or try to mitigate the problem in a different way (for example, issue a warning on large inputs).
csadorf left a comment:
We should run a basic benchmark to understand the severity of the potential performance regression.
@csadorf I'm running the benchmark now.
@csadorf I ran a basic benchmark on my end and here is the result. Performance impact of using …
@hcho3 That seems significant, but I was wondering if you benchmarked different models and/or batch sizes, just to have a complete picture of the perf impact?
Yes, I used your benchmark script. The reported performance is an average, and the slowdown is fairly consistent across the board.
I'd argue that the slowdown is outside the acceptable range considering the severity of the bug. Would it be possible to use int64 conditionally, based on the dataset input size?
Yeah, a specialized implementation for large input sizes may be the way to go. Adding a specialized implementation will increase the binary size, though, so we should run a benchmark to measure the performance impact.
Update: I was able to derive an upper bound on …, working backwards from the following formula:
cuml/cpp/include/cuml/fil/detail/infer_kernel/cpu.hpp, lines 143 to 148 in f9928c4
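To make the overflow concrete, here is a hedged sketch of the index arithmetic being discussed. The names (`row`, `num_outputs`, `num_grove`, `out`, `grove`) are illustrative, based on the formula referenced above rather than copied from the actual kernel; the point is only that the 32-bit product wraps once `row * num_outputs * num_grove` exceeds the 32-bit maximum:

```cpp
#include <cstdint>

// Per-row output index as a 32-bit index_type would compute it.
// The multiplication wraps modulo 2^32 for large row counts.
std::uint32_t index32(std::uint32_t row, std::uint32_t num_outputs,
                      std::uint32_t num_grove, std::uint32_t out,
                      std::uint32_t grove)
{
  return row * num_outputs * num_grove + out * num_grove + grove;
}

// The same index computed in 64-bit arithmetic, which does not wrap
// for any realistic input size.
std::uint64_t index64(std::uint64_t row, std::uint64_t num_outputs,
                      std::uint64_t num_grove, std::uint64_t out,
                      std::uint64_t grove)
{
  return row * num_outputs * num_grove + out * num_grove + grove;
}
```

For small inputs the two agree, but e.g. `row = 1073741824` (2^30) with `num_outputs = 8` makes the 32-bit index wrap all the way back to 0, which is exactly the kind of silent corruption the guard is meant to prevent.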
Option 1: Throw an error for large inputs. Why? The upper bound on …
Option 2: Create a specialized implementation for large inputs.
My recommendation is to implement option 1 immediately as a first defensive measure. That's significantly better than the current overflow behavior. Let's capture option 2 in an issue and evaluate its merits and implementation separately.
This may be better discussed on the follow-up issue for Option 2, but I'm wondering if batching would be preferable to a specialized implementation. After implementing Option 1, you'll already have the machinery for detecting what the appropriate batch size should be, so I believe it would be a relatively small change from there. On the other hand, as @hcho3 pointed out, the compile times and binary size are already pretty inflated by all the FIL specializations as is. Doubling that just to get 64-bit index specializations seems like a pretty high cost if batching is an available solution.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@cpp/include/cuml/fil/detail/infer/gpu.cuh`:
- Around line 217-222: The division can still divide by zero when num_grove
(derived from forest.tree_count()/task_count/etc.) is 0; before computing
max_num_row or doing ASSERT, add a guard that computes the divisor (e.g.,
uint64_t denom = output_count * num_grove) and if denom == 0 handle the
degenerate case (either early return, set a safe max_num_row, or ASSERT with a
clear message) to prevent UB; update the logic around
num_grove/output_count/ceildiv/task_count/threads_per_block so the check covers
the scenario where forest.tree_count() == 0 and use the checked denom in the
max_num_row calculation and subsequent ASSERT.
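The guard the comment asks for can be sketched as follows. This is a minimal illustration, not the actual `gpu.cuh` code: the names `output_count` and `num_grove` mirror the discussion, and returning the 64-bit maximum as a sentinel for a degenerate (zero-tree) model is one of the safe options the comment suggests:

```cpp
#include <cstdint>
#include <limits>

// Compute the largest row count whose per-row output index still fits in a
// 32-bit index_type. If the divisor is zero (degenerate model, e.g.
// forest.tree_count() == 0), skip the division entirely and return a
// sentinel instead of invoking undefined behavior.
std::uint64_t guarded_max_num_row(std::uint64_t output_count,
                                  std::uint64_t num_grove)
{
  std::uint64_t denom = output_count * num_grove;
  if (denom == 0) {
    // Degenerate case: no per-row outputs are written, so no row count can
    // overflow the index. Any row count is "safe".
    return std::numeric_limits<std::uint64_t>::max();
  }
  return std::numeric_limits<std::uint32_t>::max() / denom;
}
```

The caller would then `ASSERT(row_count <= guarded_max_num_row(...))` before the kernel launch, so the check happens once, up front, rather than inside the hot path.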
In general, I think coderabbit led us a little far afield with some of its suggestions here. The limits that it's trying to avoid are either not realistic, guarded against elsewhere, or else precluded by some other limit that we would hit long before we got to these.

My high-level recommendation is that we should validate limits on e.g. the number of trees or degenerate models when we import the model. Sprinkling those checks all throughout the code is a significant departure from the original design goal of doing expensive checks up front and failing fast before we get to actual inference. The Platonic ideal here would probably be that we do our checks up front and then use custom types to ensure that the checks had already been performed in the places we care about them later in the code. E.g. if you need an unsigned integer that you know you can subtract 3 from, you can construct a …

More practically speaking, it's probably sufficient to perform the necessary validation at import time with comments/docs explaining the checks. Regardless, I would recommend doing so in a separate PR. I'll provide inline comments for a few other things in case you want to proceed with the recommendations from coderabbit as is.
wphicks left a comment:
See my other comment for more general thoughts on the latest round of changes, but this review covers the code assuming you want to go ahead with coderabbit's suggestions.
My only other question is whether we have any benchmarks covering this new change. I have less specific performance concerns here, but I always try to benchmark any FIL change that touches the actual inference code, since we used to occasionally miss regressions in legacy FIL.
Co-authored-by: William Hicks <whicks@nvidia.com>
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cpp/include/cuml/fil/detail/infer_kernel/cpu.hpp`:
- Around line 102-118: The overflow guard runs after allocating output_workspace
which can wrap the 32-bit product and cause std::bad_alloc before the ASSERT;
move the guard block (the computation of max_num_row and ASSERT(row_count <=
max_num_row)) to before the allocation of output_workspace and compute the
allocation size using 64-bit arithmetic (e.g., cast row_count, num_outputs,
num_grove to uint64_t) to check and then only construct output_workspace/compute
task_count once the size is validated, updating any uses of task_count to use
the safe 64-bit-checked value.
- Around line 112-113: Avoid the integer division-by-zero by checking the
denominator before computing max_num_row: compute a uint64_t denom = num_outputs
* static_cast<std::uint64_t>(num_grove) (where num_grove comes from
ceildiv(num_tree, grove_size) and ultimately depends on forest.tree_count()),
and if denom == 0 set max_num_row to 0 (or a safe sentinel) otherwise compute
max_num_row = static_cast<std::uint64_t>(std::numeric_limits<index_type>::max())
/ denom; update the assignment site of max_num_row in the infer_kernel (the line
using ceildiv/num_grove) to use this guarded logic.
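Both inline comments reduce to one pattern: validate the 64-bit-checked size before the workspace allocation, not after it. The sketch below is illustrative only; `row_count`/`num_outputs`/`num_grove` follow the comment's vocabulary, `index_type` is assumed to be 32-bit as in the discussion, and the real `cpu.hpp` code differs:

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

using index_type = std::uint32_t;  // assumed 32-bit, per the discussion

// Validate-before-allocate: compute the workspace size in 64-bit arithmetic
// and reject oversized inputs *before* constructing the buffer, so a wrapped
// 32-bit product can never reach the allocator (or std::bad_alloc) first.
std::vector<float> make_output_workspace(std::uint64_t row_count,
                                         std::uint64_t num_outputs,
                                         std::uint64_t num_grove)
{
  std::uint64_t denom = num_outputs * num_grove;
  if (denom != 0) {  // guard the division against a degenerate model
    std::uint64_t max_num_row =
      std::numeric_limits<index_type>::max() / denom;
    if (row_count > max_num_row) {
      throw std::invalid_argument(
        "input is too large; per-row output index would overflow");
    }
  }
  // Safe: the 64-bit product is known to fit index_type (or is 0).
  return std::vector<float>(row_count * denom);
}
```

Only once this check has passed would the real code go on to compute `task_count` and launch the parallel loop, which matches the "fail fast, up front" design goal mentioned elsewhere in this thread.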
---
Duplicate comments:
In `@cpp/include/cuml/fil/detail/infer/gpu.cuh`:
- Around line 220-221: The division can still divide by zero when output_count *
num_grove == 0 (e.g., infer_kind::default_kind with forest.tree_count()==0);
modify the max_num_row computation to guard against a zero divisor by checking
if output_count==0 || num_grove==0 and in that case set max_num_row to
std::numeric_limits<std::uint64_t>::max() (or another safe large value) instead
of performing the division, otherwise perform the existing static_cast division;
update the code around the max_num_row calculation (the variables max_num_row,
output_count, num_grove) to implement this conditional.
I ran the benchmark again. The impact of this code change is a <1% difference in throughput.
csadorf left a comment:
Much better! However, we should use the correct exception type.
/merge |
Closes #45. Port of rapidsai/cuml#7727. Throw an exception when the input is large enough to create integer overflow.

Authors:
- Philip Hyunsu Cho (https://github.com/hcho3)

Approvers:
- Simon Adorf (https://github.com/csadorf)

URL: #63
Closes #7711