Add instructions for CodeRabbit by csadorf · Pull Request #7725 · rapidsai/cuml

csadorf · 2026-01-27T18:11:39Z

Added general instructions and configuration in .coderabbit.yaml (owned by ci-codeowners) and code-specific instructions in cpp/agents.md and python/agents.md. This way, each can be maintained more easily by the respective codeowners.

Also updated the CODEOWNERS file to reflect this.

These instructions were largely lifted from cuopt and adapted for cuML.

I plan on enabling the integration once these instructions are merged.

jcrist

I'm skeptical of bots for code review (the last one we tried was more noisy than helpful IMO). I'm hoping this one can be better.

I've flagged a few spots in the doc that I don't think are precise enough (or are incorrect), and I worry they will cause the bot to over-index and flag non-issues. That said, I have no experience actually trying to get an LLM to be helpful, so 🤷 maybe they're non-issues as is. Up to you.

jcrist

Still AI skeptical, but happy to try it and see. Thanks for putting this together.

jcrist · 2026-01-27T18:51:53Z

Err, my approval is only for the Python guidelines; I didn't review the C++ guidelines.

betatim · 2026-01-28T08:08:46Z

Some comments below, but they don't need to slow down this PR.

I think adding "documentation" for bots is useful. I don't know how many cuML developers use AI tools; few say so either way, except Jim, who so far has resisted the borg ;). Still, I think many do use them. So we should consider how to store this information not only for CodeRabbit but also for Cursor, Claude, etc. A lot of the information is useful when doing work, not just when reviewing. I expect we will need to tune it as we use it (e.g. it might be too noisy or often wrong at the start).

If people already have AGENTS.md files or setups, this might be the time to start sharing them (mine for cuML is very basic; it just tells the agent to be brief and avoid flattery).

In my experience, AI agents get a lot better if you give them information about existing patterns, dos and don'ts, summaries of the architecture, etc. So much so that I have an agents/ folder for scikit-learn (I haven't made one for cuML yet) that contains documents on testing, array API'fication, common mistakes, etc. The AGENTS.md contains instructions to read the relevant files from that directory and to keep them up to date when doing work.

The reason for the indirection is that it keeps AGENTS.md reasonably short, and with informative filenames I think (though I have no idea how to prove it) AI tools only add what they need to their context. It also makes it easier to review the contents of these files. This is important, because if an anti-pattern gets in here, it will spread across all work done by an AI tool. For the scikit-learn array API work, AI was quite good at creating these files, but I had to carefully review them and fix a handful of things it had imagined out of thin air or got totally wrong.

I think we should consider having an agents/ folder (or whatever you want to call it) that contains such information, with instructions in AGENTS.md that refer to it. I don't think we need to slow down this PR to do that, but I'd like to put the thought in people's heads.

I've not had time to start a conversation about this within scikit-learn, so for now my agents/ folder for that project is not public (mostly because I don't know how to make it public without risking it ending up in the main repo by accident).

viclafargue

Thanks @csadorf! I think this will be very useful in the future: we will be able to work much faster, and reviewing would otherwise lag behind.

It looks very good overall; my only concern is over-prompting and exhaustive listing. LLMs are already quite knowledgeable, and I am not sure that a very long list of everything that could go wrong when writing CUDA kernels is any better than "Ensure that CUDA kernels are written according to the best standards and flag any critical issue."

Another way to design the prompt would be to limit it to issues that are genuinely specific to the cuML project and the RAPIDS libraries it integrates with, and to instruct the reviewer agent to flag only critical issues without enumerating specific cases. We could then improve the agent's behaviour iteratively, driven by precedent (e.g. if something is flagged unnecessarily too often, we make sure to silence it).

Finally, when it comes to matching scikit-learn behavior or integrating with other libraries (such as RAFT, RMM, cuVS, libcudacxx, thrust, and CUB), how would CodeRabbit know about their APIs and expectations? This makes me think that we should consider integrating our reviewer agent with a documentation-indexing MCP server like Context7. It looks like scikit-learn is already indexed, but most libraries we use aren't yet.

viclafargue · 2026-01-28T09:15:21Z

+- Significant code duplication (3+ occurrences) in kernel logic
+- Reinventing functionality already available in RAFT, thrust, RMM, or cuVS
+
+### Test Quality


Suggested change

### Test Quality

### Test Quality

- C/C++ feature insufficiently tested

We could have C/C++ features in cuML that require testing even though they do not compute anything (so there is no numerical correctness check), e.g. logging, device selection, or multi-GPU orchestration.

I'd expect that the bot is smart enough to distinguish that.

viclafargue · 2026-01-28T09:23:38Z

+
+Before commenting, ask:
+1. Is this actually wrong/risky, or just different?
+2. Would this cause a real problem (crash, wrong results, leak)?


Suggested change

2. Would this cause a real problem (crash, wrong results, leak)?

2. Would this cause a real problem?

I'd prefer to keep this more specific.

viclafargue · 2026-01-28T09:28:20Z

+Suggested fix:
+if (cudaMalloc(&d_data, size) != cudaSuccess) {
+    cudaFree(d_centroids);
+    return ERROR_CODE;
+}


Isn't CodeRabbit already designed/trained to generate review comments? We might be over-prompting here. The suggested fix should be a diff and not part of the comment, but I am not sure how we could prompt this.

I'm following cuopt's lead here. We can refine this later if needed.

viclafargue · 2026-01-28T09:35:54Z

+
+---
+
+## Common Bug Patterns


It looks like this section is a repeat of earlier instructions.

Those are supposed to be examples of actual bugs that we would expect the bot to flag. Prompting often benefits from some repetition. I'm inclined to leave this as-is.

betatim · 2026-01-28T13:57:21Z

Finally, when it comes to matching scikit-learn behavior or integrating with other libraries (such as RAFT, RMM, cuVS, libcudacxx, thrust, and CUB), how would CodeRabbit know about their APIs and expectations? This makes me think that we should consider integrating our reviewer agent with a documentation-indexing MCP server like Context7. It looks like scikit-learn is already indexed, but most libraries we use aren't yet.

Related to this, I had a thought: maybe some of the things we ask CodeRabbit to look at are better handled with a classic (and deterministic) unit test? Checking method signatures, for example, is something a test can do easily and consistently. For scikit-learn, a lot is covered by the common estimator tests we now use. Worth remembering that we can write classic unit tests as well as asking LLMs :D
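A minimal sketch of the kind of deterministic signature check meant here, using only the standard library's `inspect` module. `ReferenceRidge` and `GpuRidge` are hypothetical stand-ins (in practice one would compare a cuML estimator with its scikit-learn counterpart), and the policy encoded below (extra GPU-specific parameters are allowed; missing parameters, a different parameter kind, or a different default are flagged) is an assumption, not an existing cuML test.

```python
import inspect


def signature_mismatches(reference, candidate, method="__init__", ignore=()):
    """List differences between reference.<method> and candidate.<method>.

    Only parameters present on `reference` are checked, so extra parameters
    on `candidate` (e.g. GPU-specific options) are allowed. Names listed in
    `ignore` are intentional, known differences and are skipped.
    """
    ref_params = inspect.signature(getattr(reference, method)).parameters
    cand_params = inspect.signature(getattr(candidate, method)).parameters
    problems = []
    for name, ref_param in ref_params.items():
        if name == "self" or name in ignore:
            continue
        cand_param = cand_params.get(name)
        if cand_param is None:
            problems.append(f"missing parameter {name!r}")
            continue
        # Positional vs. keyword-only matters for scikit-learn compatibility.
        if cand_param.kind != ref_param.kind:
            problems.append(
                f"kind of {name!r} differs: "
                f"{cand_param.kind.name} != {ref_param.kind.name}"
            )
        if cand_param.default != ref_param.default:
            problems.append(
                f"default of {name!r} differs: "
                f"{cand_param.default!r} != {ref_param.default!r}"
            )
    return problems


# Hypothetical stand-ins for a scikit-learn estimator and its GPU counterpart.
class ReferenceRidge:
    def __init__(self, alpha=1.0, *, fit_intercept=True, solver="auto"):
        pass


class GpuRidge:
    def __init__(self, alpha=1.0, *, fit_intercept=False, output_type=None):
        pass


print(signature_mismatches(ReferenceRidge, GpuRidge))
# -> ["default of 'fit_intercept' differs: False != True", "missing parameter 'solver'"]
```

In a pytest suite this could become something like `assert not signature_mismatches(SklearnEstimator, CumlEstimator)` (placeholder names), parametrized over estimator pairs, with deliberate deviations listed in `ignore` so they are documented rather than silently flagged.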

viclafargue · 2026-01-28T14:06:30Z

Finally, when it comes to matching scikit-learn behavior or integrating with other libraries (such as RAFT, RMM, cuVS, libcudacxx, thrust, and CUB), how would CodeRabbit know about their APIs and expectations? This makes me think that we should consider integrating our reviewer agent with a documentation-indexing MCP server like Context7. It looks like scikit-learn is already indexed, but most libraries we use aren't yet.

Related to this, I had a thought: maybe some of the things we ask CodeRabbit to look at are better handled with a classic (and deterministic) unit test? Checking method signatures, for example, is something a test can do easily and consistently. For scikit-learn, a lot is covered by the common estimator tests we now use. Worth remembering that we can write classic unit tests as well as asking LLMs :D

I am thinking of these types of instructions:

Scikit-learn Compatibility:
[...]
- Ensure API signatures and behavior match scikit-learn

Design & Architecture:
[...]
- Avoid reinventing functionality already available in RAFT, RMM, cuVS, libcudacxx, thrust, or CUB

I agree that API signatures could be checked programmatically with the help of some Python metaprogramming in a test. However, how would it know what is available in cuML's dependencies to avoid code redundancy?

betatim · 2026-01-28T14:38:19Z

However, how would it know what is available in cuML's dependencies to avoid code redundancy?

I think this would be very difficult to do in a test. It might need a human expert or an LLM.

jameslamb

Given all the discussion in this PR, it seems to me that this configuration is likely to change frequently.

I'd expect that those changes don't require a full CI run with test jobs on GPU runners... could you add these new files to the changed-files exclusion lists?

cuml/.github/workflows/pr.yaml

Line 63 in ccd2853

files_yaml: |

That'd make updating this in the future less expensive.

Like @jcrist, I'm also skeptical of using these tools in this way, but I'm not going to get in the way. I'll give a ci-codeowners approval once those changes to skip test jobs are committed here.

csadorf · 2026-01-28T19:04:47Z

@jameslamb I've addressed your request in 35a18d0.

csadorf · 2026-01-28T19:12:27Z

It looks very good overall; my only concern is over-prompting and exhaustive listing. LLMs are already quite knowledgeable, and I am not sure that a very long list of everything that could go wrong when writing CUDA kernels is any better than "Ensure that CUDA kernels are written according to the best standards and flag any critical issue."

I appreciate your feedback and expect that we will have to iterate a bit on this. I think we should start by following cuOpt's lead so that we don't have to relearn all of the same lessons, and then we can start experimenting with tightening our prompts and see whether the quality of the review feedback improves, deteriorates, or stays largely the same.

csadorf · 2026-02-03T16:42:26Z

/merge

## Summary
- Add `.coderabbit.yaml` configuration for CodeRabbit AI code reviews
- Add `cpp/REVIEW_GUIDELINES.md` with C++/CUDA-specific review guidelines for RMM
- Add `python/REVIEW_GUIDELINES.md` with Python-specific review guidelines for RMM
- Update `AGENTS.md` with cross-references to review guidelines
- Update `.github/CODEOWNERS` to assign CI codeowners to `.coderabbit.yaml`
- Update `.github/workflows/pr.yaml` to exclude `AGENTS.md`, `REVIEW_GUIDELINES.md`, and `.coderabbit.yaml` from CI triggers

The file structure separates concerns:
- `AGENTS.md` - General development guide for AI coding agents (build commands, test commands, code style, project structure)
- `cpp/REVIEW_GUIDELINES.md` - CodeRabbit review guidelines for C++/CUDA code
- `python/REVIEW_GUIDELINES.md` - CodeRabbit review guidelines for Python code

Adapted from rapidsai/cuml#7725.

Authors:
- Bradley Dice (https://github.com/bdice)

Approvers:
- James Lamb (https://github.com/jameslamb)
- Nate Rock (https://github.com/rockhowse)
- Lawrence Mitchell (https://github.com/wence-)

URL: #2244

Added general instructions and configuration in .coderabbit.yaml (ci-codeowner) and then code-specific instructions to cpp/agents.md and python/agents.md. In this way these can be more easily maintained by the respective codeowners. Also updated the CODEOWNERS file to reflect this. These instructions were largely lifted from cuopt and adapted for cuML. I plan on enabling the integration once these instructions are merged.

Authors:
- Simon Adorf (https://github.com/csadorf)

Approvers:
- Jim Crist-Harif (https://github.com/jcrist)
- James Lamb (https://github.com/jameslamb)
- Divye Gala (https://github.com/divyegala)

URL: rapidsai#7725

Following on the work of rapidsai/rmm#2244 and rapidsai/cuml#7725, this enables general instructions and configuration to enable coderabbit codereviews on cuvs.

Closes #1767

Authors:
- Ben Frederickson (https://github.com/benfred)
- Corey J. Nolet (https://github.com/cjnolet)
- Anupam (https://github.com/aamijar)

Approvers:
- Anupam (https://github.com/aamijar)
- Corey J. Nolet (https://github.com/cjnolet)
- Bradley Dice (https://github.com/bdice)

URL: #1908

Add instructions for CodeRabbit

43164c5

csadorf requested review from a team as code owners January 27, 2026 18:11

csadorf requested review from AyodeAwe, betatim, divyegala and jcrist January 27, 2026 18:11

csadorf added the improvement Improvement / enhancement to an existing function label Jan 27, 2026

github-actions Bot added the Cython / Python Cython or Python issue label Jan 27, 2026

csadorf added the non-breaking Non-breaking change label Jan 27, 2026

github-actions Bot added the CUDA/C++ label Jan 27, 2026

github-actions Bot assigned csadorf Jan 27, 2026

jcrist reviewed Jan 27, 2026

Comment thread python/agents.md Outdated

Comment thread python/agents.md Outdated

Comment thread python/agents.md Outdated

Comment thread python/agents.md Outdated

Comment thread python/agents.md

address review comments

409c5a0

jcrist approved these changes Jan 27, 2026

divyegala reviewed Jan 27, 2026

Comment thread cpp/agents.md Outdated

Comment thread cpp/agents.md Outdated

Comment thread cpp/agents.md Outdated

Comment thread cpp/agents.md Outdated

Comment thread cpp/agents.md Outdated

betatim reviewed Jan 28, 2026

Comment thread python/agents.md Outdated

viclafargue reviewed Jan 28, 2026

jameslamb requested changes Jan 28, 2026

csadorf mentioned this pull request Jan 28, 2026

Add instructions for CodeRabbit rapidsai/nvforest#36

Merged

csadorf and others added 4 commits January 28, 2026 12:40

Apply suggestions from code review

0569639

Co-authored-by: Divye Gala <divyegala@gmail.com> Co-authored-by: Victor Lafargue <viclafargue@nvidia.com> Co-authored-by: Tim Head <betatim@gmail.com>

place high-level summary in walkthrough comment

c879f1a

Remove some instructions.

4e64f69

Do not promote reset_state() pattern.

9085932

Add coderabbit config and agent instructions to change-file exclusions.

35a18d0

csadorf requested a review from jameslamb January 28, 2026 19:05

jameslamb approved these changes Jan 28, 2026

Merge remote-tracking branch 'origin/main' into add-code-rabbit-instructions

2378da7

Merge branch 'main' into add-code-rabbit-instructions

4faa3be

divyegala approved these changes Jan 30, 2026

csadorf added 2 commits February 2, 2026 15:33

Merge remote-tracking branch 'origin/main' into add-code-rabbit-instructions

c6889c8

Merge branch 'main' into add-code-rabbit-instructions

2b07a49

rapids-bot Bot merged commit 349d167 into rapidsai:main Feb 3, 2026
109 of 111 checks passed

csadorf deleted the add-code-rabbit-instructions branch February 3, 2026 16:42

cjnolet mentioned this pull request Feb 3, 2026

[FEA] Enable code rabbit for AI code reviews rapidsai/cuvs#1767

Closed

bdice mentioned this pull request Feb 3, 2026

Add CodeRabbit configuration and AI review guidelines rapidsai/rmm#2244

Merged

benfred added a commit to benfred/cuvs that referenced this pull request Mar 10, 2026

Coderabbit integration

d2b040e

Following on the work of rapidsai/rmm#2244 and rapidsai/cuml#7725, this enables general instructions and configuration to enable coderabbit codereviews on cuvs.

benfred mentioned this pull request Mar 10, 2026

Coderabbit integration rapidsai/cuvs#1908

Merged

NaderAlAwar mentioned this pull request Apr 29, 2026

Figure out what config we want to enable CodeRabbit for review NVIDIA/cccl#8710

Closed


7 participants
