
Add instructions for CodeRabbit #7725

Merged
rapids-bot[bot] merged 11 commits into rapidsai:main from csadorf:add-code-rabbit-instructions
Feb 3, 2026

Conversation

Contributor

@csadorf csadorf commented Jan 27, 2026

Added general instructions and configuration in .coderabbit.yaml (ci-codeowner) and code-specific instructions in cpp/agents.md and python/agents.md. This way, each set of instructions can be more easily maintained by the respective codeowners.

Also updated the CODEOWNERS file to reflect this.

These instructions were largely lifted from cuopt and adapted for cuML.

I plan on enabling the integration once these instructions are merged.
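For context, a .coderabbit.yaml wired up this way might look roughly like the following. This is an illustrative sketch, not the file from this PR: the reviews/path_instructions keys follow CodeRabbit's documented schema, but the globs and wording here are assumptions.

```yaml
# Illustrative sketch only -- not the actual .coderabbit.yaml from this PR.
reviews:
  path_instructions:
    - path: "cpp/**"
      instructions: |
        Follow the C++/CUDA review guidelines in cpp/agents.md.
    - path: "python/**"
      instructions: |
        Follow the Python review guidelines in python/agents.md.
```

Keeping the per-language guidance in cpp/agents.md and python/agents.md (rather than inline in the YAML) is what lets the respective codeowners maintain it independently.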

@csadorf csadorf requested review from a team as code owners January 27, 2026 18:11
@csadorf csadorf added the "improvement" label (Improvement / enhancement to an existing function) Jan 27, 2026
@github-actions github-actions Bot added the "Cython / Python" label (Cython or Python issue) Jan 27, 2026
@csadorf csadorf added the "non-breaking" label (Non-breaking change) Jan 27, 2026
Member

@jcrist jcrist left a comment


I'm skeptical of bots for code review (the last one we tried was more noisy than helpful IMO). I'm hoping this one can be better.

I've marked a few worries I have in the doc on things I don't think were precise enough (or were incorrect) that I worry will cause the bot to overindex and incorrectly flag non-issues. That said, I have no experience actually trying to get an LLM to be helpful, so 🤷 maybe they're non-issues as is. Up to you.

Comment threads on python/agents.md (4 outdated, 1 open)
Member

@jcrist jcrist left a comment


Still AI skeptical, but happy to try it and see. Thanks for putting this together.

Member

jcrist commented Jan 27, 2026

Err, approval only for the python guidelines, I didn't review the C++ guidelines.

Comment threads on cpp/agents.md (5 outdated) and python/agents.md (1 outdated)
Member

betatim commented Jan 28, 2026

Some comments below, but they don't need to slow down this PR.

I think adding "documentation" for bots is useful. While I don't know how many cuml developers use AI tools - few say so either way, except Jim who so far has resisted the borg ;) - I think many do use them. So we should consider how to store this information not only for coderabbit but also Cursor, Claude, etc. A lot of the information is useful when doing work, not just when reviewing. I expect we need to tune it as we use it (e.g., it might be too noisy or often wrong at the start).

If people have AGENTS.md files/setups already, this might be the time to start sharing them (mine is very basic for cuml, just telling it to be brief and avoid flattery).


From my experience the AI agents get a lot better if you give them information about existing patterns, do's and don'ts, summaries of the architecture, etc. To the extent that I have an agents/ folder for scikit-learn (not made one for cuml yet) that contains documents on testing, array API'fication, common mistakes, etc. The AGENTS.md contains instructions to read the relevant files from that directory and to keep them up to date when doing work.

The reason for the indirection is that it helps keep AGENTS.md somewhat short, and I think (though I have no idea how to prove it) that with informative filenames AI tools only add what they need to their context. It also makes it easier to review the contents of these files. This is important, because if an anti-pattern gets in here it will spread across all work done by an AI tool. For scikit-learn array API work AI was quite good at creating these files, but I had to carefully review them and fix a handful of things that AI had imagined out of thin air/got totally wrong.

I think we should consider having an agents/ (or whatever you want to call it) folder that contains such information, and instructions in AGENTS.md that refer to it. I don't think we need to slow down this PR to do that, but I'd like to put the thought in people's heads.

I've not had time to start a conversation about this within scikit-learn. So for now my agents/ for that project is not public (mostly because I don't know how to make it public without risking it ending up in the main repo by accident).

Contributor

@viclafargue viclafargue left a comment


Thanks @csadorf! I think it will be very useful in the future: we will be able to work much faster, and reviewing would otherwise lag behind.

It looks very good overall; my only concern is over-prompting and exhaustive listing. LLMs are already quite knowledgeable, and I am not sure that a very long list of everything that could go wrong when it comes to writing CUDA kernels is any better than "Ensure that CUDA kernels are written according to the best standards and flag any critical issue.".

Another way to design the prompt would be to limit it to issues that are genuinely specific to the cuML project and the RAPIDS libraries it integrates with, and to instruct the reviewer agent to flag only critical issues without being specific. We could then make the agent’s behaviour iteratively improvable in a precedent-driven way (e.g. if something is unnecessarily flagged too often we make sure to silence it).

Finally, when it comes to matching Scikit-Learn behavior or integrating with other libraries (such as RAFT, RMM, cuVS, libcudacxx, thrust, and CUB), how would CodeRabbit know about APIs and expectations? This makes me think that we should consider integrating our reviewer agent with a doc indexer MCP like Context7. It looks like Scikit-Learn is already indexed. But most libraries we use aren't yet.

Comment thread cpp/agents.md
- Significant code duplication (3+ occurrences) in kernel logic
- Reinventing functionality already available in RAFT, thrust, RMM, or cuVS

### Test Quality
Contributor


Suggested change:
  ### Test Quality
+ - C/C++ feature insufficiently tested

We could have a C/C++ feature in cuML that requires testing even though it does not compute anything (no numerical correctness check), e.g. logging, device selection, multi-GPU orchestration, ...

Contributor Author


I'd expect that the bot is smart enough to distinguish that.

Comment thread cpp/agents.md

Before commenting, ask:
1. Is this actually wrong/risky, or just different?
2. Would this cause a real problem (crash, wrong results, leak)?
Contributor


Suggested change:
- 2. Would this cause a real problem (crash, wrong results, leak)?
+ 2. Would this cause a real problem?

Contributor Author


I'll leave this more specific.

Comment thread cpp/agents.md
Comment on lines +128 to +132
Suggested fix:
if (cudaMalloc(&d_data, size) != cudaSuccess) {
cudaFree(d_centroids);
return ERROR_CODE;
}
Contributor


Isn't CodeRabbit already designed/trained to generate review comments? We might be over-prompting here. The suggested fix should be a diff and not part of the comment, but I am not sure how we could prompt this.

Contributor Author


I'm following cuopt's lead here. We can refine this later if needed.

Comment thread cpp/agents.md

---

## Common Bug Patterns
Contributor


It looks like this section is a repeat of earlier instructions.

Contributor Author


Those are supposed to be examples of actual bugs that we would expect the bot to flag. Prompting often benefits from some repetition. I'm inclined to leave this as-is.

Comment threads on cpp/agents.md (2 outdated, 2 open)
Member

betatim commented Jan 28, 2026

Finally when it comes to matching Scikit-Learn behavior or integrating with other libraries (such as RAFT, RMM, cuVS, libcudacxx, thrust, and CUB), how would CodeRabbit know about APIs and expectations? This makes me think that we should consider integrating our reviewer agent with a doc indexer MCP like Context7. It looks like Scikit-Learn is already indexed. But, most libraries we use aren't yet.

Related to this I had a thought: maybe some of the things we ask coderabbit to look at are better handled with a classic (and deterministic) unit test? Checking the signature of methods is something that a test can easily and consistently do, for example. For scikit-learn a lot is covered by the common estimator tests we now use. Worth remembering that we can write classic unit tests as well as asking LLMs :D

Contributor

viclafargue commented Jan 28, 2026

Finally when it comes to matching Scikit-Learn behavior or integrating with other libraries (such as RAFT, RMM, cuVS, libcudacxx, thrust, and CUB), how would CodeRabbit know about APIs and expectations? This makes me think that we should consider integrating our reviewer agent with a doc indexer MCP like Context7. It looks like Scikit-Learn is already indexed. But, most libraries we use aren't yet.

Related to this I had a thought: maybe some of the things we ask coderabbit to look at are better handled with a classic (and deterministic) unit test? Checking the signature of methods is something that a test can easily and consistently do, for example. For scikit-learn a lot is covered by the common estimator tests we now use. Worth remembering that we can write classic nit tests as well as asking LLMs :D

I am thinking of these types of instructions:

Scikit-learn Compatibility:
[...]
- Ensure API signatures and behavior match scikit-learn

Design & Architecture :
[...]
- Avoid reinventing functionality already available in RAFT, RMM, cuVS, libcudacxx, thrust, or CUB

I agree that API signatures could be checked programmatically with the help of some Python metaprogramming in a test. However, how would it know what is available in cuML's dependencies to avoid code redundancy?
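The programmatic signature check mentioned above can be sketched with nothing but the standard library's inspect module. The estimator classes below are hypothetical stand-ins for illustration, not real scikit-learn or cuML classes:

```python
import inspect

# Hypothetical stand-ins for a reference (scikit-learn) estimator and its
# cuML counterpart; a real test would import the actual classes instead.
class ReferenceKMeans:
    def fit(self, X, y=None, sample_weight=None): ...

class CumlKMeans:
    def fit(self, X, y=None, sample_weight=None): ...

def signatures_match(ref_cls, impl_cls, method="fit"):
    """Return True if `method` has identical parameter names, order,
    kinds, and defaults in both classes."""
    ref = inspect.signature(getattr(ref_cls, method))
    impl = inspect.signature(getattr(impl_cls, method))
    # inspect.Parameter compares by name, kind, default, and annotation.
    return list(ref.parameters.values()) == list(impl.parameters.values())

assert signatures_match(ReferenceKMeans, CumlKMeans)
```

A real compatibility test would iterate over all public estimators and their shared methods; the point is that this kind of check is deterministic and cheap, unlike asking an LLM.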

Member

betatim commented Jan 28, 2026

However, how would it know what is available in cuML's dependencies to avoid code redundancy?

I think this would be very difficult to do in a test. Might need a human expert or an LLM.

Member

@jameslamb jameslamb left a comment


Given all the discussion in this PR, it seems to me that this configuration is likely to change frequently.

I'd expect that those changes don't require a full CI run running test jobs on GPU runners... could you add these new files to the changed-files exclusion lists?

(the `files_yaml:` block in `.github/workflows/pr.yaml`)

That'd make updating this in the future less expensive.


Like @jcrist I'm also skeptical of using these tools in this way, but not going to get in the way. I'll give a ci-codeowners approval once those changes to skip tests jobs are committed here.
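For reference, the kind of exclusion being requested might look like this in the changed-files configuration. This is a hypothetical sketch; the actual group names and patterns in .github/workflows/pr.yaml may differ:

```yaml
# Hypothetical sketch: exclude agent/review config from test-triggering
# file sets so editing these files does not launch GPU test jobs.
files_yaml: |
  test_python:
    - '**'
    - '!.coderabbit.yaml'
    - '!cpp/agents.md'
    - '!python/agents.md'
```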

csadorf and others added 4 commits January 28, 2026 12:40
Co-authored-by: Divye Gala <divyegala@gmail.com>
Co-authored-by: Victor Lafargue <viclafargue@nvidia.com>
Co-authored-by: Tim Head <betatim@gmail.com>
Contributor Author

csadorf commented Jan 28, 2026

@jameslamb I've addressed your request in 35a18d0.

@csadorf csadorf requested a review from jameslamb January 28, 2026 19:05
Contributor Author

csadorf commented Jan 28, 2026

It looks very good overall; my only concern is over-prompting and exhaustive listing. LLMs are already quite knowledgeable, and I am not sure that a very long list of everything that could go wrong when it comes to writing CUDA kernels is any better than "Ensure that CUDA kernels are written according to the best standards and flag any critical issue.".

I appreciate your feedback and expect that we will have to iterate a bit on this. I think we should start by following cuOpt's lead so that we don't have to relearn all of the same lessons, but then we can start experimenting with tightening our prompting a bit and see if the quality of review feedback improves, deteriorates, or stays largely the same.

Contributor Author

csadorf commented Feb 3, 2026

/merge

@rapids-bot rapids-bot Bot merged commit 349d167 into rapidsai:main Feb 3, 2026
109 of 111 checks passed
@csadorf csadorf deleted the add-code-rabbit-instructions branch February 3, 2026 16:42
rapids-bot Bot pushed a commit to rapidsai/rmm that referenced this pull request Feb 9, 2026
## Summary

- Add `.coderabbit.yaml` configuration for CodeRabbit AI code reviews
- Add `cpp/REVIEW_GUIDELINES.md` with C++/CUDA-specific review guidelines for RMM
- Add `python/REVIEW_GUIDELINES.md` with Python-specific review guidelines for RMM
- Update `AGENTS.md` with cross-references to review guidelines
- Update `.github/CODEOWNERS` to assign CI codeowners to `.coderabbit.yaml`
- Update `.github/workflows/pr.yaml` to exclude `AGENTS.md`, `REVIEW_GUIDELINES.md`, and `.coderabbit.yaml` from CI triggers

The file structure separates concerns:
- `AGENTS.md` - General development guide for AI coding agents (build commands, test commands, code style, project structure)
- `cpp/REVIEW_GUIDELINES.md` - CodeRabbit review guidelines for C++/CUDA code
- `python/REVIEW_GUIDELINES.md` - CodeRabbit review guidelines for Python code

Adapted from rapidsai/cuml#7725.

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - James Lamb (https://github.com/jameslamb)
  - Nate Rock (https://github.com/rockhowse)
  - Lawrence Mitchell (https://github.com/wence-)

URL: #2244
dantegd added a commit to dantegd/cuml that referenced this pull request Feb 17, 2026

Authors:
  - Simon Adorf (https://github.com/csadorf)

Approvers:
  - Jim Crist-Harif (https://github.com/jcrist)
  - James Lamb (https://github.com/jameslamb)
  - Divye Gala (https://github.com/divyegala)

URL: rapidsai#7725
benfred added a commit to benfred/cuvs that referenced this pull request Mar 10, 2026
Following on the work of rapidsai/rmm#2244 and
rapidsai/cuml#7725, this enables general
instructions and configuration to enable coderabbit codereviews on cuvs.
rapids-bot Bot pushed a commit to rapidsai/cuvs that referenced this pull request Apr 9, 2026
Following on the work of rapidsai/rmm#2244 and rapidsai/cuml#7725, this enables general instructions and configuration to enable coderabbit codereviews on cuvs.

Closes #1767

Authors:
  - Ben Frederickson (https://github.com/benfred)
  - Corey J. Nolet (https://github.com/cjnolet)
  - Anupam (https://github.com/aamijar)

Approvers:
  - Anupam (https://github.com/aamijar)
  - Corey J. Nolet (https://github.com/cjnolet)
  - Bradley Dice (https://github.com/bdice)

URL: #1908
enp1s0 pushed a commit to enp1s0/cuvs that referenced this pull request Apr 10, 2026

Labels

CUDA/C++ · Cython / Python (Cython or Python issue) · improvement (Improvement / enhancement to an existing function) · non-breaking (Non-breaking change)


7 participants