Add instructions for CodeRabbit#7725
Conversation
jcrist
left a comment
There was a problem hiding this comment.
I'm skeptical of bots for code review (the last one we tried was more noisy than helpful IMO). I'm hoping this one can be better.
I've marked a few worries I have in the doc on things I don't think were precise enough (or were incorrect) that I worry will cause the bot to overindex and incorrectly flag non-issues. That said, I have no experience actually trying to get an LLM to be helpful, so 🤷 maybe they're non-issues as is. Up to you.
jcrist
left a comment
There was a problem hiding this comment.
Still AI skeptical, but happy to try it and see. Thanks for putting this together.
|
Err, approval only for the python guidelines, I didn't review the C++ guidelines. |
|
Some comments below, but they don't need to slow down this PR. I think adding "documentation" for bots is useful. While I don't know how many cuml developers use AI tools - few say so either way, except Jim who so far has resisted the borg ;) - I think many do use them. So we should consider how to store this information not only for coderabbit but also Cursor, Claude, etc. A lot of the information is useful when doing work, not just when reviewing. I expect we need to tune it as we use it (eg might be too noisy/often wrong at the start). If people have From my experience the AI agents get a lot better if you give them information about existing patterns, do's and don'ts, summaries of the architecture, etc. To the extent that I have a The reason for the indirection is that it helps keep I think we should consider having a I've not had time to start a conversation about this within scikit-learn. So for now my |
There was a problem hiding this comment.
Thanks @csadorf! I think it will be very useful in the future as we will be able to work much faster, but the reviewing would otherwise lag behind.
It looks very good overall, my only concern is over-prompting and exhaustive listing. LLM are already quite knowledgeable and I am not sure that a very long list of everything that could go wrong when it comes to writing CUDA kernels is any better than "Ensure that CUDA kernels are written according to the best standards and flag any critical issue.".
Another way to design the prompt would be to limit it to issues that are genuinely specific to the cuML project and the RAPIDS libraries it integrates with, and to instruct the reviewer agent to flag only critical issues without being specific. We could then make the agent’s behaviour iteratively improvable in a precedent-driven way (e.g. if something is unnecessarily flagged too often we make sure to silence it).
Finally when it comes to matching Scikit-Learn behavior or integrating with other libraries (such as RAFT, RMM, cuVS, libcudacxx, thrust, and CUB), how would CodeRabbit know about APIs and expectations? This makes me think that we should consider integrating our reviewer agent with a doc indexer MCP like Context7. It looks like Scikit-Learn is already indexed. But, most libraries we use aren't yet.
| - Significant code duplication (3+ occurrences) in kernel logic | ||
| - Reinventing functionality already available in RAFT, thrust, RMM, or cuVS | ||
|
|
||
| ### Test Quality |
There was a problem hiding this comment.
| ### Test Quality | |
| ### Test Quality | |
| - C/C++ feature insufficiently tested |
We could have a C/C++ feature in cuML that require testing even though it does not compute anything (no numerical correctness check). e.g. : logging, device selection, multi-GPU orchestration ...
There was a problem hiding this comment.
I'd expect that the bot is smart enough to distinguish that.
|
|
||
| Before commenting, ask: | ||
| 1. Is this actually wrong/risky, or just different? | ||
| 2. Would this cause a real problem (crash, wrong results, leak)? |
There was a problem hiding this comment.
| 2. Would this cause a real problem (crash, wrong results, leak)? | |
| 2. Would this cause a real problem? |
There was a problem hiding this comment.
I'll leave this more specific.
| Suggested fix: | ||
| if (cudaMalloc(&d_data, size) != cudaSuccess) { | ||
| cudaFree(d_centroids); | ||
| return ERROR_CODE; | ||
| } |
There was a problem hiding this comment.
Isn't CodeRabbit already designed/trained to generate review comments? We might be over-prompting here. The suggested fix should be a diff and not part of the comment, but I am not sure how we could prompt this.
There was a problem hiding this comment.
I'm following cuopt's lead here. We can refine this later if needed.
|
|
||
| --- | ||
|
|
||
| ## Common Bug Patterns |
There was a problem hiding this comment.
It looks like this section is a repeat of earlier instructions.
There was a problem hiding this comment.
Those are supposed to be examples of actual bugs that we would expect the bot to flag. Prompting often benefits from some repetition. I'm inclined to leave this as-is.
Related to this I had a thought: maybe some of the things we ask coderabbit to look at are better handled with a classic (and deterministic) unit test? Checking the signature of methods is something that a test can easily and consistently do, for example. For scikit-learn a lot is covered by the common estimator tests we now use. Worth remembering that we can write classic nit tests as well as asking LLMs :D |
I am thinking of these types of instructions : I agree that API signatures could be checked programatically with the help of some Python metaprogramming in a test. However, how would it know what is available in cuML's dependencies to avoid code redundancy? |
I think this would be very difficult to do in a test. Might need a human expert or a LLM |
jameslamb
left a comment
There was a problem hiding this comment.
Given all the discussion in this PR, it seems to me that this configuration is likely to change frequently.
I'd expect that those changes don't require a full CI run running test jobs on GPU runners... could you add these new files to the changed-files exclusion lists?
cuml/.github/workflows/pr.yaml
Line 63 in ccd2853
That'd make updating this in the future less expensive.
Like @jcrist I'm also skeptical of using these tools in this way, but not going to get in the way. I'll give a ci-codeowners approval once those changes to skip tests jobs are committed here.
Co-authored-by: Divye Gala <divyegala@gmail.com> Co-authored-by: Victor Lafargue <viclafargue@nvidia.com> Co-authored-by: Tim Head <betatim@gmail.com>
|
@jameslamb I've addressed your request in 35a18d0 . |
I appreciate your feedback and expect that we will have to iterate a bit on this. I think we should start with following cuOpt's lead on this so that we don't have to apply all of the same learnings, but then we can start experimenting with tightening our prompting a bit and see if the quality of review feedback improves, deteriorates, or stays largely the same. |
|
/merge |
## Summary - Add `.coderabbit.yaml` configuration for CodeRabbit AI code reviews - Add `cpp/REVIEW_GUIDELINES.md` with C++/CUDA-specific review guidelines for RMM - Add `python/REVIEW_GUIDELINES.md` with Python-specific review guidelines for RMM - Update `AGENTS.md` with cross-references to review guidelines - Update `.github/CODEOWNERS` to assign CI codeowners to `.coderabbit.yaml` - Update `.github/workflows/pr.yaml` to exclude `AGENTS.md`, `REVIEW_GUIDELINES.md`, and `.coderabbit.yaml` from CI triggers The file structure separates concerns: - `AGENTS.md` - General development guide for AI coding agents (build commands, test commands, code style, project structure) - `cpp/REVIEW_GUIDELINES.md` - CodeRabbit review guidelines for C++/CUDA code - `python/REVIEW_GUIDELINES.md` - CodeRabbit review guidelines for Python code Adapted from rapidsai/cuml#7725. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - James Lamb (https://github.com/jameslamb) - Nate Rock (https://github.com/rockhowse) - Lawrence Mitchell (https://github.com/wence-) URL: #2244
Added general instructions and configuration in .coderabbit.yaml (ci-codeowner) and then code-specific instructions to cpp/agents.md and python/agents.md. In this way these can be more easily maintained by the respective codeowners. Also updated the CODEOWNERS file to reflect this. These instructions were largely lifted from cuopt and adapted for cuML. I plan on enabling the integration once these instructions are merged. Authors: - Simon Adorf (https://github.com/csadorf) Approvers: - Jim Crist-Harif (https://github.com/jcrist) - James Lamb (https://github.com/jameslamb) - Divye Gala (https://github.com/divyegala) URL: rapidsai#7725
Following on the work of rapidsai/rmm#2244 and rapidsai/cuml#7725, this enables general instructions and configuration to enable coderabbit codereviews on cuvs.
Following on the work of rapidsai/rmm#2244 and rapidsai/cuml#7725, this enables general instructions and configuration to enable coderabbit codereviews on cuvs. Closes #1767 Authors: - Ben Frederickson (https://github.com/benfred) - Corey J. Nolet (https://github.com/cjnolet) - Anupam (https://github.com/aamijar) Approvers: - Anupam (https://github.com/aamijar) - Corey J. Nolet (https://github.com/cjnolet) - Bradley Dice (https://github.com/bdice) URL: #1908
Following on the work of rapidsai/rmm#2244 and rapidsai/cuml#7725, this enables general instructions and configuration to enable coderabbit codereviews on cuvs. Closes rapidsai#1767 Authors: - Ben Frederickson (https://github.com/benfred) - Corey J. Nolet (https://github.com/cjnolet) - Anupam (https://github.com/aamijar) Approvers: - Anupam (https://github.com/aamijar) - Corey J. Nolet (https://github.com/cjnolet) - Bradley Dice (https://github.com/bdice) URL: rapidsai#1908
Added general instructions and configuration in .coderabbit.yaml (ci-codeowner) and then code-specific instructions to cpp/agents.md and python/agents.md. In this way these can be more easily maintained by the respective codeowners.
Also updated the CODEOWNERS file to reflect this.
These instructions were largely lifted from cuopt and adapted for cuML.
I plan on enabling the integration once these instructions are merged.