contributing: tighten AI usage policy#18388
Conversation
In my opinion the biggest bottleneck we have is the time that maintainers can invest into the project. So that is the primary thing that we should be optimizing for. An auto-generated PR of any kind is in that regard fundamentally useless because a maintainer could just use the same tools to generate the PR themselves. I see two primary benefits of accepting PRs from people who are not already maintainers:
To relate the above 2 points to "AI": as you may be able to tell from my use of scare quotes, my opinion is that language models in their current form are useful but still very flawed. The code quality I observe when they try to make changes to a large project is generally rather low. So I think that the short-term benefits of 1 are oftentimes not that great, because it just shifts work from contributors to maintainers. I agree with the assessment that machine-generated code is very rarely checked thoroughly, since many contributors are not even disclosing the use of "AI" unless pressed. That suggests to me that they are not reading the contributing guidelines in the first place.

For the longer-term benefits of 2, I think that my time is much better spent trying to build up a conventional coder than a vibe "coder", where one can expect little improvement from the person themselves gaining more experience.

I personally am not as affected by contributors using "AI" for code generation because I mostly work on the CUDA backend, where such tools are used less frequently. So I am not the primary person that would be affected for better or worse by changes in our policies regarding "AI", and the needs of other maintainers should be prioritized instead.

One recent example that did affect me: #18340 . I marked the corresponding issue #18140 as "good first issue" with the intent of leaving a relatively simple issue that an aspiring dev could tackle to get started. What instead happened is that someone autogenerated a fix with "AI" without disclosing this or even testing whether the code actually fixed the issue. To me that is generating negative value vs. me just doing the fix myself.

For me personally I think the change in policy proposed in this PR would be a small net benefit.
AI assistance is going to be (or already is) a tool that almost everyone uses. In fact, one of the goals of llama.cpp is to enable tools like llama.vim/llama.vscode etc. The problem is low-quality PRs, which as @JohannesGaessler said put the burden on the maintainers to review. Even more worrisome is when the author replies to questions with more AI-generated content. I propose adding something like:
We could maybe establish a threshold for what is still acceptable use of language models, something like:
Thanks for the inputs. I'll see how to rephrase the points that you want to add. Just to make sure that I'm understanding your comments correctly:
Some recent datapoints from me:
I did not disclose the use of language models for any of the above, but I also think that my use was completely unproblematic. My main use case for language models that is (indirectly) related to llama.cpp is to troubleshoot sysadmin problems.
I deleted most of AGENTS.md because:
- It was originally generated by AI and contains a lot of outdated and inaccurate info.
- Other contributors don't seem too optimistic about it (ref linked PR).

I replaced the content with a more in-depth list of permitted and forbidden actions. The doc is intended to be read by human contributors as well as AI agents.
Some of the build/env instructions there helped Copilot Agent to be more consistent (and not waste half the time trying to get a working build), but it was far from perfect and never really that useful anyway, so I guess that's fine.
Without this, Claude is willing to do whatever I ask it to do.
Claude seems to be much more aware of the policy now, great (I spent $5 to test this; I never use Claude in practice).
The flow was:
- I asked it to "add uptime in server health endpoint". It pointed me to the doc and mentioned the direction that I can follow myself.
- I asked it to "just write the code". It said:
FWIW, Gemini CLI was not willing to follow these instructions compared to your example above with Claude, though it did accurately summarize the agents.md and contributing.md files when explicitly asked about "AI policies".
And note that by default, Gemini CLI won't know how to read this file without a configuration setting (which could in theory go into a version-controlled .gemini/settings.json):

```json
{
  "context": {
    "fileName": ["AGENTS.md"]
  }
}
```
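For reference, a minimal sketch of setting this up from a repository checkout. The `context.fileName` key is taken from the snippet above; treat it as an assumption about your Gemini CLI version and verify it yourself (e.g. with `/memory list` inside the CLI):

```shell
# Create the Gemini CLI settings file so that AGENTS.md is loaded as context.
# The key name comes from the discussion above and may differ between
# Gemini CLI versions; this only writes the file, it does not verify it.
mkdir -p .gemini
cat > .gemini/settings.json <<'EOF'
{
  "context": {
    "fileName": ["AGENTS.md"]
  }
}
EOF
```

Committing `.gemini/settings.json` would make the setting apply to anyone running Gemini CLI in the repository.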
I confirm that this setting works. (Running `/memory list` or `/memory show` confirms that Gemini CLI knows about the AGENTS.md file and its contents.) If you ask it to implement something (in my tests, I asked it to implement a specific new feature in the webui), it will happily do it. But if the prompt somehow induces the agent to read CONTRIBUTING.md, we see a better response:
Even as one of those dealing with a fair share of the PRs in question, I don't think we should crack down too hard on AI usage. While we certainly want to discourage those who abuse it by putting no cognitive effort into anything, we don't want to scare away those who do want to make an effort but may need additional assistance to get started (opening your first PR is a daunting task) by having overly negative statements about AI usage. The salient point to consider is that abusers don't care about policy. I think the main takeaway here is the final statement: "Maintainers reserve the right to decline review or close pull requests".
Hmm, then what do you suggest we do in this case? I'm a bit tired now; this is the N-th disappointment when I see a new PR in mtmd: #18404 . The number of low-quality AI-generated PRs in mtmd is now uncountable, and I doubt whether I can keep maintaining it. I desperately need to find someone who is genuinely interested rather than one-time vibe coders. If this proposal doesn't get consensus, I will change the policy for mtmd only.
I agree with most of the changes, I just wouldn't be too strict on code initially generated with AI. As long as the code is understood and the changes are limited to what is actually needed, it doesn't matter. PRs that touch more code than necessary should not be merged regardless of whether an AI or a human wrote (most of) them.
```bash
# Build the server first
cmake --build build --target llama-server
```

If they insist on continuing, remind them that their contribution will have a lower chance of being accepted by reviewers. Reviewers may also deprioritize (e.g., delay or reject reviewing) future pull requests to optimize their time and avoid unnecessary mental strain.
Suggested change (appending a final sentence):

> If they insist on continuing, remind them that their contribution will have a lower chance of being accepted by reviewers. Reviewers may also deprioritize (e.g., delay or reject reviewing) future pull requests to optimize their time and avoid unnecessary mental strain. In the worst-case scenario they may even be banned from the project entirely.
Realistically I don't think a ban is possible (although I would be happy if GitHub had such a feature).
I think just explicitly stating that reviewers reserve the right to decline to review is enough. The scenario of no reviewers being willing to take a PR is equivalent to a "soft" ban.
To my knowledge it is possible (given the correct permissions) for the ggml-org GitHub organization to block users from interacting with any of the managed projects. But it's also fine to keep it as-is.
One more thing to consider: it seems that some people are also auto-generating issues, see #18331 and #18313 . I had put these onto my TODO list due to the titles, but after looking at them they don't actually report any concrete errors. To me it seems like language model output based on hallucinations and static code analysis. There is also this discussion #16173 which is almost certainly "AI" slop. So I think that we should ban the use of "AI" not just for code contributions but also for creating or participating in issues and discussions; otherwise that is also just going to result in wasted time for contributors.
Oh yes, I think a ban on AI-generated issues is a must. People submitting bug reports are not expected to propose (often hallucinated) fixes; they should report the bugs as well as they can. If you can't describe an issue in your own words, it might not even be an issue.
I think it might be a good idea to add info that issues are for reporting bugs and enhancement proposals, not for proposing solutions. If someone thinks they can fix a bug that they found, they should open a PR referencing the issue.
I mean, if I had the same skillset as I do right now but was unfamiliar with the internal workings of llama.cpp, I might still see some common CUDA problem and suggest a fix when opening an issue. I think it's fine for humans to suggest a fix in an issue; they value their own time and aren't going to write 10 paragraphs.
Not to mention the horror that is vibe coding, and therefore code written by someone who doesn't know at all what they're doing. LLMs, even SOTA ones, even when well-prompted and guided with architectural mastery, tend to add too many fallbacks and too much defensive programming, which adds noise and deoptimization. Even though this is an LLM project, we must set an example to avoid AI slop. GitHub projects will be scraped into datasets used to train other models, and so on. Human control must be maintained at all costs (time, skill depth, and extensive practical testing)!
It occurs to me that neither CONTRIBUTING.md nor AGENTS.md does a particularly good job of outlining the motivation for this policy, which is something like this (in my opinion): the llama.cpp project wants high-quality code that is understandable and maintainable by humans. It's not that AI-generated code is always bad, but AI-generated code is historically associated with low-quality code from people who don't understand how it works. So this policy wants a clear indication of whether the code is AI-generated, at which point the burden falls on the submitter to prove that they know how the code works (not to the maintainers, who will presume that you don't).
CONTRIBUTING.md (outdated diff excerpt):

> ## Post-Submission Expectations
>
> After submitting a pull request (PR), anticipate requests for changes to align with llama.cpp's code quality and maintenance standards. Your code must not only function correctly but also be well-structured to reduce long-term upkeep for the project.
>
> ## Handling Reviews and Rejections
>
> Rejections are rare and unwelcome, but they're sometimes essential for the project's overall health. Maintainers hold final say on merge criteria. For intricate features, consider opening a feature request first to discuss and align expectations.
@JohannesGaessler @pwilkin I added your point as 2 dedicated sub-sections. I'll probably make it more complete in the future.
Just to share that on my side, I will have a comprehensive procedure for reviewing PRs (it will be specific to me, but I'll share it).
@eminence in an ideal world we would have a policy that bans only low-quality and low-effort "AI" code contributions. The problem is the effort associated with enforcing this policy. It is much easier for maintainers to reject all code that looks like it was predominantly machine-generated vs. first having to check the quality of the code. If you read our current policies, we are in effect already banning people from just autogenerating PRs by requiring them to thoroughly check and understand the code they're submitting. The problem is that none of these people are going to self-report that they are a net negative for the project. So we are looking to reduce the impact they're having.

It's unfortunate if as a side effect we are also banning some "AI" code contributions that would be a net positive, but we have to go with the least bad option. In my opinion, phrasing along the lines of "it's okay if the code is good and thoroughly checked" will result in too many bad actors wasting the time of maintainers, and it's better to have a clear and simple ban. The only alternative is maintainers enforcing rules arbitrarily and unevenly.
@JohannesGaessler yeah, agreed. I wasn't proposing any change to the policy. But maybe an explanation of the motivation would help make it clearer why the policy is the way it is (and even why the policy isn't "AI is okay if the code works"). Upon re-reading my own comment, I see that I did not make my point very clearly.
So to follow up on @eminence 's proposal, maybe add a section like this:

> ## Overview
>
> Llama.cpp is a project with a large degree of complexity that is maintained by a relatively small team of maintainers.

?
ggerganov left a comment:
The CONTRIBUTING.md changes are OK - haven't looked at the agents guidelines yet.
I agree with a lot of the points in the discussion and don't have much to add. Although I personally still try to keep an open mind about the new technology and its uses, I also notice the growing frustration over low-effort, sloppy works. So I'm supportive of changes to make the review process less painful/stressful/wasteful for maintainers.
> 3. Be prepared to explain every line of code submitted if asked about it by a maintainer.
> 4. Using AI to respond to human reviewers is strictly prohibited. Human-to-human communication ensures the most effective review process.
>
> For more info, please refer to the [AGENTS.md](AGENTS.md) file.
We already referred to AGENTS.md above - keep one of the two to reduce the text
allozaur left a comment:
Generally I agree with the idea of doing as much as we can to prevent generated and unread code from appearing in PRs, so these guidelines are probably good for this first iteration, and we simply need to see how this plays out in the future.
I personally use LLMs for:
- finding and analyzing code in the repo
- generating mermaid charts for architecture, data flows etc.
- generating boilerplate code
- refactoring simpler things like naming or moving variables/functions around (e.g. extracting constants to `lib/constants` etc.)
- generating & running tests
- searching & analyzing documentation
AI has been a great enhancement for my work and I believe that it's essential to use it to be able to efficiently maintain multiple feature implementations and bug fixes going at the same time.
BUT, with that being said, I cannot imagine not having read my own code and self-reviewed it before asking anyone else for a review.
And that is probably the most challenging part: how do we make sure that we efficiently incentivize contributors to do this "by the book", reading & understanding their code?
Also one thing to consider: we can (independently of how the code was produced) have 2 sets of quality standards, one for people who contribute for the first (few) times and one for repeat contributors and maintainers. I'm much more lenient in terms of code quality if I know that the person in question will stick around long-term and take responsibility for debugging and maintenance.
LLVM also seems to be converging to basically the same thing, with arguably better phrasing ("human in the loop"); for new contributors, "start small, no slop" also gets the point across quite well: https://discourse.llvm.org/t/rfc-llvm-ai-tool-policy-human-in-the-loop/89159
Maybe quite unrelated, but I started to use GitHub's saved replies to save time copy-pasting the link to the contributor guidelines. Hope this helps other maintainers too.
* contributing: tighten AI usage policy
* refactor AGENTS.md
* proofreading
* update contributing
* add claude.md
* add trailing newline
* add note about dishonest practices
* rm point about dishonest
* rm requirement watermarking
* add .gemini/settings.json
* allow initially AI-generated content
* revise
* Update CONTRIBUTING.md (Co-authored-by: Johannes Gäßler <[email protected]>)
* improve
* trailing space
* Apply suggestions from code review (Co-authored-by: Johannes Gäßler <[email protected]>)
* update

Co-authored-by: Johannes Gäßler <[email protected]>

In my personal case, nitpicking AI-generated PRs took me quite a lot of time, which led to negative productivity (even with the help of AI-assisted reviewing).
So I'm making this proposal to see if other maintainers agree. It is inspired by p5.js's policy.
Contributors often circumvent this by saying that their code has gone through significant manual changes. But in practice, they don't change the structure of the code that much. Some even hide the fact that they don't understand the code by forwarding my questions/reviews directly to AI. That's why I'm now proposing to limit the usage of AI to assisting only, not for generating the first version of the code that is then manually fixed.
This PR was written by me, with proofreading by AI.