
contributing: tighten AI usage policy #18388

Merged
ngxson merged 17 commits into master from xsn/contrib_tighter_ai_policy
Dec 29, 2025

Conversation

@ngxson
Contributor

@ngxson ngxson commented Dec 26, 2025

In my personal case, nitpicking AI-generated PRs has taken me quite a lot of time, which led to negative productivity (even with the help of AI-assisted reviewing).

So I'm making this proposal to see if other maintainers agree. It is inspired by p5.js's policy.

Contributors often circumvent this by saying that their code has gone through significant manual changes, but in practice they don't change the structure of the code that much. Some even hide the fact that they don't understand the code by forwarding my questions and reviews directly to AI. That's why I'm now proposing to limit AI usage to assistance only: not generating the first version of the code and then manually fixing it.


This PR was written by me, with proofreading by AI.

@JohannesGaessler
Contributor

In my opinion the biggest bottleneck we have is the time that maintainers can invest into the project. So that is the primary thing that we should be optimizing for. An auto-generated PR of any kind is in that regard fundamentally useless because a maintainer could just use the same tools to generate the PR themselves. I see two primary benefits of accepting PRs from people who are not already maintainers:

  1. A maintainer does not have to do the work themselves, or they reduce the workload of another maintainer who would otherwise have to review the code. The underlying idea to me here is that code should always be checked by at least 2 people, which of course only works if the person opening the PR is one of them.
  2. It serves as a way to onboard new long-term contributors and maintainers. I think that a major risk for llama.cpp/ggml is that for many parts of the code there is only a single person that can feasibly maintain it. If that single person is no longer able or willing to maintain the code, it could be pretty bad for the rest of the project. I personally am committed to work on llama.cpp/ggml long-term and I am not aware of any health issues that could render me unable to do so. Still, it is always possible for me to suddenly be involved in e.g. a traffic accident. So I am in that regard glad that @am17an has recently joined.

To relate the above 2 points to "AI": as you may be able to tell from my use of scare quotes, my opinion is that language models in their current form are useful but still very flawed. The code quality I observe when they try to make changes to a large project is generally rather low. So I think that the short-term benefits of 1 are oftentimes not that great because it just shifts work from contributors to maintainers. I agree with the assessment that machine-generated code is very rarely checked thoroughly, since many contributors are not even disclosing the use of "AI" unless pressed. That suggests to me that they are not reading the contributing guidelines in the first place.

For the longer-term benefits of 2 I think that my time is much better spent trying to build up a conventional coder than a vibe "coder" where one can expect little improvement from the person themselves gaining more experience.

I personally am not as affected by contributors using "AI" for the generation of code because I mostly work on the CUDA backend, where such tools are used less frequently. So I am not the primary person that would be affected, for better or worse, by changes in our policies regarding "AI", and the needs of other maintainers should be prioritized instead. One recent example that did affect me: #18340 . I marked the corresponding issue #18140 as "good first issue" with the intent of leaving a relatively simple issue that an aspiring dev could tackle to get started. What instead happened is that someone autogenerated a fix with "AI" without disclosing this or even testing whether the code actually fixed the issue. To me that is generating negative value vs. me just doing the fix myself. For me personally, I think the change in policy proposed in this PR would be a small net benefit.

@am17an
Contributor

am17an commented Dec 26, 2025

AI assistance is, or is going to be, a tool that almost everyone uses. In fact, one of the goals of llama.cpp is to enable tools like llama.vim/llama.vscode etc.

The problem is low-quality PRs, which, as @JohannesGaessler said, put the burden on the maintainers to review. Even more worrisome is when the author replies to questions with more AI-generated content. I propose adding something like:

  • You must take accountability for each and every line of your PR. If AI generated it, make sure you understand what it's doing.
  • Using AI to respond to human reviewers' questions on your PR is explicitly prohibited; it wastes the reviewer's time and your PR is likely not going to be merged.

@JohannesGaessler
Contributor

We could maybe establish a threshold for what is still acceptable use of language models, something like:

  • It is permissible to generate small code snippets to solve specific and self-contained problems as part of a larger PR. Ask yourself this question: "If someone were to post this problem on Stack Overflow, would they be able to get a good answer?" If yes, then it is permissible to use AI tools to solve this problem. If the problem is too complex or depends highly on the surrounding code, do it yourself.

@ngxson
Contributor Author

ngxson commented Dec 26, 2025

Thanks for the inputs. I'll see how to rephrase the points that you want to add.

Just to make sure that I'm understanding your comments correctly:

  1. Contributors should write the code themselves without excessive help from AI; this encourages them to have a real interest in the project, rather than stopping at a desire to push a (maybe one-time) contribution. I completely agree with this point, as I'm actively looking for a co-maintainer for mtmd, but unfortunately, most recent PRs from the community are largely AI-generated.
  2. My version of the document in the PR should be revised to define specific boundaries of what AI usage is permitted. I'll work on that.

@JohannesGaessler
Contributor

Some recent datapoints from me:

  • In ggml-cuda: fix regex for arch list #18371 (comment) I raised a concern about the regex being too general. Because this is something that is tricky to get exactly right syntactically, I thought about how I believed it should look and then asked a language model to write the regex for me. It matched what I had thought would be correct and I copy-pasted it.
  • For llama_fit_params: return enum for fail vs. error #18374 I asked a language model to produce for me a code snippet to differentiate between two errors rather than just std::runtime_error. I copy-pasted the snippet and changed the exception name after asking why it contains using.
  • For our issues with Blackwell compilation I asked a language model how to differentiate between CUDA architectures 120, 120a, and 120f. That is how I found out about e.g. __CUDA_ARCH_SPECIFIC__, and I made sure to check that these macros aren't just hallucinations.
  • I did not use a language model to write the code that was fixed in tools : use common_log_pause to fix fit-params output race #18276 . I just hacked it together because I thought it didn't matter much either way. But after I saw the linked PR being merged, I used a language model to explain to me the exact effects of flushing an output stream.

I did not disclose the use of language models for any of the above but also think that my use was completely unproblematic. My main use case for language models that is (indirectly) related to llama.cpp is to troubleshoot sysadmin problems.
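For context on the exception bullet above: differentiating two error conditions, rather than throwing a bare std::runtime_error for both, is commonly done with a small exception subclass. A minimal sketch of that general technique (the names here are hypothetical, not the actual code from #18374; the `using` mentioned above is plausibly constructor inheritance like this) might look like:

```cpp
#include <stdexcept>
#include <string>

// Hypothetical type, not the actual PR code: a subclass of std::runtime_error
// lets callers tell a recoverable failure apart from a hard error, while any
// existing catch (std::runtime_error) sites keep working unchanged.
struct fit_failure : std::runtime_error {
    // Inherit the base class constructors; a `using` declaration like this
    // may be what the generated snippet in question contained.
    using std::runtime_error::runtime_error;
};

// The more specific catch clause must come first, otherwise fit_failure
// would be swallowed by the std::runtime_error handler.
static std::string handle(bool fits) {
    try {
        if (!fits) {
            throw fit_failure("model does not fit in available memory");
        }
        throw std::runtime_error("unexpected device error");
    } catch (const fit_failure & e) {
        return std::string("fail: ") + e.what();
    } catch (const std::runtime_error & e) {
        return std::string("error: ") + e.what();
    }
    return ""; // unreachable
}
```

The catch-clause ordering is the design point: a subclass keeps the distinction type-safe without touching callers that only care about the base class.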

Contributor Author

@ngxson ngxson Dec 26, 2025


I deleted most of AGENTS.md because:

  1. It was originally generated by AI and contains a lot of outdated and inaccurate info
  2. Other contributors don't seem too optimistic about it (ref linked PR)

I replaced the content with a more in-depth list of permitted and forbidden actions. The doc is intended to be read by human contributors as well as by AI agents.

Member


Some of the build/env instructions there helped Copilot Agent be more consistent (and not waste half its time trying to get a working build), but it was far from perfect and never really that useful anyway, so I guess that's fine.

Contributor Author


Without this, Claude will happily do whatever I ask it to do.

Contributor Author

@ngxson ngxson Dec 26, 2025


Claude seems to be much more aware of the policy now, great (I spent $5 to test this; I never use Claude in practice).

The flow was:

  • I asked it to add uptime in server health endpoint
  • It pointed me to the doc and mentioned the direction that I can follow myself
  • I asked it to just write the code
  • It said:
[screenshot of Claude's reply]


FWIW, Gemini CLI was not willing to follow these instructions, unlike your example above with Claude, though it did accurately summarize the agents.md and contributing.md files when explicitly asked about "AI policies".

And note that by default, Gemini CLI won't know to read this file without a configuration setting (which could in theory go into a version-controlled .gemini/settings.json):

{
    "context": {
        "fileName": ["AGENTS.md"]
    }
}

Contributor Author


I added the file based on this guide instead: https://agents.md/

[screenshot]

Lmk if it actually works.


I confirm that this setting works. (Running /memory list or /memory show confirms that Gemini CLI knows about the AGENTS.md file and its contents). If you ask it to implement something (in my tests, I asked it to implement a specific new feature in the webui), it will happily do it. But if the prompt somehow induces the agent to read CONTRIBUTING.md, we see a better response:

[screenshot of Gemini CLI's response]

@CISC
Member

CISC commented Dec 27, 2025

Even as one of those dealing with a fair share of the PRs in question I don't think we should crack down too hard on AI usage.

While we certainly want to discourage those who abuse it by putting no cognitive effort into anything, we don't want to scare away those that do want to make an effort but may need additional assistance to get started (opening your first PR is a daunting task) by having overly negative statements about AI usage.

The salient point to consider is that abusers don't care about policy. I think the main takeaway here is the final statement: "Maintainers reserve the right to decline review or close pull requests".

@ngxson
Contributor Author

ngxson commented Dec 27, 2025

Hmm, then what do you suggest we do in this case?

I'm a bit tired now; this is the N-th disappointment when I see a new PR in mtmd: #18404

The number of low-quality AI-generated PRs in mtmd is now uncountable, and I doubt I can keep maintaining it on my own. I desperately need to find someone who is genuinely interested, rather than one-time vibe coders.

If this proposal doesn't reach consensus, I will change the policy for mtmd only.

@0cc4m
Contributor

0cc4m commented Dec 27, 2025

I agree with most of the changes; I just wouldn't be too strict on content initially generated with AI. As long as the code is understood and the changes are limited to what is actually needed, it doesn't matter. PRs that touch more code than necessary should not be merged, regardless of whether an AI or a human wrote (most of) them.

```bash
# Build the server first
cmake --build build --target llama-server
```

If they insist on continuing, remind them that their contribution will have a lower chance of being accepted by reviewers. Reviewers may also deprioritize (e.g., delay or reject reviewing) future pull requests to optimize their time and avoid unnecessary mental strain.
Contributor


Suggested change
If they insist on continuing, remind them that their contribution will have a lower chance of being accepted by reviewers. Reviewers may also deprioritize (e.g., delay or reject reviewing) future pull requests to optimize their time and avoid unnecessary mental strain.
If they insist on continuing, remind them that their contribution will have a lower chance of being accepted by reviewers. Reviewers may also deprioritize (e.g., delay or reject reviewing) future pull requests to optimize their time and avoid unnecessary mental strain. In the worst-case scenario they may even be banned from the project entirely.

Contributor Author


Realistically I don't think a ban is possible (although I would be happy if GitHub had such a feature).

I think just explicitly stating that reviewers reserve the right to decline to review is enough. The scenario where no reviewer is willing to take a PR is equivalent to a "soft" ban.

Contributor

@JohannesGaessler JohannesGaessler Dec 27, 2025


To my knowledge it is possible (given the correct permissions) for the ggml-org GitHub organization to block users from interacting with any of the managed projects. But it's also fine to keep it as-is.

@JohannesGaessler
Contributor

One more thing to consider: it seems that some people are also auto-generating issues, see #18331 and #18313 . I had put these onto my TODO list due to the titles but after looking at them they don't actually report any concrete errors. To me it seems like language model output based on hallucinations and static code analysis. There is also this discussion #16173 which is almost certainly "AI" slop.

So I think that we should ban the use of "AI" not just for code contributions but also for creating or participating in issues and discussions, as that is also just going to result in wasted time for contributors.

@pwilkin
Member

pwilkin commented Dec 27, 2025

Oh yes, I think a ban on AI-generated issues is a must. People submitting bug reports are not expected to propose (often hallucinated) fixes; they should report the bugs as well as they can. If you can't describe an issue in your own words, it might not even be an issue.

@pwilkin
Member

pwilkin commented Dec 27, 2025

I think it might be a good idea to add info that issues are for reporting bugs and enhancement proposals, not for proposing solutions. If someone thinks they can fix a bug that they found, they should open a PR referencing the issue.

@JohannesGaessler
Contributor

I mean, if I had the same skillset as I do right now but was unfamiliar with the internal workings of llama.cpp I might still see some common CUDA problem and suggest a fix when opening an issue. I think it's fine for humans to suggest a fix in an issue, they value their own time and aren't going to write 10 paragraphs.

@ServeurpersoCom
Contributor

Not to mention the horror that is vibe coding, and therefore code written by someone who doesn't know at all what they're doing. LLMs, even SOTA ones, even when well prompted and with architectural mastery, tend to add too many fallbacks and too much defensive programming, which adds noise and deoptimizes the code. Even though it's an LLM project, we must set an example to avoid AI slop. GitHub projects will be scraped into model datasets used to train other models, and so on. Human control must be maintained at all costs (time, skill depth, and extensive practical testing)!

@eminence

It occurs to me that neither CONTRIBUTING.md nor AGENTS.md does a particularly good job of outlining the motivation for this policy, which (in my opinion) is something like this: the llama.cpp project wants high-quality code that is understandable and maintainable by humans. It's not that AI-generated code is always bad, but AI-generated code is historically associated with low-quality code from people who don't understand how it works. So this policy wants a clear indication of whether the code is AI-generated, at which point the burden falls on the submitter to prove that they know how the code works (not on the maintainers, who will presume that you don't).

CONTRIBUTING.md Outdated
Comment on lines +42 to +48
## Post-Submission Expectations

After submitting a pull request (PR), anticipate requests for changes to align with llama.cpp's code quality and maintenance standards. Your code must not only function correctly but also be well-structured to reduce long-term upkeep for the project.

## Handling Reviews and Rejections

Rejections are rare and unwelcome, but they're sometimes essential for the project's overall health. Maintainers hold final say on merge criteria. For intricate features, consider opening a feature request first to discuss and align expectations.
Contributor Author


@JohannesGaessler @pwilkin I added your points as 2 dedicated sub-sections. I'll probably make them more complete in the future.

Just to share that on my side, I will have a comprehensive procedure for reviewing PRs (it will be specific to me, but I'll share it).

@JohannesGaessler
Contributor

@eminence in an ideal world we would have a policy that bans only low-quality and low-effort "AI" code contributions. The problem is the effort associated with enforcing this policy. It is much easier for maintainers to reject all code that looks like it was predominantly machine generated vs. first having to check the quality of the code. If you read our current policies we are in effect already banning people from just autogenerating PRs by requiring them to thoroughly check and understand the code they're submitting. The problem is that none of these people are going to self-report that they are a net negative for the project. So we are looking to reduce the impact they're having. It's unfortunate if as a side effect we are also banning some "AI" code contributions that would be a net positive but we have to go with the least bad option. In my opinion phrasing along the lines of "it's okay if the code is good and thoroughly checked" will result in too many bad actors wasting the time of maintainers and it's better to have a clear and simple ban. The only alternative is maintainers enforcing rules arbitrarily and unevenly.

@eminence

@JohannesGaessler yeah, agreed. I wasn't proposing any change to the policy. But maybe explaining the motivation would help make it clearer why the policy is the way it is (and even why the policy isn't "AI is okay if the code works"). Upon re-reading my own comment, I see that I did not make my point very clearly.

@pwilkin
Member

pwilkin commented Dec 28, 2025

So to follow up on @eminence 's proposal, maybe add a section like this:

Overview

llama.cpp is a project with a large degree of complexity that is maintained by a relatively small team.
Because of that, code quality and maintainability are very important factors when considering any contributions and modifications to the code. Moreover, maintainers have a limited capacity to review complex PRs that have not been explicitly discussed with them beforehand. If you are a first-time contributor to the project, start small, do not propose extensive modifications to multiple backends, and consider splitting your PR into parts if possible. Note that while we do value new features and functionality, adding them cannot come at a disadvantage to project maintainability. This means that not only must your code work, it must also be understandable to the maintainers and easy enough to modify if the need arises. Submissions that cannot meet these criteria will be rejected even if they contain functional code.

?

Member

@ggerganov ggerganov left a comment


The CONTRIBUTING.md changes are OK - haven't looked at the agents guidelines yet.

I agree with a lot of the points in the discussion and don't have much to add. Although I personally still try to keep an open mind about the new technology and its uses, I also notice the growing frustration over low-effort, sloppy works. So I'm supportive of changes to make the review process less painful/stressful/wasteful for maintainers.

3. Be prepared to explain every line of code submitted if asked about it by a maintainer.
4. Using AI to respond to a human reviewer is strictly prohibited. Human-to-human communication ensures the most effective review process.

For more info, please refer to the [AGENTS.md](AGENTS.md) file.
Member


We already referred to AGENTS.md above - keep one of the two to reduce the text

Contributor Author


I addressed your reviews in 3631ace

Contributor

@allozaur allozaur left a comment


Generally I agree with the idea of doing as much as we can to prevent generated and unread code from appearing in PRs, so these guidelines are probably good as a first iteration, and we simply need to see how this plays out in the future.

I personally use LLMs for:

  • finding and analyzing code in repo
  • generating mermaid charts for architecture, data flows etc.
  • generating boilerplate code
  • refactoring simpler things like naming or moving variables/functions around (eg. extracting constants to lib/constants etc.)
  • generating & running tests
  • searching & analyzing documentation

AI has been a great enhancement for my work and I believe that it's essential to use it to be able to efficiently maintain multiple feature implementations and bug fixes going at the same time.

BUT, with that being said, I cannot imagine not having read your code and self-reviewing it before asking anyone else for a review.

And that is probably the most challenging part: how do we make sure that we efficiently incentivize contributors to do this "by the book" and to read & understand their code?

@JohannesGaessler
Contributor

Also one thing to consider: we can (independently of how the code was produced) have 2 sets of quality standards, one for people who contribute for the first (few) times and one for repeat contributors and maintainers. I'm much more lenient in terms of code quality if I know that the person in question will stick around long-term and take responsibility for debugging and maintenance.

ngxson and others added 2 commits December 29, 2025 12:40
@ngxson ngxson changed the title from "(proposal) contributing: tighten AI usage policy" to "contributing: tighten AI usage policy" Dec 29, 2025
@ngxson ngxson merged commit 3595ae5 into master Dec 29, 2025
2 checks passed
@am17an
Contributor

am17an commented Dec 31, 2025

LLVM also seems to be converging on basically the same thing, with, I guess, better phrasing ("human in the loop"); for new contributors, "start small, no slop" also gets the point across quite well.

https://discourse.llvm.org/t/rfc-llvm-ai-tool-policy-human-in-the-loop/89159

@ngxson
Contributor Author

ngxson commented Dec 31, 2025

Maybe quite unrelated, but I started to use GH's saved replies to save time copy-pasting the link to the contributor guidelines. Hope this helps other maintainers too.

blime4 referenced this pull request in blime4/llama.cpp Feb 5, 2026
* contributing: tighten AI usage policy

* refactor AGENTS.md

* proofreading

* update contributing

* add claude.md

* add trailing newline

* add note about dishonest practices

* rm point about dishonest

* rm requirement watermarking

* add .gemini/settings.json

* allow initially AI-generated content

* revise

* Update CONTRIBUTING.md

Co-authored-by: Johannes Gäßler <[email protected]>

* improve

* trailing space

* Apply suggestions from code review

Co-authored-by: Johannes Gäßler <[email protected]>

* update

---------

Co-authored-by: Johannes Gäßler <[email protected]>
