Dry sample #1187

Closed
81549361 wants to merge 7 commits into sgl-project:main from 81549361:dry-sample

Conversation

@81549361

Motivation

Looping is an undesirable behavior in which the model repeats, verbatim, phrases that have previously occurred in the input. It affects most models and is exacerbated by the use of truncation samplers. Chat formats are particularly susceptible due to their regular structure, which models appear to interpret as an invitation to repeat previous messages in whole or in part. Prompting the model to avoid looping has little or no effect.

The traditional weapons against looping are the three flavors of repetition penalty built into most loaders (multiplicative, additive, and frequency penalty). But those samplers are rather blunt instruments that distort the grammar of standard language, which the model has been painstakingly trained to reproduce. I have previously attempted to address this in ggml-org/llama.cpp#5561, which protects the basic structure of language from being penalized, but that is a hacky solution that fails to do the right thing in many cases; and even in their raw form, classical repetition penalties don't actually prevent looping reliably.

In the past weeks, I have rethought the looping problem from the ground up, and in this PR present the DRY repetition penalty, a mechanism that is able to detect textual looping and steer against it. It is far superior to the existing samplers at preventing verbatim repetition, while having essentially none of their negative effects on language structure. The result is less repetitive and higher quality output.
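As a rough illustration of the idea (a sketch only, not the PR's actual implementation; parameter names and defaults are assumptions mirroring the knobs described in the original DRY proposal), the mechanism penalizes any token whose sampling would extend a verbatim repetition of the context's suffix, with a penalty growing exponentially in the length of the repeated run:

```python
def dry_penalties(input_ids, multiplier=0.8, base=1.75, allowed_length=2):
    """Illustrative DRY-style penalty computation.

    Returns {token_id: penalty} for every token whose sampling would
    extend a verbatim repetition of the end of the context.
    """
    n = len(input_ids)
    penalties = {}
    for i in range(1, n):
        # k = length of the longest match between the tokens ending at
        # position i - 1 and the tokens at the very end of the context.
        k = 0
        while k < i and input_ids[i - 1 - k] == input_ids[n - 1 - k]:
            k += 1
        if k >= allowed_length:
            tok = input_ids[i]  # sampling this token would repeat history
            pen = multiplier * base ** (k - allowed_length)
            penalties[tok] = max(penalties.get(tok, 0.0), pen)
    return penalties
```

The returned penalties would then be subtracted from the corresponding logits before sampling, so that short incidental repeats are untouched while long verbatim loops become rapidly improbable.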

I have tested this sampler for about 20 hours in chat scenarios so far, and those chats have without question been the highest-quality ones I have ever experienced. Looping in the traditional sense simply does not happen with DRY, and the positive effect of being able to drop the standard repetition penalty is very noticeable.

Modifications

Checklist

  • Format your code according to the Contributor Guide.
  • Add unit tests as outlined in the Contributor Guide.
  • Update documentation as needed, including docstrings or example tutorials.

@merrymercy
Contributor

@81549361 Hi, thanks for the contribution. However, "Dry sample" is not currently a mainstream method. To keep our current code minimal, we may be unable to accept this PR.

One thing I can think of is to expose a custom logit processor interface, then other users can easily register their external custom logit processor without changing the sglang code.
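One minimal shape such an interface could take (all names here are hypothetical illustrations, not actual sglang APIs) is a registry of callables that map the token history and raw logits to adjusted logits:

```python
from typing import Callable, Dict, List

# A logit processor takes the token history and the raw logits and
# returns adjusted logits. Names below are illustrative only.
LogitProcessor = Callable[[List[int], List[float]], List[float]]

_PROCESSORS: Dict[str, LogitProcessor] = {}


def register_logit_processor(name: str, fn: LogitProcessor) -> None:
    """Register a user-supplied processor without changing core code."""
    _PROCESSORS[name] = fn


def apply_logit_processor(name: str, input_ids: List[int],
                          logits: List[float]) -> List[float]:
    """Look up a registered processor and apply it to the logits."""
    return _PROCESSORS[name](input_ids, logits)


# Example: a trivial processor that bans the most recent token outright.
def ban_last_token(input_ids, logits):
    out = list(logits)
    if input_ids:
        out[input_ids[-1]] = float("-inf")
    return out


register_logit_processor("ban_last", ban_last_token)
```

Under such an interface, DRY (or any other experimental sampler) could live entirely in user code and be plugged in by name, keeping the core sampling path minimal.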

@merrymercy closed this Aug 29, 2024
@zhyncs mentioned this pull request Sep 8, 2024
@supa-thibaud

@merrymercy Yes, it would be helpful, at the very least, to have a way to register our own logit processor.

@p-e-w

p-e-w commented Sep 20, 2024

@merrymercy

However, "Dry sample" is not currently a mainstream method.

Not sure what your definition of mainstream is, but DRY is already supported by:

  1. text-generation-webui
  2. llama.cpp (PR open; maintainer has signaled intention to merge)
  3. mistral.rs
  4. ExLlamaV2
  5. KoboldCpp
  6. SillyTavern

... and probably other projects that I'm not aware of. DRY is also commonly recommended by model authors on their HF model cards.

See the original DRY pull request for more information.

@merrymercy
Contributor

Hi @p-e-w, @81549361, @supa-thibaud

A custom logit processor is now supported via #2396. We can implement DRY sampling with the custom logit processor.

We can still accept new sampling methods, and we can merge some built-in custom logit processors into the sglang source code. Contributions are welcome!
