Speed-up the DRY logits processor #6087
jojje wants to merge 21 commits into oobabooga:main from jojje:feat/dry_speedup
Conversation
This helps me keep this up-to-date more easily.
This reverts commit 5499bc9.
After integrating profiling into the text UI, it seems this solution, while faster according to the profiling data, hardly makes a dent in the overall generation latency from what I can see. If you have access to a fast model that yields more tokens/s than I have, then it's worth a shot. According to the profiling data, this version is about 2.5x faster when generating 4500 tokens using microsoft_Phi-3-mini-128k-instruct and the settings below. This corresponds to the performance benchmark posted in the thread. Here's the profiling data for the original code (the code in the dev branch), and here are the results for this PR. To summarize the data:
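For anyone wanting to reproduce this kind of measurement outside the full UI, a minimal timing harness could look like the sketch below. This is an illustration only: `processor` and the token/logit shapes are placeholders, not the repo's actual logits-processor API.

```python
import time

def bench_processor(processor, input_ids, logits, runs=50):
    """Call a logits-processor-like function `runs` times and return the
    average seconds per call. Fresh copies are passed each run so an
    in-place processor cannot skew later iterations."""
    start = time.perf_counter()
    for _ in range(runs):
        processor(list(input_ids), list(logits))
    return (time.perf_counter() - start) / runs

# Example with a no-op processor standing in for the DRY implementation:
avg = bench_processor(lambda ids, lg: lg, list(range(4500)), [0.0] * 1000)
```

Comparing the averages of the old and new processor on the same inputs gives the per-call speed-up, independent of the rest of the generation loop.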
Your PR is currently pointed at
For reference:
As pointed out by @belladoreai, this PR is quite similar to #6053, minus the match length cap to guarantee linear-time complexity for adversarial inputs. I don't have enough VRAM to run a large model at very long context length, so if you do, perhaps you can benchmark to compare the two implementations.
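To make the comparison concrete, here is a rough sketch of the DRY idea with such a match-length cap, in plain Python. This is an illustration of the algorithm's shape, not the code from either PR; all names and default values are placeholders.

```python
def dry_penalties(token_ids, multiplier=0.8, base=1.75,
                  allowed_length=2, max_match_len=50):
    """Sketch of a DRY-style repetition penalty with a match-length cap.

    For each earlier occurrence of the last token, measure how far the
    suffix match extends backwards from there (capped at max_match_len,
    which bounds the scan to O(n * cap) even for adversarial inputs),
    and penalize the token that followed that occurrence."""
    penalties = {}
    n = len(token_ids)
    if n < 2:
        return penalties
    last = token_ids[-1]
    for i in range(n - 2, -1, -1):
        if token_ids[i] != last:
            continue
        # Length of the match between the suffix ending at i and the
        # suffix ending at the last position, capped for linear time.
        match_len = 0
        while (match_len < max_match_len and i - match_len >= 0
               and token_ids[i - match_len] == token_ids[n - 1 - match_len]):
            match_len += 1
        if match_len >= allowed_length:
            nxt = token_ids[i + 1]
            pen = multiplier * base ** (match_len - allowed_length)
            penalties[nxt] = max(penalties.get(nxt, 0.0), pen)
    return penalties
```

The returned penalties would then be subtracted from the corresponding logits; the cap is what distinguishes this from an uncapped scan, which can degrade to quadratic time on highly repetitive contexts.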
Checklist:
Note: I didn't find any unit tests in this repo, so I created my own to ensure this change does not alter the behavior (output) in any way. You can find the test (both assertion testing and benchmarking in one) here.
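The pattern behind such an equivalence test can be sketched as a small fuzz harness. `ref_impl` and `fast_impl` are hypothetical stand-ins for the original and optimized processors, not names from the linked test:

```python
import random

def assert_equivalent(ref_impl, fast_impl, trials=200, seed=0):
    """Feed both implementations identical random token sequences and
    fail loudly on the first divergence, reporting the offending input."""
    rng = random.Random(seed)
    for _ in range(trials):
        ids = [rng.randrange(16) for _ in range(rng.randrange(2, 64))]
        ref, fast = ref_impl(list(ids)), fast_impl(list(ids))
        assert ref == fast, f"divergence on input {ids}: {ref} != {fast}"

# Trivially passes when both sides compute the same thing:
assert_equivalent(lambda ids: sorted(ids), lambda ids: sorted(ids))
```

Fixing the seed keeps failures reproducible, and passing fresh list copies guards against either implementation mutating its input.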