Closed
Labels
new feature (New feature or request), processing (Processing related to the model)
Description
We will implement based on this.
Given a parsed BNF grammar, the idea is as follows:
- While the model is calculating the logits, prepare the logit bias on a worker thread (from a pool).
- Run normal sampling first: if the returned token is valid under the grammar, skip the logit bias entirely.
- While normal sampling runs, apply the logit bias to the logits on a worker thread (from a pool).
- If normal sampling produced a token that is invalid under the grammar, rerun sampling on the biased logits.
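
The steps above can be sketched roughly as follows. This is a minimal Python illustration, not the project's implementation: `compute_logits`, `is_valid`, `build_bias`, and `sample_with_grammar` are hypothetical names, the grammar check is reduced to a per-token predicate, and greedy argmax stands in for whatever sampler is actually used.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def build_bias(vocab, is_valid):
    """Bias of 0.0 for grammar-valid tokens and -inf for all others."""
    return [0.0 if is_valid(tok) else -math.inf for tok in vocab]


def apply_bias(logits, bias_future):
    """Add the prepared bias to a copy of the logits (runs on a worker)."""
    bias = bias_future.result()  # queued earlier, so already running or done
    return [l + b for l, b in zip(logits, bias)]


def argmax(logits):
    """Stand-in sampler (assumption): pick the highest-scoring token id."""
    return max(range(len(logits)), key=logits.__getitem__)


def sample_with_grammar(compute_logits, vocab, is_valid, pool, sample=argmax):
    # Prepare the bias on a worker while the model computes the logits.
    bias_future = pool.submit(build_bias, vocab, is_valid)
    logits = compute_logits()

    # Apply the bias on a worker while normal sampling runs on this thread.
    biased_future = pool.submit(apply_bias, logits, bias_future)
    token = sample(logits)

    # Fast path: the unconstrained token is already valid, so the biased
    # logits are never needed.
    if is_valid(vocab[token]):
        biased_future.cancel()  # no-op if the worker already started
        return token

    # Slow path: rerun sampling on the biased logits.
    return sample(biased_future.result())


if __name__ == "__main__":
    vocab = ["a", "b", "c"]
    not_c = lambda tok: tok != "c"  # toy "grammar": token "c" is forbidden
    with ThreadPoolExecutor(max_workers=2) as pool:
        print(sample_with_grammar(lambda: [0.1, 0.2, 0.9], vocab, not_c, pool))
        print(sample_with_grammar(lambda: [0.9, 0.2, 0.1], vocab, not_c, pool))
```

The point of this ordering is that the common case (the model already proposes a grammar-valid token) pays only for a single validity check, while the cost of building and applying the bias is hidden behind work that has to happen anyway.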