Fix for Llama4 models #329
Closed
Command
Error
Similar issue
Closest issue, regarding a boolean value being returned: huggingface/transformers#35037
Proposed fix
Based on the HF documentation, the Llama4 AutoTokenizer should work for text-only Llama4; in multimodal cases, we need to use AutoProcessor.
I noticed that omitting `use_fast=False`, or setting `use_fast=True`, in `AutoTokenizer.from_pretrained()` helped get past the error. One option (this PR) is to raise an error so that we switch to `use_fast=True`; another option is to omit `use_fast=False` in the default case.
I don't have enough context to gauge whether the second option is a good idea.
Please advise if this PR makes sense, or whether some other enabling is needed for Llama4 models to fix the issue.
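To illustrate the first option, here is a minimal sketch of the kind of guard this PR proposes. The function name `check_tokenizer_args` and the substring check are hypothetical, not the actual patch; the idea is simply to fail fast with a clear message when `use_fast=False` is passed for a Llama4 checkpoint, since the slow-tokenizer path is what triggers the error.

```python
def check_tokenizer_args(model_name: str, use_fast: bool) -> None:
    """Hypothetical guard mirroring this PR's approach.

    Llama4 checkpoints only work with the fast tokenizer, so
    use_fast=False should be rejected up front with an actionable
    message instead of failing later inside AutoTokenizer.
    """
    if "llama-4" in model_name.lower() and not use_fast:
        raise ValueError(
            f"{model_name} requires use_fast=True in "
            "AutoTokenizer.from_pretrained(); the slow tokenizer "
            "path is not supported for Llama4 models."
        )


# Example: rejects the failing configuration, accepts the working one.
check_tokenizer_args("meta-llama/Llama-4-Scout-17B-16E-Instruct", use_fast=True)
```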
Testing
Llama-4-Scout-17B-16E-Instruct
Llama-4-Maverick-17B-128E-Instruct