Collaborator

@ArthurZucker ArthurZucker commented Oct 14, 2024

Goal:

from tokenizers import Tokenizer
from tokenizers.processors import TemplateProcessing
tokenizer = Tokenizer.from_pretrained("meta-llama/Llama-3.3-70B-Instruct")
tokenizer.post_processor 

tokenizer.post_processor[1] = TemplateProcessing(
    single="[CLS] $0 [SEP]",
    pair="[CLS] $A [SEP] $B:1 [SEP]:1",
    special_tokens=[("[CLS]", 1), ("[SEP]", 0)],
)

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@McPatate McPatate force-pushed the sequential-post-processor branch from 11533c5 to 4bb595b Compare January 14, 2025 02:37
@McPatate McPatate marked this pull request as ready for review January 16, 2025 02:33
Member

@McPatate McPatate left a comment

Self-approving here as it is your PR, @ArthurZucker. Waiting for your review before merging.

Collaborator Author

@ArthurZucker ArthurZucker left a comment

LGTM! Would just add Python tests! 😉

Let's check set_item, and also that what get_item returns is mutable.

@McPatate McPatate force-pushed the sequential-post-processor branch from d37229f to ff80e9f Compare January 27, 2025 23:03
@McPatate McPatate changed the title from "Support updating template processors" to "🚨 Support updating template processors" Jan 28, 2025
@McPatate McPatate merged commit c45aebd into main Jan 28, 2025
30 checks passed
@McPatate McPatate deleted the sequential-post-processor branch January 28, 2025 13:58
Narsil added a commit that referenced this pull request Mar 11, 2025
4 participants