Merged

blt wip #38579

140 commits
0826152
blt wip
Jun 4, 2025
6201947
cpu version
itazap Jun 5, 2025
58c4a4e
cpu friendly with full entropy model (real time patching)
itazap Jun 6, 2025
1d00859
adding config file instead of args file
Jun 6, 2025
bdb6cee
enable MPS
LysandreJik Jun 6, 2025
131f960
refactoring unused code
Jun 10, 2025
fb1d11b
single config class in config file
Jun 11, 2025
1eab6a4
inherit from PreTrainedModel
Jun 11, 2025
bc2aeb7
refactor LMTransformer --> BLTPatcher
Jun 12, 2025
907eca1
add conversion script
Jun 12, 2025
c4b1775
load from new checkpoing with form_pretrained
Jun 13, 2025
fececd1
fixed demo from_pretrained
Jun 13, 2025
f2604f3
clean up
Jun 13, 2025
12c000e
clean a few comments
Jun 16, 2025
4b2185d
cleanup folder
Jun 16, 2025
ad8c7a8
clean up dir
Jun 16, 2025
aff63d6
cleaned up modeling further
Jun 16, 2025
f552e27
rename classes
Jun 16, 2025
2b9dd64
adding transformers Attention class and RotaryEmbedding class
Jun 18, 2025
f25a99b
exchanged blt modules for transformers modules: attention, rotary_emb…
Jun 19, 2025
73f7e16
seperate out patcher config, update modeling and conversion script
Jun 19, 2025
8d4df99
rename vars to be more transformers-like
Jun 20, 2025
d938a2f
rm unused functions
Jun 20, 2025
3bcfc03
adding cross attention from transformers
Jun 20, 2025
2a7778c
pass arg
Jun 20, 2025
9ed04fd
rename weights
Jun 20, 2025
e6c7b68
updated conversion script
Jun 20, 2025
e3fdebb
overwritten commit! fixing PR
Jun 23, 2025
438e2e2
apply feedback
Jun 23, 2025
8ecda84
adding BLTRMSNorm like Llama
Jun 23, 2025
ceb3d8e
add repeat_kv and eager_attention_forward copied from
Jun 23, 2025
2102d32
BLTMLP identical to MllamTextMLP
Jun 23, 2025
50d2503
clean up some args'
Jun 23, 2025
66bcddb
more like mllama, but busier inits
Jun 23, 2025
5bcc11d
BLTTransformerLayer config
Jun 23, 2025
aa03d78
decoder, encoder, global configs
Jun 23, 2025
494b488
wip working on modular file
Jun 23, 2025
477406e
cleaning up patch and configs
Jun 25, 2025
13a79a5
clean up patcher helpers
Jun 25, 2025
f686d0b
clean up patcher helpers further
Jun 25, 2025
cfde897
clean up
Jun 26, 2025
09e574b
some config renaming
Jun 26, 2025
f649ff3
clean up unused configs
Jun 26, 2025
c1a5508
clean up configs
Jun 26, 2025
81e1f78
clean up configs
Jun 26, 2025
60f57bb
update modular
Jun 26, 2025
c2a9099
clean
Jun 30, 2025
6c0b8d2
update demo
Jun 30, 2025
9c71398
config more like mllama, seperated subconfigs from subdicts
Jun 30, 2025
3170003
read from config instead of self args
Jul 1, 2025
17720dc
update demo file
Jul 1, 2025
8ca6c73
model weights to causal lm weights
Jul 2, 2025
a260bb1
missed file
Jul 2, 2025
0b9db70
added tied weights keys
Jul 2, 2025
107e26d
BLTForCausalLM
Jul 3, 2025
d9d6d73
adding files after add-new-model-like
May 16, 2025
3e8dc1e
update demo
Jul 3, 2025
e6bc639
working on tests
Jul 3, 2025
7c352ae
first running integration tests
Jul 4, 2025
c32f692
added integration tests
Jul 7, 2025
e723830
adding tokenization tests, integration tests, and cleaned up tokeniza…
Jul 8, 2025
3a22e05
tokenizer clean up
Jul 8, 2025
d7b5721
modular file
Jul 8, 2025
359ecf1
fixing rebase
Jul 8, 2025
4e52837
ruff
Jul 8, 2025
fed9958
adding correct basemodel output and updating config with checkpoint v…
Jul 9, 2025
3ab48a6
BLTModelTests git status
Jul 10, 2025
6f93199
enabling inputs_embeds, although won't be equal to input_ids since ne…
Jul 10, 2025
07816a2
fix sdpa == causal tests
Jul 15, 2025
db4cc78
fix small model test and some gradient checkpointing
Jul 15, 2025
c20e484
skip training GC tests
Jul 15, 2025
d2dab12
fix test
Jul 15, 2025
2dce4bf
updated modular
Jul 15, 2025
e64ed90
update modular
Jul 15, 2025
7738005
ruff
Jul 15, 2025
8b2a238
adding modular + modeling
Jul 15, 2025
0c23353
modular
Jul 16, 2025
141d788
more modern is_casual check
Jul 16, 2025
fd1dd4a
cleaning up modular
Jul 16, 2025
82bff4e
more modular reduction
Jul 16, 2025
6f17474
ruff
Jul 16, 2025
c191396
modular fix
Jul 16, 2025
50c0353
fix styling
Jul 16, 2025
36a9553
return 2
Jul 17, 2025
132830e
return 2
Jul 17, 2025
9f3a3b4
fix some tests
Jul 17, 2025
8095303
fix bltcrossattention after modular break
Jul 18, 2025
562c03a
some fixes / feedback
Jul 21, 2025
fc1e7bf
try cache generate fix
Jul 21, 2025
8add244
try cache generate fix
Jul 21, 2025
f198de7
fix generate tests
Jul 21, 2025
ab4d2ca
attn_impl workaround
Jul 22, 2025
a00ce1d
refactoring to use recent TransformersKwargs changes
Jul 22, 2025
1df0b6a
fix hidden_states shape test
Jul 22, 2025
3f7d5cd
refactor to new outputs
Jul 23, 2025
22a511a
simplify outputs a bit
Jul 24, 2025
0239f77
rm unneeded decoderlayer overwriting
Jul 28, 2025
926fb09
rename blt
Jul 28, 2025
232d245
forgot tokenizer test renamed
Jul 28, 2025
703fab7
Reorder
itazap Jul 29, 2025
ec9b4c0
Reorder
itazap Jul 29, 2025
3117a03
working on modular
itazap Jul 29, 2025
eb4cd41
updates from modular
itazap Jul 29, 2025
c9e30fd
new modular
itazap Jul 29, 2025
3b2e3e8
ruff and such
itazap Jul 30, 2025
2ded41e
update pretrainedmodel modular
itazap Jul 30, 2025
cd7d1a8
using cohere2 apply_rotary_pos_emb
Jul 31, 2025
0183538
small changes
Aug 1, 2025
cb91d0e
apply feedback r2
itazap Aug 7, 2025
f51e2f4
fix cross_attention
Aug 8, 2025
22a20f2
apply more feedback
Aug 8, 2025
39be414
update modeling fix
Aug 14, 2025
6ecc6ff
load submodules from pretrainedmodel
Aug 17, 2025
eea290d
set initializer_range to subconfigs
Aug 17, 2025
294b80d
rm cross_attnetion_states pass when not needed
Aug 17, 2025
9ec7b28
add 7b projection layer support
Aug 18, 2025
2f9ab61
check repo
itazap Aug 18, 2025
3e28082
make copies
itazap Aug 19, 2025
52fa987
lost cohere2 rotate_half
Aug 19, 2025
f25630c
ruff
Aug 19, 2025
26706e5
copies?
Aug 19, 2025
35dde6e
don't tie weights for submodules
Aug 19, 2025
f855e52
tie weights setting
Aug 20, 2025
966e2f0
check docstrings
Aug 20, 2025
5513a6a
apply feedback
Aug 22, 2025
29144c7
rebase
Aug 22, 2025
8869cc1
rebased modeling
itazap Aug 25, 2025
f3e62f0
update docs
itazap Aug 27, 2025
cab52b5
applying feedback
Aug 28, 2025
d45f260
few more fixes
Aug 29, 2025
7ccff57
fix can_record_outputs
Sep 11, 2025
90a9a2f
fast tokenizer
Sep 15, 2025
180042d
no more modulelist
Sep 15, 2025
c495819
tok auto
Sep 18, 2025
5607b5a
rm tokenizersss
Sep 18, 2025
8085a95
fix docs
Sep 18, 2025
4272552
ruff
Sep 18, 2025
05a5b49
fix after rebase
Sep 18, 2025
d983e72
fix test, configs are not subscriptable
Sep 18, 2025
17f91b9
Merge branch 'main' into blt_wip
itazap Sep 18, 2025
2 changes: 2 additions & 0 deletions docs/source/en/_toctree.yml
@@ -411,6 +411,8 @@
  title: Blenderbot Small
- local: model_doc/bloom
  title: BLOOM
- local: model_doc/blt
  title: BLT
- local: model_doc/bort
  title: BORT
- local: model_doc/byt5
97 changes: 97 additions & 0 deletions docs/source/en/model_doc/blt.md
@@ -0,0 +1,97 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->

<div style="float: right;">
<div class="flex flex-wrap space-x-1">
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
<img alt="Flax" src="https://img.shields.io/badge/Flax-29a79b.svg?style=flat&logo=data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAC0AAAAtCAMAAAANxBKoAAAC7lBMVEUAAADg5vYHPVgAoJH+/v76+v39/f9JbLP///9+AIgAnY3///+mcqzt8fXy9fgkXa3Ax9709fr+///9/f8qXq49qp5AaLGMwrv8/P0eW60VWawxYq8yqJzG2dytt9Wyu9elzci519Lf3O3S2efY3OrY0+Xp7PT///////+dqNCexMc6Z7AGpJeGvbenstPZ5ejQ1OfJzOLa7ejh4+/r8fT29vpccbklWK8PVa0AS6ghW63O498vYa+lsdKz1NDRt9Kw1c672tbD3tnAxt7R6OHp5vDe7OrDyuDn6vLl6/EAQKak0MgATakkppo3ZK/Bz9y8w9yzu9jey97axdvHzeG21NHH4trTwthKZrVGZLSUSpuPQJiGAI+GAI8SWKydycLL4d7f2OTi1+S9xNzL0ePT6OLGzeEAo5U0qJw/aLEAo5JFa7JBabEAp5Y4qZ2QxLyKmsm3kL2xoMOehrRNb7RIbbOZgrGre68AUqwAqZqNN5aKJ5N/lMq+qsd8kMa4pcWzh7muhLMEV69juq2kbKqgUaOTR5uMMZWLLZSGAI5VAIdEAH+ovNDHuNCnxcy3qcaYx8K8msGplrx+wLahjbYdXrV6vbMvYK9DrZ8QrZ8tqJuFms+Sos6sw8ecy8RffsNVeMCvmb43aLltv7Q4Y7EZWK4QWa1gt6meZKUdr6GOAZVeA4xPAISyveLUwtivxtKTpNJ2jcqfvcltiMiwwcfAoMVxhL+Kx7xjdrqTe60tsaNQs6KaRKACrJ6UTZwkqpqTL5pkHY4AloSgsd2ptNXPvNOOncuxxsqFl8lmg8apt8FJcr9EbryGxLqlkrkrY7dRa7ZGZLQ5t6iXUZ6PPpgVpZeJCJFKAIGareTa0+KJod3H0deY2M+esM25usmYu8d2zsJOdcBVvrCLbqcAOaaHaKQAMaScWqKBXqCXMJ2RHpiLF5NmJZAdAHN2kta11dKu1M+DkcZLdb+Mcql3TppyRJdzQ5ZtNZNlIY+DF4+voCOQAAAAZ3RSTlMABAT+MEEJ/RH+/TP+Zlv+pUo6Ifz8+fco/fz6+evr39S9nJmOilQaF/7+/f38+smmoYp6b1T+/v7++vj189zU0tDJxsGzsrKSfv34+Pf27dDOysG9t6+n/vv6+vr59uzr1tG+tZ6Qg9Ym3QAABR5JREFUSMeNlVVUG1EQhpcuxEspXqS0SKEtxQp1d3d332STTRpIQhIISQgJhODu7lAoDoUCpe7u7u7+1puGpqnCPOyZvffbOXPm/PsP9JfQgyCC+tmTABTOcbxDz/heENS7/1F+9nhvkHePG0wNDLbGWwdXL+rbLWvpmZHXD8+gMfBjTh+aSe6Gnn7lwQIOTR0c8wfX3PWgv7avbdKwf/ZoBp1Gp/PvuvXW3vw5ib7emnTW4OR+3D4jB9vjNJ/7gNvfWWeH/TO/JyYrsiKCRjVEZA3UB+96kON+DxOQ/NLE8PE5iUYgIXjFnCOlxEQMaSGVxjg4gxOnEycGz8bptuNjVx08LscIgrzH3umcn+KKtiBIyvzOO2O99aAdR8cF19oZalnCtvREUw79tCd5sow1g1UKM6kXqUx4T8wsi3sTjJ3yzDmmhenLXLpo8u45eG5y4Vvbk6kkC4LLtJMowkSQxmk4ggVJEG+7c6QpHT8vvW9X7/o7+3ELmiJi2mEzZJiz8cT6TBlanBk70cB5GGIGC1gRDdZ00yADLW1FL6gqhtvNXNG5S9gdSrk4M1qu7JAsmYshzDS4peoMrU/gT7qQdqYGZaYhxZmVbGJAm/CS/HloWyhRUlknQ9KYcExTwS80d3VNOxUZJpITYyspl0LbhArhpZCD9cRWEQuhYkNGMHToQ/2Cs6swJlb39CsllxdXX6IUKh/H5jbnSsPKjgmoaFQ1f8wRLR0UnGE/RcDEjj2jXG1WVTwUs8+zxfcrVO+vSsuOpVKxCfYZiQ0/aPKuxQbQ8lIz+DClxC8u+snlcJ7Yr1z1JPqUH0V+GDXbOwAib931Y4Imaq0NTIXPXY+N5L18GJ37SVWu+hwXff8l72Ds9XuwYIBaXPq6Shm4l+Vl/5QiOlV+uTk6YR9PxKsI9xNJny31ygK1e+nIRC1N97EGkFPI+jCpiHe5PCEy7oWqWSwRrpOvhFzcbTWMbm3ZJAOn1rUKpYIt/lDhW/5RHHteeWFN60qo98YJuoq1nK3uW5AabyspC1BcIEpOhft+SZAShYoLSvnmSfnYADUERP5jJn2h5XtsgCRuhYQqAvwTwn33+YWEKUI72HX5AtfSAZDe8F2DtPPm77afhl0EkthzuCQU0BWApgQIH9+KB0JhopMM7bJrdTRoleM2JAVNMyPF+wdoaz+XJpGoVAQ7WXUkcV7gT3oUZyi/ISIJAVKhgNp+4b4veCFhYVJw4locdSjZCp9cPUhLF9EZ3KKzURepMEtCDPP3VcWFx4UIiZIklIpFNfHpdEafIF2aRmOcrUmjohbT2WUllbmRvgfbythbQO3222fpDJoufaQPncYYuqoGtUEsCJZL6/3PR5b4syeSjZMQG/T2maGANlXT2v8S4AULWaUkCxfLyW8iW4kdka+nEMjxpL2NCwsYNBp+Q61PF43zyDg9Bm9+3NNySn78jMZUUkumqE4Gp7JmFOdP1vc8PpRrzj9+wPinCy8K1PiJ4aYbnTYpCCbDkBSbzhu2QJ1Gd82t8jI8TH51+OzvXoWbnXUOBkNW+0mWFwGcGOUVpU81/n3TOHb5oMt2FgYGjzau0Nif0Ss7Q3XB33hjjQHjHA5E5aOyIQc8CBrLdQSs3j92VG+3nNEjbkbdbBr9zm04ruvw37vh0QKOdeGIkckc80fX3KH/h7PT4BOjgCty8VZ5ux1MoO5Cf5naca2LAsEgehI+drX8o/0Nu+W0m6K/I9gGPd/dfx/EN/wN62AhsBWuAAAAAElFTkSuQmCC
">
<img alt="FlashAttention" src="https://img.shields.io/badge/%E2%9A%A1%EF%B8%8E%20FlashAttention-eae0c8?style=flat">
<img alt="SDPA" src="https://img.shields.io/badge/SDPA-DE3412?style=flat&logo=pytorch&logoColor=white">
</div>
</div>

# Byte Latent Transformer (BLT)

## Overview

The BLT model was proposed in [Byte Latent Transformer: Patches Scale Better Than Tokens](https://arxiv.org/pdf/2412.09871) by Artidoro Pagnoni, Ram Pasunuru, Pedro Rodriguez, John Nguyen, Benjamin Muller, Margaret Li, Chunting Zhou, Lili Yu, Jason Weston, Luke Zettlemoyer, Gargi Ghosh, Mike Lewis, Ari Holtzman, Srinivasan Iyer.
BLT is a byte-level LLM that matches the performance of tokenization-based LLMs through entropy-based dynamic patching.

The abstract from the paper is the following:

*We introduce the Byte Latent Transformer (BLT), a new byte-level LLM architecture that, for the first time, matches tokenization-based LLM performance at scale with significant improvements in inference
efficiency and robustness. BLT encodes bytes into dynamically sized patches, which serve as the primary units of computation. Patches are segmented based on the entropy of the next byte, allocating
more compute and model capacity where increased data complexity demands it. We present the first flop controlled scaling study of byte-level models up to 8B parameters and 4T training bytes. Our results demonstrate the feasibility of scaling models trained on raw bytes without a fixed vocabulary. Both training and inference efficiency improve due to dynamically selecting long patches when data is predictable, along with qualitative improvements on reasoning and long tail generalization. Overall, for fixed inference costs, BLT shows significantly better scaling than tokenization-based models, by simultaneously growing both patch and model size.*

## Usage tips

- **Dual Model Architecture**: BLT consists of two separately trained models:
  - **Patcher (Entropy Model)**: A smaller transformer model that predicts byte-level entropy to determine patch boundaries and segment the input.
  - **Main Transformer Model**: The primary model that processes the patches through a Local Encoder, Global Transformer, and Local Decoder.

- **Dynamic Patching**: The model uses entropy-based dynamic patching where:
  - High-entropy regions (complex data) get shorter patches, so more compute is spent per byte
  - Low-entropy regions (predictable data) get longer patches for efficiency
  - This lets the model allocate compute where it is most needed (a minimal sketch of this thresholding follows this list)

- **Local Encoder**: Processes byte sequences with cross-attention to patch embeddings
- **Global Transformer**: Processes patch-level representations with full attention across patches
- **Local Decoder**: Generates output with cross-attention back to the original byte sequence

- **Byte-Level Tokenizer**: Unlike traditional tokenizers that use learned vocabularies, BLT's tokenizer simply converts text to UTF-8 bytes and maps each byte to a token ID, so no vocabulary file is needed (a byte-mapping sketch follows the loading example below).
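
The patching rule is easy to state: run the small Patcher over the byte stream and start a new patch whenever its predicted next-byte entropy crosses a threshold. Below is a minimal, illustrative sketch of that thresholding, not the model's actual implementation — `split_into_patches`, the example entropies, and the threshold value are all hypothetical stand-ins, and the real boundary logic lives in the trained Patcher and is driven by the model config.

```python
import torch

def split_into_patches(byte_ids: torch.Tensor, next_byte_entropies: torch.Tensor, entropy_threshold: float):
    """Toy illustration of entropy-based patching (not BLT's exact rule)."""
    # byte_ids:            (seq_len,) UTF-8 byte values for one sequence
    # next_byte_entropies: (seq_len,) entropy of the Patcher's next-byte prediction
    patches, current = [], [byte_ids[0].item()]
    for byte_id, entropy in zip(byte_ids[1:], next_byte_entropies[1:]):
        if entropy.item() > entropy_threshold:
            # High entropy: the upcoming byte is hard to predict, so close the
            # current patch and give the new region its own (shorter) patch.
            patches.append(current)
            current = []
        current.append(byte_id.item())
    patches.append(current)
    return patches

# Predictable spans (low entropy) end up in long patches, surprising spans in short ones.
byte_ids = torch.tensor([109, 121, 32, 110, 97, 109, 101, 32, 105, 115])  # "my name is" as UTF-8 bytes
entropies = torch.tensor([3.1, 0.4, 0.3, 2.8, 0.5, 0.4, 0.3, 0.2, 2.9, 0.6])
print(split_into_patches(byte_ids, entropies, entropy_threshold=1.5))
# [[109, 121, 32], [110, 97, 109, 101, 32], [105, 115]]
```

The main model then runs the Local Encoder over the bytes, the Global Transformer over one representation per patch, and the Local Decoder back at byte level, so longer patches mean fewer of the expensive global steps.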

The model can be loaded via:

<hfoptions id="usage">
<hfoption id="AutoModel">

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

NUM_TOKENS_TO_GENERATE = 20

tokenizer = AutoTokenizer.from_pretrained("itazap/blt-1b-hf")
model = AutoModelForCausalLM.from_pretrained(
    "itazap/blt-1b-hf",
    device_map="auto",
)

prompt = "my name is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **inputs, max_new_tokens=NUM_TOKENS_TO_GENERATE, do_sample=False, use_cache=False
)

print(tokenizer.decode(generated_ids[0]))
```

</hfoption>
</hfoptions>
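
Because the tokenizer is just a byte mapper, its behavior is easy to reason about: encoding is UTF-8 encoding plus an offset for special tokens, and decoding is the reverse. The snippet below is a conceptual sketch of that idea under an assumed layout — the number of special tokens and their ids are placeholders, not the checkpoint's real values, which you can inspect on the tokenizer loaded above (for example via `tokenizer("hi").input_ids`).

```python
NUM_SPECIAL_TOKENS = 4  # hypothetical count of special ids reserved before the 256 byte ids

def bytes_to_ids(text: str) -> list[int]:
    # One id per UTF-8 byte -- no vocabulary lookup, no merges.
    return [NUM_SPECIAL_TOKENS + b for b in text.encode("utf-8")]

def ids_to_text(ids: list[int]) -> str:
    # Undo the offset and decode the raw bytes back to text.
    return bytes(i - NUM_SPECIAL_TOKENS for i in ids).decode("utf-8", errors="replace")

ids = bytes_to_ids("patches, not tokens")
print(ids)               # 19 ids for 19 bytes
print(ids_to_text(ids))  # "patches, not tokens"
```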

This model was contributed by [itazap](https://huggingface.co/itazap).
The original code can be found [here](https://github.com/facebookresearch/blt).


## BltConfig

[[autodoc]] BltConfig

## BltModel

[[autodoc]] BltModel
    - forward

## BltForCausalLM

[[autodoc]] BltForCausalLM
    - forward
1 change: 1 addition & 0 deletions src/transformers/models/__init__.py
@@ -48,6 +48,7 @@
from .blip import *
from .blip_2 import *
from .bloom import *
from .blt import *
from .bridgetower import *
from .bros import *
from .byt5 import *
2 changes: 2 additions & 0 deletions src/transformers/models/auto/configuration_auto.py
@@ -65,6 +65,7 @@
("blip-2", "Blip2Config"),
("blip_2_qformer", "Blip2QFormerConfig"),
("bloom", "BloomConfig"),
("blt", "BltConfig"),
("bridgetower", "BridgeTowerConfig"),
("bros", "BrosConfig"),
("camembert", "CamembertConfig"),
@@ -490,6 +491,7 @@
("blip-2", "BLIP-2"),
("blip_2_qformer", "BLIP-2 QFormer"),
("bloom", "BLOOM"),
("blt", "Blt"),
("bort", "BORT"),
("bridgetower", "BridgeTower"),
("bros", "BROS"),
2 changes: 2 additions & 0 deletions src/transformers/models/auto/modeling_auto.py
@@ -72,6 +72,7 @@ class _BaseModelWithGenerate(PreTrainedModel, GenerationMixin):
("blip-2", "Blip2Model"),
("blip_2_qformer", "Blip2QFormerModel"),
("bloom", "BloomModel"),
("blt", "BltModel"),
("bridgetower", "BridgeTowerModel"),
("bros", "BrosModel"),
("camembert", "CamembertModel"),
@@ -633,6 +634,7 @@ class _BaseModelWithGenerate(PreTrainedModel, GenerationMixin):
("blenderbot", "BlenderbotForCausalLM"),
("blenderbot-small", "BlenderbotSmallForCausalLM"),
("bloom", "BloomForCausalLM"),
("blt", "BltForCausalLM"),
("camembert", "CamembertForCausalLM"),
("code_llama", "LlamaForCausalLM"),
("codegen", "CodeGenForCausalLM"),
1 change: 1 addition & 0 deletions src/transformers/models/auto/tokenization_auto.py
@@ -105,6 +105,7 @@
("blip", ("BertTokenizer", "BertTokenizerFast" if is_tokenizers_available() else None)),
("blip-2", ("GPT2Tokenizer", "GPT2TokenizerFast" if is_tokenizers_available() else None)),
("bloom", (None, "BloomTokenizerFast" if is_tokenizers_available() else None)),
("blt", (None, "PreTrainedTokenizerFast" if is_tokenizers_available() else None)),
("bridgetower", ("RobertaTokenizer", "RobertaTokenizerFast" if is_tokenizers_available() else None)),
("bros", ("BertTokenizer", "BertTokenizerFast" if is_tokenizers_available() else None)),
("byt5", ("ByT5Tokenizer", None)),
28 changes: 28 additions & 0 deletions src/transformers/models/blt/__init__.py
@@ -0,0 +1,28 @@
# Copyright 2025 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import TYPE_CHECKING

from ...utils import _LazyModule
from ...utils.import_utils import define_import_structure


if TYPE_CHECKING:
    from .configuration_blt import *
    from .modeling_blt import *
    from .tokenization_blt import *
else:
    import sys

    _file = globals()["__file__"]
    sys.modules[__name__] = _LazyModule(__name__, _file, define_import_structure(_file), module_spec=__spec__)