
Bonsai support (AVX2, generic) #1570

Merged

ikawrakow merged 2 commits into main from ik/bonsai_avx2 on Apr 2, 2026

Conversation

@ikawrakow (Owner) commented Apr 2, 2026

I spotted the Bonsai models (actually, there was a post on HN and that's how I "spotted" them). These are true 1-bit models (1 bit per weight plus a 16-bit scale per 128 weights, so effectively 1.125 bpw).
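For anyone who wants to picture the format, here is a minimal sketch of such a block and a scalar dequantization, assuming each bit simply selects +scale or -scale. The struct name `block_q1_g128`, the bit order, and the codebook are illustrative guesses, not necessarily the actual packing:

```cpp
// Illustrative 1.125-bpw block: 128 one-bit weights + one fp16 scale.
// (16 + 128) bits / 128 weights = 1.125 bits per weight.
#include <cstdint>
#include <cstring>

struct block_q1_g128 {
    uint16_t d;         // per-group scale, fp16 stored as raw bits
    uint8_t  qs[16];    // 128 weights x 1 bit = 16 bytes
};
static_assert(sizeof(block_q1_g128) == 18, "144 bits per 128 weights (typical ABI)");

// Scalar fp16 -> fp32 (normal values only; subnormals flushed to zero).
static float fp16_to_fp32(uint16_t h) {
    const uint32_t sign = (uint32_t)(h & 0x8000) << 16;
    const uint32_t exp  = (h >> 10) & 0x1F;
    const uint32_t mant = h & 0x3FF;
    uint32_t bits;
    if (exp == 0)       bits = sign;                               // zero / flushed subnormal
    else if (exp == 31) bits = sign | 0x7F800000u | (mant << 13);  // inf / NaN
    else                bits = sign | ((exp + 112) << 23) | (mant << 13);
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Hypothetical dequantization: each bit selects +d or -d.
static void dequantize_row_q1_g128(const block_q1_g128 * x, float * y, int nblocks) {
    for (int i = 0; i < nblocks; ++i) {
        const float d = fp16_to_fp32(x[i].d);
        for (int j = 0; j < 128; ++j) {
            const int bit = (x[i].qs[j >> 3] >> (j & 7)) & 1;
            y[i*128 + j] = bit ? d : -d;
        }
    }
}
```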

I don't know to what extent they can be useful, but I'm sure at least some people are curious to try them.

So, here we go. This PR adds CPU-only support (AVX2 and generic implementation).

In case you are curious about performance, here is what I get on a Ryzen-3995WX for the 4B model:

| model | size | params | backend | threads | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| qwen3 4B Q1_0_G128 - 1.125 bpw | 592.16 MiB | 4.41 B | CPU | 64 | pp512 | 452.99 ± 2.79 |
| qwen3 4B Q1_0_G128 - 1.125 bpw | 592.16 MiB | 4.41 B | CPU | 64 | tg128 | 140.72 ± 0.12 |
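
(pp512 is prompt processing of a 512-token prompt, tg128 is generation of 128 tokens; t/s is tokens per second, as reported by llama-bench.)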

The bit packing they have chosen is not optimal, but I did not want to get into repacking and all that, so this is what it is for now. Still, performance is very decent.
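
To give a flavor of what the SIMD path has to do per block, here is a common AVX2 pattern for expanding 128 packed sign bits into bytes in {-1, +1}. This is a generic sketch of the technique (the function name and the LSB-first bit order are assumptions matching the scalar sketch above), not necessarily the kernel in this PR:

```cpp
// Compile with -mavx2. Expands 16 packed bytes (128 one-bit weights)
// into 128 int8 values in {-1, +1}.
#include <immintrin.h>
#include <cstdint>
#include <cstring>

static void unpack_bits_avx2(const uint8_t * qs, int8_t * out) {
    // One mask bit per output byte position, repeating every 8 bytes.
    const __m256i bit_mask = _mm256_set_epi8(
        (char)0x80,64,32,16,8,4,2,1, (char)0x80,64,32,16,8,4,2,1,
        (char)0x80,64,32,16,8,4,2,1, (char)0x80,64,32,16,8,4,2,1);
    // Replicate source byte k into output bytes 8k..8k+7.
    const __m256i shuf = _mm256_set_epi8(
        3,3,3,3,3,3,3,3, 2,2,2,2,2,2,2,2,
        1,1,1,1,1,1,1,1, 0,0,0,0,0,0,0,0);
    for (int k = 0; k < 4; ++k) {                          // 4 x 32 = 128 weights
        uint32_t four;                                     // next 4 source bytes
        std::memcpy(&four, qs + 4*k, 4);
        __m256i v = _mm256_set1_epi32((int)four);          // broadcast to all lanes
        v = _mm256_shuffle_epi8(v, shuf);                  // byte i -> 8 output lanes
        v = _mm256_cmpeq_epi8(_mm256_and_si256(v, bit_mask), bit_mask); // 0xFF if bit set
        v = _mm256_sub_epi8(_mm256_and_si256(v, _mm256_set1_epi8(2)),
                            _mm256_set1_epi8(1));          // 0xFF -> +1, 0x00 -> -1
        _mm256_storeu_si256((__m256i*)(out + 32*k), v);
    }
}
```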

Here is the result of a perplexity run with wiki.test.raw:

Final estimate: PPL over 584 chunks for n_ctx=512 = 16.2711 +/- 0.13553

@Ph0rk0z commented Apr 2, 2026

So they were not trained from scratch? Just a bunch of tokens run through the old qwen 8b in a post-quantization scheme?

@ikawrakow (Owner, Author)

> So they were not trained from scratch? Just a bunch of tokens run through the old qwen 8b in a post-quantization scheme?

No, the models were trained in 1 bit, just like the BitNet models from Microsoft (except that MS BitNet uses ternary weights, so 1.58 bpw, while these really are just 1-bit).
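
(The 1.58 figure is just log2(3) ≈ 1.585, the information content of a ternary weight, whereas a binary weight carries exactly 1 bit; the extra 0.125 bpw here comes from the fp16 scale per 128 weights: 16/128 = 0.125.)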

You don't get that kind of PPL with 1-bit post-training quantization.

My guess is that this company is aiming to get funding so they can train a large model (100B+).

@Ph0rk0z commented Apr 2, 2026

That's what I initially thought too, but then I saw people saying it was Qwen3-8b in the paper. The benchmaxxed scores put it at about the level of a 2b model. I could understand if they were doing a fresh train with just that architecture, but if it is really a longer "train" just to get the model down to 1-bit weights, it's misrepresented.

@ikawrakow (Owner, Author)

Well, then I don't know. I didn't find details on how the models were trained (but I also didn't put real effort into searching). For sure, though, you cannot get to such a PPL with 1.125-bpw post-training quantization. The architecture of the models is identical to that of the corresponding Qwen3 dense models.

ikawrakow merged commit 90ec1b8 into main on Apr 2, 2026
@SmartestWashingMachine

Yeah, it seems there was some prior training / tuning. See an old blog post of theirs where they mentioned:

"The 1-bit Bonsai 8B model is an 8-billion parameter Large Language Model where each parameter has 1-bit precision. It has been trained using Google v4 TPUs."

Further evidence is that reasoning doesn't seem to work right with it yet.

They don't go into much detail on how it was trained or compressed during conversion; just that it's some proprietary tech.

@Ph0rk0z commented Apr 3, 2026

The options are: a model trained from scratch in 1-bit, like BitNet, or a model quantized with a high amount of tokens to make it 1-bit. The latter has certainly been attempted before and leads to the mentioned performance, i.e. acting like a 1-4b model or whatever.
