Conversation
So they were not trained from scratch? Just a bunch of tokens run through the old Qwen 8B in a post-quantization scheme?
No, the models were trained in 1 bit, just like the BitNet models from Microsoft (except that MS BitNet uses ternary weights, so 1.58 bpw, while these are truly 1-bit). You don't get that kind of PPL with 1-bit post-training quantization. My guess is that this company is aiming to get funding so they can train a large model (100B+).
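As a quick back-of-the-envelope check on those figures (my own arithmetic, not taken from either project's docs): a ternary weight carries log2(3) bits of information, while a true binary weight carries exactly one.

```python
import math

# Ternary weights (BitNet b1.58 style): each weight is one of 3 values,
# so its information content is log2(3) bits.
ternary_bpw = math.log2(3)  # ~1.585, usually quoted as "1.58 bpw"

# True binary weights: one of 2 values, exactly 1 bit each.
binary_bpw = math.log2(2)   # 1.0

print(f"ternary: {ternary_bpw:.3f} bpw, binary: {binary_bpw:.0f} bpw")
```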
That's what I initially thought too, but then I saw people saying it was Qwen3-8B in the paper. The benchmaxxed scores have it performing like a 2B model. I could understand if they were doing a fresh train with just that architecture. If it is really a larger training run to get the model down to 1-bit weights, it's misrepresented.
Well, then I don't know. I didn't find details on how the models were trained (but I also didn't put real effort into searching). But for sure you cannot get to such PPL with 1.125 bpw post-training quantization. The architecture of the models is identical to the corresponding Qwen3 dense models.
Yeah, it seems there was some prior training / tuning. See an old blog post of theirs where they mentioned: "The 1-bit Bonsai 8B model is an 8-billion parameter Large Language Model where each parameter has 1-bit precision. It has been trained using Google v4 TPUs." Further evidence is that reasoning doesn't seem to work right with it yet. They don't go into much detail on how it was trained or compressed during conversion; just that it's some proprietary tech.
The options are: a model trained from scratch in 1-bit, like BitNet, or a model quantized with a high number of tokens to make it 1-bit. The latter has certainly been attempted before and leads to the mentioned performance, i.e., being like a 1-4B model.
I spotted the Bonsai models (actually, there was a post on HN and that's how I "spotted" them). These are true 1-bit models (1 bit per weight plus a 16-bit scale per 128 weights, so effectively 1.125 bpw).
Don't know to what extent they can be useful, but I'm sure at least some are curious to try them.
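The 1.125 bpw figure follows directly from the stated layout: each weight stores one bit plus its amortized share of a 16-bit scale shared by 128 weights. A quick check of the arithmetic:

```python
WEIGHT_BITS = 1    # one bit per weight
SCALE_BITS = 16    # one 16-bit scale per block
BLOCK_SIZE = 128   # number of weights sharing a scale

effective_bpw = WEIGHT_BITS + SCALE_BITS / BLOCK_SIZE
print(effective_bpw)  # 1.125
```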
So, here we go. This PR adds CPU-only support (`AVX2` and generic implementation). In case you are curious about performance, here is what I get on a Ryzen-3995WX for the 4B model:
The bit packing they have chosen is not optimal, but I did not want to do repacking and all that, so this is what it is for now. Still, performance is very decent.
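For intuition, here is a minimal sketch of decoding one such block, assuming a straightforward layout (the bit order, the bit-to-sign mapping, and the helper name are my assumptions; as noted above, the actual packing the PR handles differs):

```python
import numpy as np

BLOCK = 128  # weights per block, sharing one scale

def dequant_block(packed: np.ndarray, scale: float) -> np.ndarray:
    """Unpack 16 bytes (128 bits) into 128 weights of +/- scale.

    Illustrative only: assumes bit=1 -> +scale, bit=0 -> -scale,
    MSB-first bit order. The real on-disk format may differ.
    """
    assert packed.size * 8 == BLOCK
    bits = np.unpackbits(packed)           # 128 values in {0, 1}, MSB first
    signs = bits.astype(np.int8) * 2 - 1   # map {0, 1} -> {-1, +1}
    return signs * scale

# Example: 16 bytes of alternating bits, scale 0.05
packed = np.frombuffer(bytes([0b10101010] * 16), dtype=np.uint8)
w = dequant_block(packed, 0.05)
print(w[:4])  # alternating +0.05 / -0.05
```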
Here is the result of a `perplexity` run with `wiki.test.raw`: