Migrate Qwen3.5 to IMROPE by pwilkin · Pull Request #19443 · ggml-org/llama.cpp

pwilkin · 2026-02-09T00:01:52Z

As @ngxson rightly noticed, Qwen3.5 actually inherits from Qwen3VL, not from Qwen3Next in terms of RoPE, so we need IMROPE and not NEOX.

pwilkin · 2026-02-09T00:02:09Z

Throwing this out there and going to sleep, will take a look tomorrow.

ngxson · 2026-02-09T00:12:34Z

Technically, IMROPE and MROPE are both equivalent to NEOX for text-only input, so at least there is no problem with text input.

Btw, would appreciate if you can try permuting IMROPE --> MROPE in the conversion script though. Otherwise IMROPE may decrease the perf in some cases. Ref the layout of IMROPE vs MROPE:

llama.cpp/ggml/src/ggml-cpu/ops.cpp

Lines 5600 to 5609 in 39bf692

    
    if (is_imrope) { // qwen3vl apply interleaved mrope
        if (sector % 3 == 1 && sector < 3 * sections[1]) {
            theta = theta_h;
        } else if (sector % 3 == 2 && sector < 3 * sections[2]) {
            theta = theta_w;
        } else if (sector % 3 == 0 && sector < 3 * sections[0]) {
            theta = theta_t;
        } else {
            theta = theta_e;
        }

From what I understand, given an example input position [t, t, t, x, x, y, y] (with t = time dimension), the IMROPE layout will be: [t, x, y, t, x, y, t]

pwilkin · 2026-02-09T00:16:56Z

@ngxson will try tomorrow (unless they kill me at work).

pwilkin · 2026-02-09T11:02:13Z

@ngxson I admit I'm feeling a bit out of my league here, but Opus claims that it can't be done:

IMROPE vs MROPE: Analysis and Equivalence

Background

Qwen3.5 uses IMROPE (Interleaved Multi-dimensional Rotary Position Embedding),
inherited from Qwen3VL. The IMROPE kernel is slower than MROPE due to modulo
operations in the index mapping. This document analyzes whether MROPE can be used
instead, and whether weight permutation can achieve exact equivalence.

How RoPE Works (Quick Recap)

RoPE rotates pairs of elements (Q[2j], Q[2j+1]) by an angle:

theta_j = position * freq(j)

where freq(j) = theta_scale^j and theta_scale = rope_theta^(-2/n_dims), with
n_dims the number of rotated dimensions. The frequency is determined by the
pair index j.

MROPE and IMROPE

Both MROPE and IMROPE extend RoPE to multiple position dimensions (T=time,
H=height, W=width, E=extra) using a sections array (e.g., [11, 11, 10, 0]).
Each pair index j is assigned to one position dimension. The angle becomes:

theta_j = pos[dim(j)] * freq(j)

where dim(j) depends on the rope type.

MROPE: Chunked Layout

Assigns contiguous chunks of pair indices to each dimension:

sections = [11, 11, 10]
Pair indices:  [0-10] → T,  [11-21] → H,  [22-31] → W
Layout:        TTTTTTTTTTT HHHHHHHHHHH WWWWWWWWWW

IMROPE: Interleaved Layout

Assigns pair indices in round-robin fashion:

sections = [11, 11, 10]
Pair indices:  0→T, 1→H, 2→W, 3→T, 4→H, 5→W, 6→T, ...
Layout:        THWTHWTHWTHWTHWTHWTHWTHWTHWTHW TH

(W runs out after 10×3=30 slots, remaining 2 slots get T, H)
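
The two layouts above can be reproduced with a short sketch. The helper names (mrope_dim, imrope_dim, layout) are hypothetical and exist only for illustration; the branch conditions are copied from the ggml excerpt quoted earlier in this thread, and the sections array is assumed to have four entries [T, H, W, E].

```cpp
#include <array>
#include <cassert>
#include <string>

// Hypothetical helpers: return 'T', 'H', 'W' or 'E' for a pair index (sector).
// MROPE assigns contiguous chunks of pair indices to each dimension.
char mrope_dim(int sector, const std::array<int, 4> & s) {
    if (sector < s[0])               return 'T';
    if (sector < s[0] + s[1])        return 'H';
    if (sector < s[0] + s[1] + s[2]) return 'W';
    return 'E';
}

// IMROPE assigns pair indices round-robin, same conditions as the ggml excerpt.
char imrope_dim(int sector, const std::array<int, 4> & s) {
    if (sector % 3 == 1 && sector < 3 * s[1]) return 'H';
    if (sector % 3 == 2 && sector < 3 * s[2]) return 'W';
    if (sector % 3 == 0 && sector < 3 * s[0]) return 'T';
    return 'E';
}

// Builds the per-pair layout string for the first n_pairs pair indices.
std::string layout(bool is_imrope, const std::array<int, 4> & s, int n_pairs) {
    assert(n_pairs >= 0);
    std::string out;
    for (int j = 0; j < n_pairs; ++j) {
        out += is_imrope ? imrope_dim(j, s) : mrope_dim(j, s);
    }
    return out;
}
```

Calling layout(false, {11, 11, 10, 0}, 32) yields the chunked string and layout(true, {11, 11, 10, 0}, 32) the interleaved one, including the trailing T, H once W is exhausted.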

Implementation (ggml)

From ggml/src/ggml-cpu/ops.cpp, the ggml_mrope_cache_init function:

int sector = (i0 / 2) % sect_dims;

if (is_imrope) {
    if      (sector % 3 == 1 && sector < 3 * sections[1]) theta = theta_h;
    else if (sector % 3 == 2 && sector < 3 * sections[2]) theta = theta_w;
    else if (sector % 3 == 0 && sector < 3 * sections[0]) theta = theta_t;
    else                                                   theta = theta_e;
} else {
    if      (sector < sections[0])                         theta = theta_t;
    else if (sector < sections[0] + sections[1])           theta = theta_h;
    else if (sector < sections[0] + sections[1] + sections[2]) theta = theta_w;
    else                                                   theta = theta_e;
}

Crucially, all four thetas advance together at every iteration:

theta_t *= theta_scale;
theta_h *= theta_scale;
theta_w *= theta_scale;
theta_e *= theta_scale;

Each accumulator starts from its own position value (pos_T, pos_H, pos_W,
pos_E), so at pair index j:

theta_t = pos_T * theta_scale^j
theta_h = pos_H * theta_scale^j
theta_w = pos_W * theta_scale^j
theta_e = pos_E * theta_scale^j

The frequency is identical across dimensions at any given pair index. The only
difference between IMROPE and MROPE is which position value (pos_T, pos_H,
pos_W) multiplies this shared frequency.

Text-Only Equivalence

For text-only input, all position dimensions have the same value:

pos_T = pos_H = pos_W = pos_text

Therefore:

theta_j = pos_text * freq(j)

This is identical regardless of which dimension is assigned to pair j. IMROPE
and MROPE produce bit-identical results for text-only input. No weight changes
are needed — just switch the rope type.

This was verified empirically: switching Qwen3.5 from IMROPE to MROPE produces
identical NMSE values (Dense: 8.94e-06, MoE: 9.36e-05).
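
A minimal sketch of the text-only argument, not the ggml implementation: rope_angles is a hypothetical function that computes one angle per pair index, assuming each theta accumulator starts at its own position value and is multiplied by theta_scale = freq_base^(-2/n_dims) on every step. Only the dimension selection differs between the two rope types.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

// pos = {pos_t, pos_h, pos_w, pos_e}; returns the rotation angle for each pair index.
std::vector<float> rope_angles(bool is_imrope, const std::array<float, 4> & pos,
                               const std::array<int, 4> & s, int n_pairs, float freq_base) {
    assert(n_pairs > 0 && freq_base > 0.0f);
    // n_dims = 2 * n_pairs, so freq_base^(-2/n_dims) = freq_base^(-1/n_pairs)
    const float theta_scale = std::pow(freq_base, -1.0f / n_pairs);
    std::array<float, 4> theta = pos; // assumption: each accumulator starts at its position
    std::vector<float> out;
    for (int j = 0; j < n_pairs; ++j) {
        int dim = 3; // 0=T, 1=H, 2=W, 3=E
        if (is_imrope) {
            if      (j % 3 == 1 && j < 3 * s[1]) dim = 1;
            else if (j % 3 == 2 && j < 3 * s[2]) dim = 2;
            else if (j % 3 == 0 && j < 3 * s[0]) dim = 0;
        } else {
            if      (j < s[0])               dim = 0;
            else if (j < s[0] + s[1])        dim = 1;
            else if (j < s[0] + s[1] + s[2]) dim = 2;
        }
        out.push_back(theta[dim]);
        for (float & t : theta) t *= theta_scale; // all four advance together
    }
    return out;
}
```

With equal positions in every dimension the two rope types return the same vector; as soon as pos_h or pos_w differs from pos_t, the vectors diverge.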

Why Weight Permutation Cannot Achieve General Equivalence

The Approach

Bake a permutation P into Q/K weights so that element pair j moves to position
P(j), choosing P such that mrope_dim(P(j)) = imrope_dim(j) (the MROPE
dimension assignment at the new position matches IMROPE's at the old position).

The Problem

After permutation, the element originally at pair j is at position P(j) and
gets rotated by:

theta = pos[mrope_dim(P(j))] * freq(P(j))
      = pos[imrope_dim(j)]   * freq(P(j))    ← correct dimension

But the desired rotation was:

theta = pos[imrope_dim(j)] * freq(j)          ← correct frequency

Since P(j) ≠ j for almost every pair, freq(P(j)) ≠ freq(j) there. Example with sections=[11,11,10]:

Original pair (IMROPE)	dim	Permuted to (MROPE)	dim	freq match?
j=0 → T	T	P(0)=0	T	freq(0)=freq(0) ✓
j=1 → H	H	P(1)=11	H	freq(11)≠freq(1) ✗
j=2 → W	W	P(2)=22	W	freq(22)≠freq(2) ✗
j=3 → T	T	P(3)=1	T	freq(1)≠freq(3) ✗

Can We Compensate in the Weights?

Pre-rotation: Bake a fixed rotation angle alpha_j into each pair:

total_angle = alpha_j + pos * freq(P(j))
desired     =           pos * freq(j)
→  alpha_j  = pos * (freq(j) - freq(P(j)))

alpha_j must be a constant (baked into weights), but pos varies per token.
No fixed alpha_j works.

General linear transform: Apply a fixed 2×2 matrix M_j per pair:

M_j @ Rotate(pos * freq(P(j))) = Rotate(pos * freq(j))
→  M_j = Rotate(pos * (freq(j) - freq(P(j))))    ← depends on pos

Again position-dependent — cannot be baked into weights.

Fundamental Reason

The correction freq(j) - freq(P(j)) is a property of the pair indices, but
it must be multiplied by pos (which varies per token) to get the angle
correction. No fixed weight transformation can compensate for a
position-dependent angle difference.
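
To make the mismatch concrete, the sketch below builds one candidate permutation P and counts how many pairs end up at a different pair index, and therefore at a different frequency. The function names are hypothetical, and P is just the natural choice (the k-th IMROPE pair of a dimension goes to the k-th slot of that dimension's MROPE chunk); it assumes each dimension occurs exactly sections[d] times in the interleaved layout.

```cpp
#include <array>
#include <cassert>
#include <vector>

// IMROPE dimension index for pair j: 0=T, 1=H, 2=W, 3=E (branches from the ggml excerpt).
int imrope_dim(int j, const std::array<int, 4> & s) {
    if (j % 3 == 1 && j < 3 * s[1]) return 1;
    if (j % 3 == 2 && j < 3 * s[2]) return 2;
    if (j % 3 == 0 && j < 3 * s[0]) return 0;
    return 3;
}

// Hypothetical permutation: P[j] is the MROPE pair index that has the same
// position dimension as IMROPE pair j.
std::vector<int> imrope_to_mrope_perm(const std::array<int, 4> & s, int n_pairs) {
    assert(n_pairs >= 0);
    const std::array<int, 4> chunk_start = {0, s[0], s[0] + s[1], s[0] + s[1] + s[2]};
    std::array<int, 4> seen = {0, 0, 0, 0};
    std::vector<int> P(n_pairs);
    for (int j = 0; j < n_pairs; ++j) {
        const int d = imrope_dim(j, s);
        P[j] = chunk_start[d] + seen[d]++;
    }
    return P;
}

// Counts pairs whose index changes; freq(j) = theta_scale^j changes with it.
int count_moved_pairs(const std::vector<int> & P) {
    int moved = 0;
    for (int j = 0; j < (int) P.size(); ++j) moved += (P[j] != j);
    return moved;
}
```

For sections [11, 11, 10, 0] and 32 pairs this gives P(1)=11, P(2)=22 and P(3)=1, and 30 of the 32 pairs change index (only pairs 0 and 16 stay in place).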

Comparison with Normal ↔ NeoX RoPE Conversion

Normal ↔ NeoX RoPE conversion via weight permutation does work. The key
difference is what changes between the two formats.

Normal RoPE vs NeoX RoPE

Both use a single position dimension. The difference is which elements form a pair:

Normal (GPT-J): pairs consecutive elements: (q_{2j}, q_{2j+1})
NeoX: pairs first half with second half: (q_j, q_{j+n/2})

Example with n_rot=8, elements q0..q7:

Normal RoPE:                    NeoX RoPE:
Pair 0: (q0, q1)  freq(0)      Pair 0: (q0, q4)  freq(0)
Pair 1: (q2, q3)  freq(1)      Pair 1: (q1, q5)  freq(1)
Pair 2: (q4, q5)  freq(2)      Pair 2: (q2, q6)  freq(2)
Pair 3: (q6, q7)  freq(3)      Pair 3: (q3, q7)  freq(3)

Why It Works

The permutation [q0,q1,q2,q3,q4,q5,q6,q7] → [q0,q2,q4,q6,q1,q3,q5,q7]
rearranges elements so that NeoX sees:

Pair 0: (q0, q1)  freq(0)  ← same elements AND same freq as Normal pair 0
Pair 1: (q2, q3)  freq(1)  ← same elements AND same freq as Normal pair 1
Pair 2: (q4, q5)  freq(2)  ← same
Pair 3: (q6, q7)  freq(3)  ← same

Each pair keeps its pair index (and therefore its frequency). Only the
element arrangement within the head dimension changes.
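
A small sketch of why a pure pairing change can be absorbed by a permutation. It is not the llama.cpp conversion code: the function names are hypothetical, and it uses a single position with theta = pos * theta_scale^j. One variant rotates adjacent elements, the other rotates x[j] with x[j + n/2]; the permutation moves pair j to slots (j, j + n/2), so the pair index and its frequency are unchanged.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using vec = std::vector<float>;

// Rotates adjacent elements (x[2j], x[2j+1]) by theta_j = pos * theta_scale^j.
vec rope_adjacent(const vec & x, float pos, float theta_scale) {
    assert(x.size() % 2 == 0);
    vec out(x.size());
    float theta = pos;
    for (std::size_t j = 0; j < x.size() / 2; ++j) {
        const float c = std::cos(theta), s = std::sin(theta);
        out[2 * j]     = x[2 * j] * c - x[2 * j + 1] * s;
        out[2 * j + 1] = x[2 * j] * s + x[2 * j + 1] * c;
        theta *= theta_scale;
    }
    return out;
}

// Rotates split-half pairs (x[j], x[j + n/2]) by the same theta_j.
vec rope_split_half(const vec & x, float pos, float theta_scale) {
    assert(x.size() % 2 == 0);
    const std::size_t half = x.size() / 2;
    vec out(x.size());
    float theta = pos;
    for (std::size_t j = 0; j < half; ++j) {
        const float c = std::cos(theta), s = std::sin(theta);
        out[j]        = x[j] * c - x[j + half] * s;
        out[j + half] = x[j] * s + x[j + half] * c;
        theta *= theta_scale;
    }
    return out;
}

// Moves pair j of the adjacent layout to slots (j, j + n/2).
vec adjacent_to_split_half(const vec & x) {
    const std::size_t half = x.size() / 2;
    vec y(x.size());
    for (std::size_t j = 0; j < half; ++j) {
        y[j]        = x[2 * j];
        y[j + half] = x[2 * j + 1];
    }
    return y;
}

// True if permute-then-rotate equals rotate-then-permute within a tolerance.
bool pairing_equivalent(const vec & x, float pos, float theta_scale, float tol = 1e-6f) {
    const vec a = adjacent_to_split_half(rope_adjacent(x, pos, theta_scale));
    const vec b = rope_split_half(adjacent_to_split_half(x), pos, theta_scale);
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (std::fabs(a[i] - b[i]) > tol) return false;
    }
    return true;
}
```

Because every pair keeps its index j, the check holds for any position, which is exactly what fails in the IMROPE to MROPE case.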

Why IMROPE ↔ MROPE Is Different

Both IMROPE and MROPE already use the same NeoX-style pairing. The difference
is which position dimension each pair index maps to. To fix the dimension
assignment, you must move entire pairs to different pair indices, which
changes the frequency. There is no within-pair rearrangement that can fix a
between-pair dimension assignment.

Conversion	What differs	Permutation moves	Frequency preserved?
Normal ↔ NeoX	Which elements form a pair	Elements within pairs	✓ Yes (pair index unchanged)
IMROPE ↔ MROPE	Which dimension per pair	Entire pairs to new indices	✗ No (pair index changes)

Recommendation for llama.cpp

For text-only Qwen3.5 models: use LLAMA_ROPE_TYPE_MROPE instead of
LLAMA_ROPE_TYPE_IMROPE. The results are identical and the kernel is faster.

If multimodal Qwen3.5 support is ever added (where position dimensions
differ), the IMROPE kernel would be required, or a runtime
permute→MROPE→unpermute approach could be explored (trading permutation cost
vs modulo cost in the kernel).

JJJYmmm · 2026-02-09T12:25:40Z

Agree. The rotation of channel i is r_i = pos[i] * freq[i]. If we want to change from thwthw... to ttt..hhh..www.., permuting the proj weights only changes the position of each channel, and the freq is then mismatched. For example, move the second t from index 3 to index 1: the mrope kernel assigns it freq[1], but the right one is freq[3]. cc @ngxson

JJJYmmm · 2026-02-09T12:35:12Z

If the modulo op is slower, how about a counter? 🧐

int mod3 = 0; // running value of sector % 3
for (int64_t i0 = 0; i0 < ne0; i0 += 2) {
    int sector = (i0 / 2) % sect_dims;

    // restart the counter whenever the sector index wraps around
    if (sector == 0) mod3 = 0;

    if (is_imrope) {
        if (mod3 == 1 && sector < 3 * sections[1]) {
            theta = theta_h;
        } else if (mod3 == 2 && sector < 3 * sections[2]) {
            theta = theta_w;
        } else if (mod3 == 0 && sector < 3 * sections[0]) {
            theta = theta_t;
        } else {
            theta = theta_e;
        }
    }

    // ...

    // advance the counter without a modulo
    if (++mod3 == 3) mod3 = 0;
}
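
A quick way to convince oneself that the counter tracks sector % 3, including across wraps of the sector index when sect_dims is not a multiple of 3. This is a standalone check with a hypothetical function name, not code from ggml.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if the running counter equals sector % 3 on every iteration.
bool counter_matches_modulo(std::int64_t ne0, int sect_dims) {
    assert(ne0 % 2 == 0 && sect_dims > 0);
    int mod3 = 0;
    for (std::int64_t i0 = 0; i0 < ne0; i0 += 2) {
        const int sector = (int) ((i0 / 2) % sect_dims);
        if (sector == 0) mod3 = 0;            // restart at each wrap of the sector index
        if (mod3 != sector % 3) return false; // compare against the modulo version
        if (++mod3 == 3) mod3 = 0;
    }
    return true;
}
```

For example, counter_matches_modulo(128, 32) walks 64 pairs and wraps the sector index once at a point where 32 % 3 != 0, which is the case the reset line exists for.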

ngxson · 2026-02-09T15:27:15Z

@JJJYmmm hmm yeah, right. I'll experiment to see what exactly the problem was. For now, at least test-backend-ops reports that imrope is significantly slower than mrope. I'll move this to an issue for further discussion.

ngxson · 2026-02-09T15:35:44Z

Alright, sorry, I realized that the test-backend-ops case is incorrect. The perf should be the same between imrope and mrope, so no permutation is needed @JJJYmmm

Ref: #19464

pwilkin · 2026-02-09T19:16:48Z

Obsoleted by #19468

Migrate Qwen3.5 to IMROPE

88133d2

pwilkin requested a review from ngxson February 9, 2026 00:01

pwilkin requested a review from CISC as a code owner February 9, 2026 00:01

github-actions bot added the model Model specific label Feb 9, 2026

pwilkin closed this Feb 9, 2026

pwilkin wants to merge 1 commit into ggml-org:master from pwilkin:qwen35-imrope
