Conversation
|
Throwing this out there and going to sleep, will take a look tomorrow. |
|
Technically, IMROPE and MROPE both reduce to NEOX for text-only input, so at least there is no problem with text input. Btw, I would appreciate it if you could try permuting IMROPE --> MROPE in the conversion script. Otherwise IMROPE may decrease perf in some cases. Ref the layout of IMROPE vs MROPE: llama.cpp/ggml/src/ggml-cpu/ops.cpp Lines 5600 to 5609 in 39bf692 From what I understand, given an example input position layout [t, t, t, x, x, y, y] (with t = time dimension), the IMROPE layout will be: [t, x, y, t, x, y, t] |
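The two layouts described in this comment can be printed with a small stand-alone sketch. It is not part of ggml: `rope_dim_for_sector` and `rope_layout` are hypothetical helper names, the branch logic mirrors the ggml snippet quoted later in this thread, and `sect_dims` is assumed to be the sum of the four section sizes. The letters `h` and `w` correspond to `x` and `y` in the example above.

```cpp
#include <array>
#include <cassert>
#include <string>

// Hypothetical helper (not a ggml function): returns which position dimension
// ('t', 'h', 'w', or 'e' for "extra") drives the rotary pair at index `sector`.
char rope_dim_for_sector(int sector, const std::array<int, 4> & sections, bool is_imrope) {
    if (is_imrope) {
        // Interleaved: dimensions alternate t, h, w, t, h, w, ... until a section runs out.
        if (sector % 3 == 1 && sector < 3 * sections[1]) return 'h';
        if (sector % 3 == 2 && sector < 3 * sections[2]) return 'w';
        if (sector % 3 == 0 && sector < 3 * sections[0]) return 't';
        return 'e';
    }
    // Chunked: contiguous runs of t, then h, then w, then e.
    if (sector < sections[0]) return 't';
    if (sector < sections[0] + sections[1]) return 'h';
    if (sector < sections[0] + sections[1] + sections[2]) return 'w';
    return 'e';
}

// Hypothetical helper: one character per pair index, over one full sector cycle.
// Assumption: sect_dims is the sum of all four section sizes.
std::string rope_layout(const std::array<int, 4> & sections, bool is_imrope) {
    const int sect_dims = sections[0] + sections[1] + sections[2] + sections[3];
    std::string out;
    for (int sector = 0; sector < sect_dims; ++sector) {
        out += rope_dim_for_sector(sector, sections, is_imrope);
    }
    return out;
}
```

With sections `{3, 2, 2, 0}`, `rope_layout(sections, false)` gives `ttthhww` and `rope_layout(sections, true)` gives `thwthwt`, matching the `[t, t, t, x, x, y, y]` and `[t, x, y, t, x, y, t]` layouts above.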
|
@ngxson will try tomorrow (unless they kill me at work). |
|
@ngxson I admit I'm feeling a bit out of my league here, but Opus claims that it can't be done:

IMROPE vs MROPE: Analysis and Equivalence

Background
Qwen3.5 uses IMROPE (Interleaved Multi-dimensional Rotary Position Embedding),

How RoPE Works (Quick Recap)
RoPE rotates pairs of elements where

MROPE and IMROPE
Both MROPE and IMROPE extend RoPE to multiple position dimensions (T=time, where

MROPE: Chunked Layout
Assigns contiguous chunks of pair indices to each dimension:

IMROPE: Interleaved Layout
Assigns pair indices in round-robin fashion:
(W runs out after 10×3=30 slots, remaining 2 slots get T, H)

Implementation (ggml)
From
int sector = (i0 / 2) % sect_dims;
if (is_imrope) {
    if (sector % 3 == 1 && sector < 3 * sections[1]) theta = theta_h;
    else if (sector % 3 == 2 && sector < 3 * sections[2]) theta = theta_w;
    else if (sector % 3 == 0 && sector < 3 * sections[0]) theta = theta_t;
    else theta = theta_e;
} else {
    if (sector < sections[0]) theta = theta_t;
    else if (sector < sections[0] + sections[1]) theta = theta_h;
    else if (sector < sections[0] + sections[1] + sections[2]) theta = theta_w;
    else theta = theta_e;
}
Crucially, all four thetas advance together at every iteration:
theta_t *= theta_scale;
theta_h *= theta_scale;
theta_w *= theta_scale;
theta_e *= theta_scale;
Since they all start from the same
The frequency is identical across dimensions at any given pair index. The only

Text-Only Equivalence
For text-only input, all position dimensions have the same value:
Therefore:
This is identical regardless of which dimension is assigned to pair
This was verified empirically: switching Qwen3.5 from IMROPE to MROPE produces

Why Weight Permutation Cannot Achieve General Equivalence

The Approach
Bake a permutation P into Q/K weights so that element pair

The Problem
After permutation, the element originally at pair
But the desired rotation was:
Since

Can We Compensate in the Weights?
Pre-rotation: Bake a fixed rotation angle
General linear transform: Apply a fixed 2×2 matrix
Again position-dependent, so it cannot be baked into weights.

Fundamental Reason
The correction

Comparison with Normal ↔ NeoX RoPE Conversion
Normal ↔ NeoX RoPE conversion via weight permutation does work. The key

Normal RoPE vs NeoX RoPE
Both use a single position dimension. The difference is which elements form a pair:
Example with

Why It Works
The permutation
Each pair keeps its pair index (and therefore its frequency). Only the

Why IMROPE ↔ MROPE Is Different
Both IMROPE and MROPE already use the same pairing (NeoX-style consecutive

Recommendation for llama.cpp
For text-only Qwen3.5 models: use
If multimodal Qwen3.5 support is ever added (where position dimensions |
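The text-only equivalence claimed in this analysis can be checked with a minimal stand-alone model of the theta selection. This is a sketch, not the ggml implementation: `rope_thetas` is a hypothetical name, the four starting thetas are assumed to be the raw positions for t, h, w and e, and `sect_dims` is assumed to be the nonzero sum of the section sizes.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Hypothetical model of the per-pair theta selection quoted in this thread.
// Returns the theta chosen for each pair index under IMROPE or MROPE.
std::vector<float> rope_thetas(const std::array<float, 4> & pos,      // positions for t, h, w, e
                               const std::array<int, 4>   & sections, // pairs per dimension
                               int n_pairs, float theta_scale, bool is_imrope) {
    // Assumption: the sections sum to a nonzero value.
    const int sect_dims = sections[0] + sections[1] + sections[2] + sections[3];
    float theta_t = pos[0], theta_h = pos[1], theta_w = pos[2], theta_e = pos[3];
    std::vector<float> thetas;
    for (int i = 0; i < n_pairs; ++i) {
        const int sector = i % sect_dims;
        float theta;
        if (is_imrope) {
            if (sector % 3 == 1 && sector < 3 * sections[1]) theta = theta_h;
            else if (sector % 3 == 2 && sector < 3 * sections[2]) theta = theta_w;
            else if (sector % 3 == 0 && sector < 3 * sections[0]) theta = theta_t;
            else theta = theta_e;
        } else {
            if (sector < sections[0]) theta = theta_t;
            else if (sector < sections[0] + sections[1]) theta = theta_h;
            else if (sector < sections[0] + sections[1] + sections[2]) theta = theta_w;
            else theta = theta_e;
        }
        thetas.push_back(theta);
        // All four thetas advance together, so the frequency depends only on the pair index.
        theta_t *= theta_scale;
        theta_h *= theta_scale;
        theta_w *= theta_scale;
        theta_e *= theta_scale;
    }
    return thetas;
}
```

Calling `rope_thetas` with equal positions (for example `{5, 5, 5, 5}`) returns the same vector for both layouts, while positions that differ per dimension (for example `{5, 1, 2, 5}`) return different vectors. That is the distinction between the text-only and multimodal cases drawn above.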
|
agree, the rotation of channel |
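The position dependence discussed here can be written out explicitly. This is a sketch under assumed notation that does not appear in the thread: $\omega_n$ is the rotation frequency of pair index $n$, $p_d$ is the position along dimension $d$, $d_I(n)$ and $d_M(n)$ are the dimensions that IMROPE and MROPE assign to pair $n$, and $P$ is a candidate permutation of pair indices baked into the Q/K weights.

```latex
\begin{align*}
\text{angle IMROPE applies to pair } n: \quad
  & \phi_n = p_{d_I(n)}\,\omega_n \\
\text{angle MROPE applies after moving pair } n \text{ to slot } P(n): \quad
  & \phi'_n = p_{d_M(P(n))}\,\omega_{P(n)} \\
\text{with } P \text{ chosen so that } d_M(P(n)) = d_I(n): \quad
  & \phi'_n - \phi_n = p_{d_I(n)}\,\bigl(\omega_{P(n)} - \omega_n\bigr)
\end{align*}
```

Under these assumptions the mismatch is zero for every position only if $\omega_{P(n)} = \omega_n$, which means the pair did not move. Otherwise it grows with the position, so no fixed transform of the weights can cancel it. When all position dimensions are equal, as in text-only input, both layouts already give $p\,\omega_n$ without any permutation.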
|
if modulo op is slower, how about the counter? 🧐
int mod3 = 0;
for (int64_t i0 = 0; i0 < ne0; i0 += 2) {
    int sector = (i0 / 2) % sect_dims;
    if (sector == 0) mod3 = 0;
    if (is_imrope) {
        if (mod3 == 1 && sector < 3 * sections[1]) {
            theta = theta_h;
        } else if (mod3 == 2 && sector < 3 * sections[2]) {
            theta = theta_w;
        } else if (mod3 == 0 && sector < 3 * sections[0]) {
            theta = theta_t;
        } else {
            theta = theta_e;
        }
    }
    // ...
    if (++mod3 == 3) mod3 = 0;
} |
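Before swapping `sector % 3` for a running counter, it may be worth confirming that the two always agree, including when `sector` wraps back to 0. The sketch below is a hypothetical stand-alone check (`counter_matches_modulo` is not a ggml function) and assumes `sect_dims` is positive. Note that `(i0 / 2) % sect_dims` is still computed, so the counter only removes the `% 3`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical check: returns true if the running counter from the proposal
// equals sector % 3 at every pair index. Assumes sect_dims > 0.
bool counter_matches_modulo(int64_t ne0, int sect_dims) {
    int mod3 = 0;
    for (int64_t i0 = 0; i0 < ne0; i0 += 2) {
        const int sector = (int) ((i0 / 2) % sect_dims);
        if (sector == 0) mod3 = 0;          // reset whenever the sector index wraps
        if (mod3 != sector % 3) return false;
        if (++mod3 == 3) mod3 = 0;          // advance the counter, wrapping at 3
    }
    return true;
}
```

The reset on `sector == 0` is what keeps the counter correct when `sect_dims` is not a multiple of 3; for example `counter_matches_modulo(128, 7)` exercises that wrap.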
|
@JJJYmmm hmm yeah right, I'll experiment to see what exactly the problem was. for now, at least |
|
Obsoleted by #19468 |
As @ngxson rightly noticed, Qwen3.5 actually inherits from Qwen3VL, not from Qwen3Next in terms of RoPE, so we need IMROPE and not NEOX.