Conversation
|
Nvm, complete brainfart on my end. |
I only checked |
|
@CISC if you want to make a branch I can test kimi-k2, deepseek-v3-0324, and qwen3-coder. |
|
The only other line I can see is here: but this is for model loading, so it shouldn't make any difference. AFAIK, all the other MoE models are using the value of `n_expert_used` from `llm_graph_context`:

```cpp
struct llm_graph_context {
    const llm_arch arch;

    const llama_hparams & hparams;
    const llama_cparams & cparams;
    const llama_ubatch  & ubatch;

    const int64_t n_embd;
    const int64_t n_layer;
    const int64_t n_rot;
    const int64_t n_ctx;  // user-specified context size (can be different from n_ctx_train)
    const int64_t n_head;
    const int64_t n_head_kv;
    const int64_t n_embd_head_k;
    const int64_t n_embd_k_gqa;
    const int64_t n_embd_head_v;
    const int64_t n_embd_v_gqa;
    const int64_t n_expert;
    const int64_t n_expert_used;
    ...
```

...which is set in the constructor based on the state of `cparams.warmup`. |
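For illustration, here is a minimal, compilable sketch of that initializer pattern; the type and field names below are simplified stand-ins, not the real llama.cpp definitions:

```cpp
// Minimal sketch of the constructor-initializer pattern described above.
// hparams_t, cparams_t, and graph_context_t are hypothetical stand-ins.
#include <cstdint>
#include <cstdio>

struct hparams_t { int64_t n_expert = 8; int64_t n_expert_used = 2; };
struct cparams_t { bool warmup = false; };

struct graph_context_t {
    const int64_t n_expert;
    const int64_t n_expert_used;

    graph_context_t(const hparams_t & hp, const cparams_t & cp)
        : n_expert(hp.n_expert),
          // during warmup, treat all experts as used so every expert's
          // weights actually get touched
          n_expert_used(cp.warmup ? hp.n_expert : hp.n_expert_used) {}
};

int main() {
    graph_context_t warm  (hparams_t{}, cparams_t{true});
    graph_context_t normal(hparams_t{}, cparams_t{false});
    std::printf("warmup: %lld, normal: %lld\n",
                (long long) warm.n_expert_used,
                (long long) normal.n_expert_used); // warmup: 8, normal: 2
}
```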
|
It could be the confusion about the term "shadowing": https://en.wikipedia.org/wiki/Variable_shadowing

The key thing is that the subclasses of `llm_graph_context` use its `n_expert_used` member:

```cpp
n_expert_used (cparams.warmup ? hparams.n_expert : hparams.n_expert_used),
```

I may well be wrong though, as you know far more about the codebase than me! I only traced it back as far as here and didn't go as far as seeing when this constructor is called, etc. |
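For anyone unfamiliar with the term, this is shadowing in miniature (hypothetical names, not actual llama.cpp code): a parameter or local with the same name as a member hides that member inside its scope.

```cpp
// Variable shadowing in miniature (hypothetical names, not llama.cpp code).
#include <cstdint>
#include <cstdio>

struct ctx_t {
    int64_t n_expert_used = 8; // warmup-adjusted value: all experts

    void build_shadowed(int64_t n_expert_used) { // shadows the member
        // resolves to the parameter, not the adjusted member
        std::printf("shadowed: %lld\n", (long long) n_expert_used);
    }

    void build_member() {
        // no shadowing: resolves to the member
        std::printf("member:   %lld\n", (long long) n_expert_used);
    }
};

int main() {
    ctx_t ctx;
    ctx.build_shadowed(2); // prints "shadowed: 2", warmup adjustment lost
    ctx.build_member();    // prints "member:   8"
}
```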
|
Oh, LOL, sorry, for some reason I was looking at the diff the wrong way. |
This fixes the warmup bug reported by @createthis here:
#14939 (comment)
It was caused by these local variables shadowing those assigned here during warmup:
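The actual offending lines are elided above; purely as an illustration of the shape of such a bug (hypothetical names, not the real llama.cpp code):

```cpp
// Illustrative only: how re-reading hparams into a local can
// discard the warmup adjustment.
#include <cstdint>

struct hp_t { int64_t n_expert = 8; int64_t n_expert_used = 2; };

// buggy shape: the local always takes the raw hparams value,
// so warmup never activates all experts
int64_t experts_buggy(const hp_t & hp, bool /*warmup*/) {
    int64_t n_expert_used = hp.n_expert_used; // always 2, even during warmup
    return n_expert_used;
}

// fixed shape: derive the warmup-adjusted value and use it consistently
int64_t experts_fixed(const hp_t & hp, bool warmup) {
    return warmup ? hp.n_expert : hp.n_expert_used; // 8 during warmup
}
```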