UPSTREAM PR #17251: Kimi-K2-Thinking native tool calling format #202
Access the complete analysis in the LOCI Dashboard.

Performance Analysis Summary: The Kimi K2 functionality itself is well-implemented and isolated from core inference paths, with the performance issue stemming from build system changes rather than the new feature implementation.
Mirrored from ggml-org/llama.cpp#17251
The implementation might support Kimi-K2-Instruct too, but I don't have enough disk space to test it right now :(

Almost a silly copy-paste from DeepSeek V3.1 (ggml-org/llama.cpp#15533), modified according to https://github.com/MoonshotAI/Kimi-K2/blob/main/docs/tool_call_guidance.md: match on the function id instead of the plain function name.
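For illustration, here is a minimal Python sketch (not the llama.cpp code) of what "matching the function id" means: per MoonshotAI's tool_call_guidance.md, each call carries an id of the form `functions.{name}:{index}`, so the parser recovers the function name from the id rather than reading a bare name token. The helper name and the exact character class are my own assumptions.

```python
import re

# Hypothetical helper, not from common/chat.cpp: split a Kimi-K2 tool
# call id of the form "functions.{name}:{index}" into its parts.
TOOL_CALL_ID_RE = re.compile(r"functions\.(?P<name>[\w.\-]+):(?P<index>\d+)")

def parse_tool_call_id(call_id: str) -> tuple[str, int]:
    """Return (function name, call index) for a Kimi-K2 tool call id."""
    m = TOOL_CALL_ID_RE.fullmatch(call_id)
    if m is None:
        raise ValueError(f"not a Kimi-K2 tool call id: {call_id!r}")
    return m.group("name"), int(m.group("index"))

# e.g. parse_tool_call_id("functions.get_weather:0") -> ("get_weather", 0)
```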
Considerations:
- There is no `<think>` tag at the end of the template, so `thinking_forced_open` is false. Should we test it by modifying the template manually?
- The template calls `tojson(separators=(',', ':'))`. Although the value of `separators` is the same as the default, we must remove it to make the template work with minja.
- The DeepSeek parser accepts `<|tool▁calls▁begin|>` followed directly by `tool...` while ignoring a missing `<|tool▁call▁begin|>`, but I have not observed such behavior in Kimi-K2-Thinking and always get `<|tool_calls_section_begin|><|tool_call_begin|>`, therefore I'm removing the `?` in the function regex: https://github.com/ggml-org/llama.cpp/blob/c4abcb2457217198efdd67d02675f5fddb7071c2/common/chat.cpp#L1751
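On the `separators` point above: for reference, the explicit value the template passes is the compact-JSON setting, shown here with Python's standard `json` module (an illustration only; minja's `tojson` support is what actually forces removing the argument).

```python
import json

# Illustration with the stdlib json module, not the template engine:
# separators=(',', ':') produces compact JSON with no spaces after
# the delimiters, which is what tool call arguments are serialized as.
args = {"location": "Tokyo", "unit": "celsius"}
compact = json.dumps(args, separators=(",", ":"))
print(compact)  # {"location":"Tokyo","unit":"celsius"}
```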
Actually, when keeping the `?` I always get an extra `<|tool_calls_section_end|>`, but I have not been able to fix it, so I finally removed the `?`.

For maintainers: I may have a busy weekend, so feel free to edit directly if I'm not able to reply in time.
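The parsing behavior described above can be sketched as follows. This is a minimal Python approximation, not the actual regex in common/chat.cpp: the point is that `<|tool_call_begin|>` is required inside a section, with no trailing `?` making it optional.

```python
import re

# Sketch only: Kimi-K2-Thinking emits tool calls as
# <|tool_calls_section_begin|> ( <|tool_call_begin|> id
# <|tool_call_argument_begin|> args <|tool_call_end|> )+ 
# <|tool_calls_section_end|>
SECTION_RE = re.compile(
    r"<\|tool_calls_section_begin\|>"   # section opener
    r"(?P<body>.*?)"                    # the individual calls
    r"<\|tool_calls_section_end\|>",
    re.DOTALL,
)
CALL_RE = re.compile(
    r"<\|tool_call_begin\|>"            # call opener: NOT optional here
    r"(?P<id>[^<]+)"
    r"<\|tool_call_argument_begin\|>"
    r"(?P<args>.*?)"
    r"<\|tool_call_end\|>",
    re.DOTALL,
)

def extract_tool_calls(text: str) -> list[tuple[str, str]]:
    """Return (call id, raw JSON args) pairs from model output."""
    calls = []
    for section in SECTION_RE.finditer(text):
        for call in CALL_RE.finditer(section.group("body")):
            calls.append((call.group("id").strip(), call.group("args").strip()))
    return calls
```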
Closes #17155.