Gemma 4 template parser fixes #21326
Conversation
aldehir left a comment:
We should roll a dedicated parser for this model.
common/peg-parser.cpp
Outdated
```cpp
common_peg_parse_result operator()(const common_peg_string_delim_parser & p) const {
    trie matcher({p.delimiter});

    size_t pos = start_pos;
    size_t last_valid_pos = start_pos;

    while (pos < ctx.input.size()) {
        auto utf8_result = common_parse_utf8_codepoint(ctx.input, pos);

        if (utf8_result.status == utf8_parse_result::INCOMPLETE) {
            if (!ctx.is_lenient()) {
                return common_peg_parse_result(COMMON_PEG_PARSE_RESULT_FAIL, start_pos);
            }
            return common_peg_parse_result(COMMON_PEG_PARSE_RESULT_NEED_MORE_INPUT, start_pos, last_valid_pos);
        }

        if (utf8_result.status == utf8_parse_result::INVALID) {
            return common_peg_parse_result(COMMON_PEG_PARSE_RESULT_FAIL, start_pos);
        }

        auto match = matcher.check_at(ctx.input, pos);

        if (match == trie::COMPLETE_MATCH) {
            return common_peg_parse_result(COMMON_PEG_PARSE_RESULT_SUCCESS, start_pos, pos);
        }

        if (match == trie::PARTIAL_MATCH) {
            // Only a prefix of the delimiter fits before the input ends:
            // request more input instead of claiming success on a partial match.
            return common_peg_parse_result(COMMON_PEG_PARSE_RESULT_NEED_MORE_INPUT, start_pos, last_valid_pos);
        }

        pos += utf8_result.bytes_consumed;
        last_valid_pos = pos;
    }

    // Delimiter not found before the end of the input.
    if (!ctx.is_lenient()) {
        return common_peg_parse_result(COMMON_PEG_PARSE_RESULT_FAIL, start_pos);
    }
    return common_peg_parse_result(COMMON_PEG_PARSE_RESULT_NEED_MORE_INPUT, start_pos, last_valid_pos);
}
```
This is functionally equivalent to `p.until("<|\"|>") + p.literal("<|\"|>")`. There's no need for a new parser.
common/chat.h
Outdated
```cpp
std::string thinking_start_tag; // e.g., "💭"
std::string thinking_end_tag;   // e.g., "_flow"
```
Beats me, model went cuckoo :P
Yes, but I want to get something out quickly while people are testing. We'll definitely do a proper one and clean up later on.

Ok, I'm good with that.
```cpp
    value_parser = p.literal(QUOTE) +
                   p.tool_arg_string_value(p.until(QUOTE)) +
                   p.literal(QUOTE);
} else if (type == "number" || type == "integer") {
    value_parser = p.tool_arg_value(g4.gemma4_number());
} else if (type == "boolean") {
    value_parser = p.tool_arg_value(g4.gemma4_bool());
} else if (type == "null") {
    value_parser = p.tool_arg_value(g4.gemma4_null());
} else if (type == "object") {
    value_parser = p.tool_arg_value(g4.gemma4_dict());
} else if (type == "array") {
    value_parser = p.tool_arg_value(g4.gemma4_array());
} else {
    // Fallback for untyped/unknown parameters: parse a generic Gemma 4 value.
    value_parser = p.tool_arg_value(g4.gemma4_value());
}
```
Should use gemma4_value_for_type() here?
```cpp
static std::string normalize_gemma4_to_json(const std::string & input) {
    std::string result;
    result.reserve(input.size() * 2);
    // ...
```
In the previous chat-peg-parser, I had a mapper that would build this JSON up incrementally via the AST instead of through a separate pass. Was that removed?
No, it's still there (`void common_chat_peg_mapper::map(const common_peg_ast_node & node)`). It would have to be adapted to the funny format; the model I used for the refactoring was apparently too dumb to do it.
Done.
The new gemma4.jinja template still has the same issue as the GGUF-embedded template: `value['type'] | upper` crashes with `Unknown (built-in) filter 'upper' for type Array` when a tool parameter declares its type as a JSON Schema type array. The format_parameters macro already handles this correctly for array items (lines ~487–489), but not at the property level. The fix is one line before the if/elif chain:
...
...
...
Reproduced with gemma-4-27b-it (GGUF) served via `llama-server --jinja` when the client sends tool schemas with array types.
Cherry-picked from ggml-org/llama.cpp:
- fix: gemma 4 template (ggml-org#21326)
- vocab: fix Gemma4 tokenizer (ggml-org#21343)
- llama-model: read final_logit_softcapping for Gemma 4 (ggml-org#21390)
- llama: add custom newline split for Gemma 4 (ggml-org#21406)
- common: add gemma 4 specialized parser (ggml-org#21418)

Resolved conflict in chat.h/chat.cpp: kept our extended common_chat_template_direct_apply signature as an internal _full variant.
Partially implements the Gemma4 chat template fix from llama.cpp master.

What was done:
- Add COMMON_CHAT_FORMAT_PEG_GEMMA4 enum value to common_chat_format

What was NOT implemented (infrastructure missing):
- chat-auto-parser module does not exist in this fork
- common_peg_gemma4_builder class for tool calling
- normalize_gemma4_to_json() function
- gemma4.jinja template file
- tests for gemma4 tool calling

The full fix requires the chat-auto-parser infrastructure, which is not present in this fork. This enum addition is a placeholder for a future implementation when the chat template system is upgraded.

See original PR: ggml-org/llama.cpp#21326

Co-authored-by: Piotr Wilkin (ilintar) <[email protected]>
Rebased onto upstream master (b8672+), which includes Gemma 4 model support (PR ggml-org#21309, ggml-org#21326, ggml-org#21418). This enables loading Gemma 4 E2B/E4B GGUF models on-device via llama.cpp.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Overview
As in the title.
Additional information
Quick fixes for some observed discrepancies, plus a refactoring of the parser architecture for the dict format.
Requirements