Bug: Chat Template Filter Error - 'upper' filter cannot be applied to Array type #21347
Closed
Description
Environment
- llama.cpp version: 8642 (commit 7c7d6ce5c)
- Compiler: GNU 13.3.0 for Linux x86_64
- Model: Gemma-4-26B-A4B-It (GGUF, MXFP4 MoE quantization)
- GPU: NVIDIA GeForce RTX 4090 D (compute capability 8.9, 48GB VRAM)
- OS: Linux
Steps to Reproduce
- Start llama-server with the Gemma-4-26B-A4B-It model:
```shell
llama-server -m /path/to/gemma-4-26B-A4B-it-MXFP4_MOE.gguf \
  --host 0.0.0.0 --port 8080 -c 262144 --keep 1024 \
  --mmproj /path/to/mmproj-BF16.gguf
```
- The server starts successfully:
```
main: model loaded
main: server is listening on http://0.0.0.0:8080
main: starting the main loop...
```
- Send any chat completion request:
```shell
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gemma-4-26B-A4B-it", "messages": [{"role": "user", "content": "Hello"}], "max_tokens": 100}'
```
- The server repeatedly returns an HTTP 500 error.
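For scripted reproduction, the same request can be sent with Python's standard library alone. This is a sketch: the endpoint and payload mirror the curl command above, while the helper names `build_chat_payload` and `send_chat_request` are hypothetical.

```python
import json
import urllib.error
import urllib.request

def build_chat_payload(user_msg: str) -> dict:
    # Same JSON body as the curl command above.
    return {
        "model": "gemma-4-26B-A4B-it",
        "messages": [{"role": "user", "content": user_msg}],
        "max_tokens": 100,
    }

def send_chat_request(user_msg: str, host: str = "http://localhost:8080"):
    # POST to the OpenAI-compatible endpoint; with the affected model,
    # the bug surfaces here as an HTTP 500 carrying the filter error.
    req = urllib.request.Request(
        f"{host}/v1/chat/completions",
        data=json.dumps(build_chat_payload(user_msg)).encode(),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status, json.loads(resp.read())
    except urllib.error.HTTPError as e:
        return e.code, json.loads(e.read())
```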
Expected Behavior
Server should process the chat completion request and return a response.
Actual Behavior
Server returns HTTP 500 with the following error:
```json
{
  "error": {
    "code": 500,
    "message": "\n------------\nWhile executing FilterExpression at line 18, column 34 in source:\n...if -%}↵ {%- if value['type'] | upper == 'STRING' -%}↵ ...\n ^\nError: Unknown (built-in) filter 'upper' for type Array",
    "type": "server_error"
  }
}
```
Error Analysis
Error Location: Chat template, line 18, column 34
Problematic Code:
```jinja
{%- if value['type'] | upper == 'STRING' -%}
```
Root Cause: The template applies the `upper` filter to `value['type']`, but at render time `value['type']` is an Array rather than a String, and the template engine does not define the `upper` filter for Array inputs.
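The failing comparison needs a string-type guard before applying `upper`. A minimal Python sketch of the check the template effectively requires (a hypothetical helper, not llama.cpp code; the list branch assumes JSON-Schema-style type arrays such as `["string", "null"]`):

```python
def type_is_string(value: dict) -> bool:
    """Return True if the schema entry's 'type' denotes a string.

    Mirrors the template's `value['type'] | upper == 'STRING'` check,
    but tolerates 'type' being a list (JSON Schema allows type arrays),
    which is one way the Array-vs-String mismatch can arise.
    """
    t = value.get("type")
    if isinstance(t, str):
        return t.upper() == "STRING"
    if isinstance(t, list):
        return any(isinstance(x, str) and x.upper() == "STRING" for x in t)
    return False

print(type_is_string({"type": "string"}))            # True
print(type_is_string({"type": ["string", "null"]}))  # True
```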
Model Metadata (from GGUF)
```
general.architecture = gemma4
general.name = Gemma-4-26B-A4B-It
general.quantized_by = Unsloth
tokenizer.ggml.model = gemma4
tokenizer.ggml.chat_template = {%- macro format_parameters(propertie...
```
Server Startup Logs (Key Sections)
```
main: model loaded
main: server is listening on http://0.0.0.0:8080
main: starting the main loop...
srv update_slots: all slots are idle
srv operator(): got exception: {"error":{"code":500,"message":"...
srv log_server_r: done request: POST /v1/chat/completions 192.168.193.91 500
```
Possible Causes
- Model's chat template bug: the `tokenizer.ggml.chat_template` embedded in the GGUF file may contain an error
- Template engine limitation: llama.cpp's chat template parser may not handle certain edge cases correctly
- Type mismatch: the template expects `value['type']` to be a string but receives an array
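The type-mismatch hypothesis can be illustrated in plain Python. JSON Schema permits the `type` keyword to be a list of types (e.g. a nullable string), and a string-only operation applied to such a value fails in a way directly analogous to the engine's "Unknown (built-in) filter 'upper' for type Array" error. The schema fragment below is hypothetical:

```python
import json

# Hypothetical tool-parameter schema with a JSON-Schema type array.
schema = json.loads('{"name": {"type": ["string", "null"]}}')
t = schema["name"]["type"]

try:
    t.upper()  # string-only operation, like the template's `| upper`
except AttributeError:
    print(f"no upper() for {type(t).__name__}")  # -> no upper() for list
```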
Workarounds
- Use the `--no-chat-template` flag to disable automatic chat templating:
```shell
llama-server -m ... --no-chat-template ...
```
- Manually format prompts according to the model's expected format
- Check whether a corrected GGUF version is available from the model provider (Unsloth)
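The manual-formatting workaround can be sketched as follows: bypass `/v1/chat/completions` (which triggers the template) and call llama-server's raw `/completion` endpoint with a pre-formatted prompt. The Gemma-style turn markers below are an assumption carried over from earlier Gemma releases; verify them against this model's card before use.

```python
import json
import urllib.request

def format_prompt(user_msg: str) -> str:
    # Gemma-style chat formatting (assumed; check the model card).
    return (f"<start_of_turn>user\n{user_msg}<end_of_turn>\n"
            "<start_of_turn>model\n")

def complete(user_msg: str, host: str = "http://localhost:8080") -> str:
    # llama-server's raw /completion endpoint skips the chat template.
    body = json.dumps({
        "prompt": format_prompt(user_msg),
        "n_predict": 100,
    }).encode()
    req = urllib.request.Request(f"{host}/completion", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["content"]
```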
Additional Notes
- Model loads successfully without errors
- Server starts and listens on port 8080
- Error occurs immediately on the first chat completion request
- Same error repeats on all subsequent requests
- The error is consistent and reproducible
Question: Is this a bug in the model's chat template, or should llama.cpp handle this case more gracefully? Should the server fail with a clearer error message during startup instead of failing on the first request?