Bug: Chat Template Filter Error - 'upper' filter cannot be applied to Array type #21347

@saga197410qq

Description

Environment

  • llama.cpp version: 8642 (commit 7c7d6ce5c)
  • Compiler: GNU 13.3.0 for Linux x86_64
  • Model: Gemma-4-26B-A4B-It (GGUF, MXFP4 MoE quantization)
  • GPU: NVIDIA GeForce RTX 4090 D (CUDA compute capability 8.9, 48GB VRAM)
  • OS: Linux

Steps to Reproduce

  1. Start llama-server with the Gemma-4-26B-A4B-It model:
llama-server -m /path/to/gemma-4-26B-A4B-it-MXFP4_MOE.gguf \
  --host 0.0.0.0 --port 8080 -c 262144 --keep 1024 \
  --mmproj /path/to/mmproj-BF16.gguf
  2. The server starts successfully:
main: model loaded
main: server is listening on http://0.0.0.0:8080
main: starting the main loop...
  3. Send any chat completion request:
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gemma-4-26B-A4B-it", "messages": [{"role": "user", "content": "Hello"}], "max_tokens": 100}'
  4. The server repeatedly returns HTTP 500.

Expected Behavior

Server should process the chat completion request and return a response.

Actual Behavior

Server returns HTTP 500 with the following error:

{
  "error": {
    "code": 500,
    "message": "\n------------\nWhile executing FilterExpression at line 18, column 34 in source:\n...if -%}↵            {%- if value['type'] | upper == 'STRING' -%}↵                ...\n                                           ^\nError: Unknown (built-in) filter 'upper' for type Array",
    "type": "server_error"
  }
}

Error Analysis

Error Location: Chat template, line 18, column 34

Problematic Code:

{%- if value['type'] | upper == 'STRING' -%}

Root Cause: The template attempts to apply the upper filter to value['type'], but value['type'] is an Array instead of a String. The upper filter is not defined for Array types in the template engine.
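
One plausible fix on the template side is to guard the comparison with a type test before applying the filter. This is only a sketch: the full template is truncated in the GGUF metadata above, so the surrounding macro context is assumed, and the list case may need its own branch rather than being skipped.

```jinja
{#- Sketch only: apply 'upper' only when value['type'] is actually a string. -#}
{%- if value['type'] is string and value['type'] | upper == 'STRING' -%}
```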

Model Metadata (from GGUF)

general.architecture = gemma4
general.name = Gemma-4-26B-A4B-It
general.quantized_by = Unsloth
tokenizer.ggml.model = gemma4
tokenizer.ggml.chat_template = {%- macro format_parameters(propertie...

Server Startup Logs (Key Sections)

main: model loaded
main: server is listening on http://0.0.0.0:8080
main: starting the main loop...
srv  update_slots: all slots are idle
srv    operator(): got exception: {"error":{"code":500,"message":"...
srv  log_server_r: done request: POST /v1/chat/completions 192.168.193.91 500

Possible Causes

  1. Model's chat template bug: The tokenizer.ggml.chat_template embedded in the GGUF file may have an error
  2. Template engine limitation: llama.cpp's chat template parser may not handle certain edge cases correctly
  3. Type mismatch: The template expects value['type'] to be a string but receives an array
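
Cause 3 is consistent with how OpenAI-style multimodal requests are shaped: when a projector is loaded via --mmproj, clients often send content as an array of typed parts rather than a plain string. As a client-side check, here is a sketch of a hypothetical helper (not part of llama.cpp) that flattens text parts back into plain strings before the request is sent:

```python
def flatten_content(messages):
    """Flatten OpenAI-style content-part arrays into plain strings.

    Hypothetical client-side workaround: a template that assumes
    string content never sees an array. Image parts are dropped
    here, since they need the multimodal code path anyway.
    """
    out = []
    for msg in messages:
        content = msg.get("content")
        if isinstance(content, list):
            # Keep only the text parts, concatenated in order.
            content = "".join(
                part.get("text", "")
                for part in content
                if isinstance(part, dict) and part.get("type") == "text"
            )
        out.append({**msg, "content": content})
    return out
```
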

Workarounds

  1. Use --no-chat-template flag to disable automatic chat templating:
    llama-server -m ... --no-chat-template ...
  2. Manually format prompts according to the model's expected format
  3. Check if a corrected GGUF version is available from the model provider (Unsloth)
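
Workaround 2 can be sketched as follows. The turn markers are an assumption carried over from earlier Gemma releases (<start_of_turn>/<end_of_turn>); verify them against this model's card before relying on this:

```python
def build_gemma_prompt(messages):
    """Render a chat manually with Gemma-style turn markers.

    Assumption: Gemma-4 keeps the <start_of_turn>/<end_of_turn>
    format of earlier Gemma releases.
    """
    parts = []
    for msg in messages:
        # Gemma uses "model" for the assistant role.
        role = "model" if msg["role"] == "assistant" else "user"
        parts.append(f"<start_of_turn>{role}\n{msg['content']}<end_of_turn>\n")
    parts.append("<start_of_turn>model\n")  # cue the model's reply
    return "".join(parts)
```

The rendered string can then be posted to llama-server's /completion endpoint, which accepts a raw prompt and bypasses the chat template entirely.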

Additional Notes

  • Model loads successfully without errors
  • Server starts and listens on port 8080
  • Error occurs immediately on the first chat completion request
  • Same error repeats on all subsequent requests
  • The error is consistent and reproducible

Question: Is this a bug in the model's chat template, or should llama.cpp handle this case more gracefully? Should the server fail with a clearer error message during startup instead of failing on the first request?
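
Until templates are validated at startup, a deployment-side probe can surface the failure before real traffic arrives. A hedged sketch of such a helper script (hypothetical, not part of llama.cpp):

```python
import json
import urllib.error
import urllib.request

def probe_chat_endpoint(base_url):
    """Send one minimal chat request and report whether it succeeds.

    Hypothetical fail-fast deployment check: a broken chat template
    shows up here as an HTTP 500 instead of on the first user request.
    """
    body = json.dumps({
        "messages": [{"role": "user", "content": "ping"}],
        "max_tokens": 1,
    }).encode()
    req = urllib.request.Request(
        base_url.rstrip("/") + "/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=5) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:
        # A 500 here reproduces the template bug before clients hit it.
        print("probe failed:", err.read().decode()[:200])
        return False
    except urllib.error.URLError:
        # Server unreachable.
        return False
```
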
