Segmentation Fault in GrammarMatcherBase::ExpandEquivalentStackElements #250

@Saibo-creator

Description

I'm encountering a segmentation fault (SIGSEGV) when compiling a JSON schema with the grammar compiler (GrammarCompiler.compile_json_schema).
The crash occurs inside xgrammar::GrammarMatcherBase::ExpandEquivalentStackElements at grammar_matcher_base.cc:179.

Reproduce

import xgrammar as xgr
from json import dumps
from transformers import AutoTokenizer, AutoConfig

device = "cuda"  
model_name = "meta-llama/Llama-3.2-1B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
config = AutoConfig.from_pretrained(model_name)

json_schema= {"$schema": "http://json-schema.org/draft-04/schema#", "type": "object", "required": ["location", "country", "city", "parameter", "unit", "value", "date", "sourceName", "sourceType", "mobile"], "properties": {"location": {"type": "string", "minLength": 1}, "parameter": {"type": "string", "enum": ["pm25", "pm10", "no2", "so2", "o3", "co", "bc"]}, "unit": {"type": "string", "enum": ["ug/m^3", "ppm"]}, "averagingPeriod": {"type": "object", "required": ["value", "unit"], "additionalProperties": False, "properties": {"value": {"type": "number"}, "unit": {"type": "string", "enum": ["hours"]}}}, "attribution": {"type": "array", "items": {"type": "object", "required": ["name"], "additionalProperties": False, "properties": {"name": {"type": "string", "minLength": 1}, "url": {"type": "string", "pattern": "^(https?://)?([\\da-z\\.-]+)\\.([a-z\\.]{2,6})([/\\w \\.-]*)*/?"}}}}, "coordinates": {"type": "object", "required": ["latitude", "longitude"], "additionalProperties": False, "properties": {"latitude": {"type": "number", "minimum": -90, "maximum": 90}, "longitude": {"type": "number", "minimum": -180, "maximum": 180}}}, "value": {"type": "number"}, "date": {"type": "object", "additionalProperties": False, "required": ["utc", "local"], "properties": {"utc": {"type": "string"}, "local": {"type": "string", "pattern": "\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}(\\+|-)\\d{2}:\\d{2}"}}}, "sourceName": {"type": "string", "minLength": 1}, "sourceType": {"type": "string", "enum": ["government", "research", "other"]}, "mobile": {"type": "boolean"}, "city": {"type": "string", "minLength": 1}, "country": {"type": "string", "maxLength": 2, "minLength": 2}}}

tokenizer_info = xgr.TokenizerInfo.from_huggingface(tokenizer, vocab_size=config.vocab_size)
grammar_compiler = xgr.GrammarCompiler(tokenizer_info, max_threads=1)
compiled_grammar = grammar_compiler.compile_json_schema(schema=dumps(json_schema))

print(compiled_grammar)

GDB Backtrace

Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007fffa2886da2 in xgrammar::GrammarMatcherBase::ExpandEquivalentStackElements (this=this@entry=0x7fffffffbf30, cur_stack_element=..., new_stack_tops=new_stack_tops@entry=0x7fffffffbfd8, cur_stack_element_id=cur_stack_element_id@entry=-1, consider_parent=consider_parent@entry=false) at /project/cpp/grammar_matcher_base.cc:179
warning: 179    /project/cpp/grammar_matcher_base.cc: No such file or directory
#0  0x00007fffa2886da2 in xgrammar::GrammarMatcherBase::ExpandEquivalentStackElements at grammar_matcher_base.cc:179
#1  0x00007fffa2886fb0 in xgrammar::GrammarMatcherBase::ExpandEquivalentStackElements at grammar_matcher_base.cc:205
#2  0x00007fffa2887041 in xgrammar::GrammarMatcherBase::ExpandEquivalentStackElements at grammar_matcher_base.cc:247
#3  0x00007fffa2886e77 in xgrammar::GrammarMatcherBase::ExpandEquivalentStackElements at grammar_matcher_base.cc:271
...
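
To complement the GDB backtrace with the Python-side call stack, the standard library's faulthandler module can be enabled at the top of the reproduction script. This is only a sketch: the log file name is an arbitrary choice, not something xgrammar requires.

```python
import faulthandler

# Hypothetical log file name; any writable path works.
# The file object must stay open for as long as faulthandler is enabled.
crash_log = open("xgrammar_crash.log", "w")

# On SIGSEGV (and other fatal signals), dump the Python traceback
# of every thread into the log before the process dies.
faulthandler.enable(file=crash_log, all_threads=True)
```

With this in place, the log should show which Python call was active when the fault was raised, which can be read alongside the C++ frames above.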

Environment

  • Python | 3.12
  • xgrammar | 0.1.16

P.S. The grammar is taken from JSONSchemaBench: https://github.com/guidance-ai/jsonschemabench/blob/main/data/Github_medium/o65372.json
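
Since the schema has many properties, it may help to shrink it to the smallest one that still crashes. Below is a minimal sketch of a greedy reduction over top-level properties. The helper names (`crashes_in_subprocess`, `minimize_properties`) are hypothetical and not part of xgrammar. The child script is expected to read the schema from `sys.argv[1]` and run the same compile call as in the reproduction above. A negative return code meaning "killed by a signal" is POSIX behaviour and is assumed here.

```python
import copy
import json
import subprocess
import sys


def crashes_in_subprocess(schema, child_script):
    """Run `child_script` in a fresh interpreter with the schema as argv[1].

    Returns True if the child was killed by a signal (e.g. SIGSEGV).
    On POSIX, subprocess reports that as a negative return code.
    """
    proc = subprocess.run(
        [sys.executable, "-c", child_script, json.dumps(schema)],
        capture_output=True,
    )
    return proc.returncode < 0


def minimize_properties(schema, crashes):
    """Greedily drop top-level properties while `crashes(schema)` stays True."""
    current = copy.deepcopy(schema)
    for name in list(current.get("properties", {})):
        candidate = copy.deepcopy(current)
        del candidate["properties"][name]
        # Keep "required" consistent with the remaining properties.
        if "required" in candidate:
            candidate["required"] = [r for r in candidate["required"] if r != name]
        if crashes(candidate):
            current = candidate  # the property was not needed for the crash
    return current
```

A possible use is `minimize_properties(json_schema, lambda s: crashes_in_subprocess(s, child_script))`, where `child_script` holds the reproduction code as a string. Running each attempt in a separate process keeps the segfault from taking down the reducer itself.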

Labels: bug
