Closed
Labels
bug: Something isn't working
Description
I'm encountering a segmentation fault (SIGSEGV) when using the grammar matcher module.
The crash occurs in xgrammar::GrammarMatcherBase::ExpandEquivalentStackElements at grammar_matcher_base.cc:179.
Reproduce
import xgrammar as xgr
from json import dumps
from transformers import AutoTokenizer, AutoConfig
device = "cuda"
model_name = "meta-llama/Llama-3.2-1B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
config = AutoConfig.from_pretrained(model_name)
json_schema= {"$schema": "http://json-schema.org/draft-04/schema#", "type": "object", "required": ["location", "country", "city", "parameter", "unit", "value", "date", "sourceName", "sourceType", "mobile"], "properties": {"location": {"type": "string", "minLength": 1}, "parameter": {"type": "string", "enum": ["pm25", "pm10", "no2", "so2", "o3", "co", "bc"]}, "unit": {"type": "string", "enum": ["ug/m^3", "ppm"]}, "averagingPeriod": {"type": "object", "required": ["value", "unit"], "additionalProperties": False, "properties": {"value": {"type": "number"}, "unit": {"type": "string", "enum": ["hours"]}}}, "attribution": {"type": "array", "items": {"type": "object", "required": ["name"], "additionalProperties": False, "properties": {"name": {"type": "string", "minLength": 1}, "url": {"type": "string", "pattern": "^(https?://)?([\\da-z\\.-]+)\\.([a-z\\.]{2,6})([/\\w \\.-]*)*/?"}}}}, "coordinates": {"type": "object", "required": ["latitude", "longitude"], "additionalProperties": False, "properties": {"latitude": {"type": "number", "minimum": -90, "maximum": 90}, "longitude": {"type": "number", "minimum": -180, "maximum": 180}}}, "value": {"type": "number"}, "date": {"type": "object", "additionalProperties": False, "required": ["utc", "local"], "properties": {"utc": {"type": "string"}, "local": {"type": "string", "pattern": "\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}(\\+|-)\\d{2}:\\d{2}"}}}, "sourceName": {"type": "string", "minLength": 1}, "sourceType": {"type": "string", "enum": ["government", "research", "other"]}, "mobile": {"type": "boolean"}, "city": {"type": "string", "minLength": 1}, "country": {"type": "string", "maxLength": 2, "minLength": 2}}}
tokenizer_info = xgr.TokenizerInfo.from_huggingface(tokenizer, vocab_size=config.vocab_size)
grammar_compiler = xgr.GrammarCompiler(tokenizer_info, max_threads=1)
compiled_grammar = grammar_compiler.compile_json_schema(schema=dumps(json_schema))
print(compiled_grammar)
GDB Backtrace
Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007fffa2886da2 in xgrammar::GrammarMatcherBase::ExpandEquivalentStackElements (this=this@entry=0x7fffffffbf30, cur_stack_element=..., new_stack_tops=new_stack_tops@entry=0x7fffffffbfd8, cur_stack_element_id=cur_stack_element_id@entry=-1, consider_parent=consider_parent@entry=false) at /project/cpp/grammar_matcher_base.cc:179
warning: 179 /project/cpp/grammar_matcher_base.cc: No such file or directory
#0 0x00007fffa2886da2 in xgrammar::GrammarMatcherBase::ExpandEquivalentStackElements at grammar_matcher_base.cc:179
#1 0x00007fffa2886fb0 in xgrammar::GrammarMatcherBase::ExpandEquivalentStackElements at grammar_matcher_base.cc:205
#2 0x00007fffa2887041 in xgrammar::GrammarMatcherBase::ExpandEquivalentStackElements at grammar_matcher_base.cc:247
#3 0x00007fffa2886e77 in xgrammar::GrammarMatcherBase::ExpandEquivalentStackElements at grammar_matcher_base.cc:271
...
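Since a SIGSEGV takes down the whole interpreter, one way to investigate is to run the reproduce script in a child process and inspect how that process exited. The following is a minimal sketch using only the Python standard library. `run_isolated` is a hypothetical helper name, not part of xgrammar, and the sketch assumes a POSIX system, where a negative return code means the child was killed by a signal. The `-X faulthandler` option asks CPython to dump the Python-level traceback to stderr on a fatal signal, which complements the native GDB frames above.

```python
import signal
import subprocess
import sys


def run_isolated(code: str, timeout: float = 600.0) -> str:
    """Run `code` in a fresh interpreter and describe how it exited.

    Hypothetical helper for triage; not part of xgrammar.
    """
    proc = subprocess.run(
        # -X faulthandler prints the Python traceback on fatal signals.
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if proc.returncode < 0:
        # POSIX only: the child was terminated by signal number -returncode.
        return signal.Signals(-proc.returncode).name
    return "ok" if proc.returncode == 0 else f"exit {proc.returncode}"
```

Passing the reproduce script as the `code` string would be expected to return "SIGSEGV" if the crash reproduces, and "ok" once it is fixed. The parent process survives either way.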
Environment
- Python | 3.12
- xgrammar | 0.1.16
P.S. The grammar is taken from JSONSchemaBench: https://github.com/guidance-ai/jsonschemabench/blob/main/data/Github_medium/o65372.json
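To narrow down which part of this fairly large schema triggers the crash, the schema could be split into smaller candidates that are compiled one at a time. The sketch below is only a bisecting aid under stated assumptions: `single_property_schemas` is a hypothetical helper, and it is not known whether the crash can be attributed to a single top-level property at all (it may depend on a combination of properties).

```python
import copy
import json


def single_property_schemas(schema: dict) -> dict[str, str]:
    """Return one JSON-encoded object schema per top-level property.

    Hypothetical bisecting helper: each candidate keeps exactly one
    property of the original schema and marks it as required.
    """
    candidates = {}
    for name, subschema in schema.get("properties", {}).items():
        candidate = {
            "type": "object",
            "properties": {name: copy.deepcopy(subschema)},
            "required": [name],
        }
        candidates[name] = json.dumps(candidate)
    return candidates
```

Each resulting string can be passed to `grammar_compiler.compile_json_schema(schema=...)` as in the reproduce script, ideally in a separate process per candidate so that one crash does not hide the remaining results.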