@Muqi1029

Motivation

When a JSON schema is converted to EBNF, the generated grammar fixes the order of the properties to the order in which the schema defines them. However, because some fields are optional, it is reasonable for the LLM to generate a later optional field before an earlier one. The set of accepted outputs should not depend on the definition order.

Take this schema as an example:

import json

from xgrammar import Grammar

tools = [
    {
        "type": "function",
        "function": {
            "name": "select a name",
            "parameters": {
                "type": "object",
                "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
                "additionalProperties": False,
            },
            "strict": True,
        },
    }
]
if __name__ == "__main__":
    print(" EBNF ".center(80, "="))
    ebnf = Grammar.from_json_schema(
        json.dumps(tools[0]["function"]["parameters"]), print_converted_ebnf=True, strict_mode=False
    )
===================================== EBNF =====================================
[19:41:34] /Users/admin/projects/xgrammar/cpp/grammar.cc:55: Converted EBNF: basic_escape ::= ["\\/bfnrt] | "u" [A-Fa-f0-9] [A-Fa-f0-9] [A-Fa-f0-9] [A-Fa-f0-9]
basic_string_sub ::= ("\"" | [^\0-\x1f\"\\\r\n] basic_string_sub | "\\" basic_escape basic_string_sub) (= [ \n\t]* [,}\]:])
basic_any ::= basic_number | basic_string | basic_boolean | basic_null | basic_array | basic_object
basic_integer ::= ("0" | "-"? [1-9] [0-9]*)
basic_number ::= "-"? ("0" | [1-9] [0-9]*) ("." [0-9]+)? ([eE] [+-]? [0-9]+)?
basic_string ::= ["] basic_string_sub
basic_boolean ::= "true" | "false"
basic_null ::= "null"
basic_array ::= (("[" [ \n\t]* basic_any ([ \n\t]* "," [ \n\t]* basic_any)* [ \n\t]* "]") | ("[" [ \n\t]* "]"))
basic_object ::= ("{" [ \n\t]* basic_string [ \n\t]* ":" [ \n\t]* basic_any ([ \n\t]* "," [ \n\t]* basic_string [ \n\t]* ":" [ \n\t]* basic_any)* [ \n\t]* "}") | "{" [ \n\t]* "}"
root_part_0 ::= "" | [ \n\t]* "," [ \n\t]* "\"age\"" [ \n\t]* ":" [ \n\t]* basic_integer ""
root ::= ("{" [ \n\t]* (("\"name\"" [ \n\t]* ":" [ \n\t]* basic_string root_part_0) | ("\"age\"" [ \n\t]* ":" [ \n\t]* basic_integer "")) [ \n\t]* "}") | "{" [ \n\t]* "}"

As you can see, the converted EBNF does not include an option to generate “age” before “name”!

Modification

Reuse the existing algorithm for converting a JSON schema to EBNF, but apply it to every permutation of the property order, so that the resulting grammar accepts the fields in any order.
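
To make the idea concrete, here is a minimal Python sketch of the enumeration. It is not the code in this PR (the real conversion lives in xgrammar's C++ converter); `orderings` and `root_rule` are hypothetical helper names, every field is assumed to be optional, and each field name stands in for its full `"key" : value` sub-rule with whitespace handling omitted.

```python
from itertools import permutations


def orderings(fields):
    """Yield every ordering of every subset of `fields` (all assumed optional)."""
    for size in range(len(fields) + 1):
        yield from permutations(fields, size)


def root_rule(fields):
    """Build a simplified EBNF-style root rule with one alternative per ordering.

    Hypothetical helper: each field name is a placeholder for the real
    key/value sub-rule that the converter would emit.
    """
    alternatives = []
    for order in orderings(fields):
        body = ' "," '.join(order)
        # Doubled braces inside the f-string produce literal { and }.
        alternatives.append(f'"{{" {body} "}}"' if body else '"{" "}"')
    return "root ::= " + " | ".join(alternatives)


print(root_rule(["name", "age"]))
```

For the two-field schema above, the rule now contains an alternative with age before name, which the original conversion lacks.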

Results

Version info:

  • sglang: main(latest)
  • xgrammar: main(latest)

Server: launched with SGLang.

python -m sglang.launch_server --model-path ${QWEN3_32B_FP8} --tool-call-parser qwen --reasoning-parser qwen3 --tp-size 1 --port 8888

Client:

from argparse import ArgumentParser
from openai import OpenAI
from pprint import pprint

tool_select_name = {
    "type": "function",
    "function": {
        "name": "select_name",
        "description": "select a name",
        "strict": True,
        "parameters": {
            "type": "object",
            "properties": {
                "name": {
                    "type": "string",
                    "description": "the name",
                    "enum": ["Muqi", "木奇"],
                },
                "age": {
                    "type": "integer",
                    "description": "a number from 0 to 23, which represent the person's age",
                },
                "description": {
                    "type": "string",
                }
            },
            "additionalProperties": False,
        },
    },
}


tools = [tool_select_name]


def main(args):
    client = OpenAI(base_url=args.base_url + "/v1", api_key=args.api_key)
    model_id = list(client.models.list())[0].id

    response = client.chat.completions.create(
        model=model_id,
        messages=[{"role": "user", "content": "Please call the tool to select a name. My name is Muqi, age is 20, description is Programmar. Strict order: Name comes first, description in the middle, and age last."}],
        tools=tools,
        tool_choice='required',
    )
    pprint(response)


if __name__ == "__main__":
    parser = ArgumentParser()
    parser.add_argument("--base-url", default="http://127.0.0.1:8888")  # matches --port 8888 used by the server above
    parser.add_argument("--api-key", default="")
    args = parser.parse_args()
    main(args)

Before
The output has no age. Because age comes before description in the tool definition, once the description field has been generated the grammar can no longer accept age.

ChatCompletion(id='0e12602508b543189ef9220f3e5830d2', choices=[Choice(finish_reason='tool_calls', index=0, logprobs=None, message=ChatCompletionMessage(content=None, refusal=None, role='assistant
', annotations=None, audio=None, function_call=None, tool_calls=[ChatCompletionMessageFunctionToolCall(id='call_bf78a35bf4004732a0875bcc', function=Function(arguments='{"name": "Muqi", "descripti
on": "Programmar”, "}', name='select_name'), type='function', index=0)], reasoning_content='Okay, the user wants me to call the select_name tool. Let me check the parameters. The name should be e
ither "Muqi" or "木奇". The user provided their name as Muqi, age 20, and description "Programmar". The strict order is name first, then description, then age.\n\nLooking at the tool\'s parameter
s, the required fields are name, age, and description. The name is in the enum, so "Muqi" is valid. The age is 20, which is between 0 and 23. The description is "Programmar", which is a string. \
n\nI need to make sure the JSON structure follows the strict order: name, description, age. Wait, the tool\'s parameters have the order name, age, description, but the user specified the order as
 name, description, age. Hmm, the tool\'s parameters might have their own order, but the user wants the output in a specific sequence. Wait, the tool\'s parameters are defined as properties, but 
the order in JSON might not matter. However, the user emphasized strict order. Maybe the user expects the JSON to be in that order. But JSON objects are unordered by default. However, some system
s might enforce order. The tool\'s strict mode is set to true, so I need to make sure all required parameters are included and in the correct order as per the user\'s instruction. \n\nWait, the t
ool\'s parameters are defined as name, age, description. But the user wants the output in the order name, description, age. Since the tool\'s strict mode is true, maybe the order in the JSON must
 match the parameters\' order. But the user\'s instruction says to follow their specified order. This is a bit conflicting. \n\nWait, the tool\'s parameters are in the order name, age, descriptio
n. But the user wants the output in name, description, age. So when constructing the arguments, should I follow the tool\'s parameter order or the user\'s specified order? Since the tool is in st
rict mode, it\'s possible that the order matters. However, JSON objects are inherently unordered, so the order might not be enforced. But the user is strict about the order. \n\nWait, the user\'s
 message says "Strict order: Name comes first, description in the middle, and age last." So the user wants the arguments in that specific order. Even though the tool\'s parameters are defined as 
name, age, description, the user\'s instruction takes precedence. Therefore, I should arrange the arguments in the order name, description, age. \n\nSo the JSON arguments would be:\n{\n  "name": 
"Muqi",\n  "description": "Programmar",\n  "age": 20\n}\n\nBut the tool\'s parameters have age as an integer between 0-23, which 20 is valid. The description is a string. So this should work. Let
 me make sure the tool allows the description to be provided even though in the parameters it\'s after age. Since JSON is unordered, the order shouldn\'t matter for the tool\'s processing, but th
e user wants the output in that order. The tool\'s strict mode ensures that all required parameters are present. \n\nTherefore, the correct tool call is to select_name with the arguments in the o
rder name, description, age.\n'), matched_stop=None)], created=1763033130, model='/root/models/Qwen3-32B-FP8/6e71f0f860155c9eb9805b83c11993e11a6d2d5e', object='chat.completion', service_tier=None
, system_fingerprint=None, usage=CompletionUsage(completion_tokens=710, prompt_tokens=232, total_tokens=942, completion_tokens_details=None, prompt_tokens_details=None, reasoning_tokens=0), metad
ata={'weight_version': 'default'})

After

ChatCompletion(id='a07b964f7c5d47018ea0aa7d52b4aa67', choices=[Choice(finish_reason='tool_calls', index=0, logprobs=None, message=ChatCompletionMessage(content=None, refusal=None, role='assistant
', annotations=None, audio=None, function_call=None, tool_calls=[ChatCompletionMessageFunctionToolCall(id='call_a8132da1489b464d8d7453f5', function=Function(arguments='{"name": "Muqi", "descripti
on": "Programmar", "age": 20}', name='select_name'), type='function', index=0)], reasoning_content='Okay, let\'s see. The user wants me to call the select_name tool. The parameters required are n
ame, age, and description. The user provided their name as Muqi, age 20, and description as Programmar. Wait, "Programmar" might be a typo for "Programmer", but I should stick to what the user wr
ote.\n\nLooking at the tool\'s parameters, the name has an enum with "Muqi" and "木奇". The user specified "Muqi", so that\'s valid. The age is 20, which is between 0 and 23, so that\'s okay. The
 description is a string, so "Programmar" is acceptable even if it\'s a typo.\n\nThe strict order mentioned is Name first, description middle, age last. So in the JSON arguments, the order should
 be name, description, then age. Let me structure the JSON accordingly. Make sure the name is "Muqi", description is "Programmar", and age is 20. Check if all parameters are included and in the r
ight order. Yep, that should do it.\n'), matched_stop=None)], created=1763032880, model='/root/models/Qwen3-32B-FP8/6e71f0f860155c9eb9805b83c11993e11a6d2d5e', object='chat.completion', service_ti
er=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=265, prompt_tokens=232, total_tokens=497, completion_tokens_details=None, prompt_tokens_details=None, reasoning_tokens=0)
, metadata={'weight_version': 'default'})

Limitation

  • Using permutations likely results in full enumeration, which has O(n!) complexity. There may be better alternatives to achieve this more efficiently.
  • Some parts of the converted EBNF grammar are duplicated in this modification.
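
A quick way to see how fast the enumeration grows is to count the orderings directly. This is only a back-of-the-envelope sketch: the function names are made up for illustration, and the counts describe the number of field orderings, not the exact size of the grammar that xgrammar would emit.

```python
from math import factorial


def full_orderings(n):
    """Orderings when all n fields are present: n!."""
    return factorial(n)


def optional_orderings(n):
    """Orderings when every field is optional: sum over k of n! / (n - k)!."""
    return sum(factorial(n) // factorial(n - k) for k in range(n + 1))


# Compare against 2^n, the number of field subsets.
for n in (2, 3, 5, 8, 10):
    print(n, full_orderings(n), optional_orderings(n), 2 ** n)
```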

My personal take: This modification isn’t ideal, but it works fine for tools with relatively few arguments. This PR can serve as a quick fix or a reference for those who want to eliminate order dependency.

@Ubospica
Collaborator

Ubospica commented Nov 16, 2025

@Muqi1029 Thanks for proposing this feature! It's reasonable for LLMs to provide fields in any order they want. However, it is hard to describe arbitrary order in BNF: the corresponding grammar size would be O(n!) or O(2^n), which makes the grammar hard to process for a large number of fields in practice.

At the same time, because the grammar forces the LLM to generate the field name first, the LLM should be aware of the field order even when that order is fixed, and should generate accordingly. So the impact on generation quality should be limited. Let me know if you find any significant quality degradation caused by this!

This PR could still be useful for a small number of fields, and we can leave it as a good reference. Besides, there is an O(2^n) grammar construction for this. Suppose the fields are a, b, and c:

main ::= a_b_c
a_b_c ::= a b_c | b a_c | c a_b
b_c ::= b c | c b
a_c ::= a c | c a
a_b ::= a b | b a
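
The construction above generalizes to any number of fields by creating one rule per subset of size two or more. Here is a small Python sketch that generates those rules; `subset_rules` and `to_ebnf` are hypothetical names, the field names stand in for real sub-rules, and, like the example above, it assumes every field appears exactly once.

```python
from itertools import combinations


def subset_rules(fields):
    """Return {rule_name: [alternatives]} for every subset of size >= 2.

    Each alternative picks one field as the head and defers the remaining
    fields to the rule for the smaller subset.
    """
    rules = {}
    for size in range(len(fields), 1, -1):
        for subset in combinations(fields, size):
            alts = []
            for head in subset:
                rest = tuple(f for f in subset if f != head)
                # A single remaining field is referenced directly by name.
                alts.append(f"{head} {'_'.join(rest)}")
            rules["_".join(subset)] = alts
    return rules


def to_ebnf(fields):
    """Render the rules as EBNF-style text, starting from `main`."""
    lines = [f"main ::= {'_'.join(fields)}"]
    for name, alts in subset_rules(fields).items():
        lines.append(f"{name} ::= {' | '.join(alts)}")
    return "\n".join(lines)


print(to_ebnf(["a", "b", "c"]))
```

For n fields this yields 2^n - n - 1 rules, because the empty subset and the n single-field subsets need no rule of their own.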
