json-schema-to-grammar improvements (+ added to server) #5978
Merged
ochafik merged 106 commits into ggml-org:master on Mar 21, 2024
added 30 commits
March 1, 2024 14:11
{"type": "string", "pattern": "^({\"question\": \"[^\"]+\", \"response\": \"[^\"]+\"}\\n)+$"}
added 7 commits
March 20, 2024 14:35
ggerganov (Member) approved these changes on Mar 21, 2024 and left a comment:
Squash merge when you are ready
I haven't tested the implementation, but it seems you've put enough effort and there are unit tests, so it should be good 👍
Comment on lines +32 to +35:
    std::cerr << "#" << std::endl;
    std::cerr << "# Test '" << name.c_str() << "' failed." << std::endl;
    std::cerr << "#" << std::endl;
    std::cerr << schema.c_str() << std::endl;
Member
For consistency with the rest of the codebase:
std::cerr -> fprintf(stderr,
std::cout -> fprintf(stdout,
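To illustrate the suggestion, here is a minimal sketch of the four flagged lines rewritten with fprintf. The helper name `report_failure` and its `FILE *` parameter are assumptions made for this example; they are not code from the PR.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (not from the PR): prints the same failure report as the
// std::cerr lines above, using fprintf as the review suggests.
static void report_failure(FILE * out, const std::string & name, const std::string & schema) {
    fprintf(out, "#\n");
    fprintf(out, "# Test '%s' failed.\n", name.c_str());
    fprintf(out, "#\n");
    fprintf(out, "%s\n", schema.c_str());
}
```

Calling `report_failure(stderr, name, schema)` would stand in for the original lines. Note that `std::endl` also flushes the stream, while `"\n"` does not; since stderr is typically unbuffered, that difference should not matter here.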
common/json-schema-to-grammar.cpp
Outdated
        return result;
    }

    static std::string replacePattern(const std::string& input, const std::regex& regex, const std::function<std::string(const std::smatch &)>& replacement) {
Member
For consistency, put spaces before and after `&` - see the rest of the usages in the codebase:
Suggested change
- static std::string replacePattern(const std::string& input, const std::regex& regex, const std::function<std::string(const std::smatch &)>& replacement) {
+ static std::string replacePattern(const std::string & input, const std::regex & regex, const std::function<std::string(const std::smatch &)> & replacement) {
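The body of `replacePattern` is not shown in this thread. As a rough sketch of what a function with this signature could do, assuming it replaces every regex match with the string returned by the callback, something like the following would work. The name `replace_pattern_sketch` is hypothetical, chosen to avoid implying this is the PR's implementation.

```cpp
#include <functional>
#include <regex>
#include <string>

// Hypothetical sketch (not the PR's code): replaces each match of `regex` in
// `input` with whatever `replacement` returns for that match.
static std::string replace_pattern_sketch(
        const std::string & input,
        const std::regex & regex,
        const std::function<std::string(const std::smatch &)> & replacement) {
    std::string result;
    std::string::const_iterator last = input.cbegin();
    for (std::sregex_iterator it(input.cbegin(), input.cend(), regex), end; it != end; ++it) {
        const std::smatch & m = *it;
        result.append(last, m[0].first); // copy the text between the previous match and this one
        result += replacement(m);        // substitute the callback's result for the match
        last = m[0].second;
    }
    result.append(last, input.cend());   // copy whatever follows the final match
    return result;
}
```

A callback gives more control than `std::regex_replace`, whose replacement is a fixed format string: the callback can inspect capture groups and compute the substitution per match.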
ochafik
pushed a commit
to ochafik/llama.cpp
that referenced
this pull request
Mar 21, 2024
…s are present Tests introduced in ggml-org#5978 disabled in ggml-org#6198
hodlen
pushed a commit
to hodlen/llama.cpp
that referenced
this pull request
Apr 3, 2024
* json: fix arrays (disallow `[,1]`)
* json: support tuple types (`[number, string]`)
* json: support additionalProperties (`{[k: string]: [string,number][]}`)
* json: support required / optional properties
* json: add support for pattern
* json: resolve $ref (and support https schema urls)
* json: fix $ref resolution
* json: support union types (mostly for nullable types I think)
* json: support allOf + nested anyOf
* json: support any (`{}` or `{type: object}`)
* json: fix merge
* json: temp fix for escapes
* json: spaces in output and unrestricted output spaces
* json: add typings
* json: fix typo
* Create ts-type-to-grammar.sh
* json: fix _format_literal (json.dumps already escapes quotes)
* json: merge lit sequences and handle negatives
{"type": "string", "pattern": "^({\"question\": \"[^\"]+\", \"response\": \"[^\"]+\"}\\n)+$"}
* json: handle pattern repetitions
* Update json-schema-to-grammar.mjs
* Create regex-to-grammar.py
* json: extract repeated regexp patterns to subrule
* Update json-schema-to-grammar.py
* Update json-schema-to-grammar.py
* Update json-schema-to-grammar.py
* json: handle schema from pydantic Optional fields
* Update json-schema-to-grammar.py
* Update json-schema-to-grammar.py
* Update ts-type-to-grammar.sh
* Update ts-type-to-grammar.sh
* json: simplify nullable fields handling
* json: accept duplicate identical rules
* json: revert space to 1 at most
* json: reuse regexp pattern subrules
* json: handle uuid string format
* json: fix literal escapes
* json: add --allow-fetch
* json: simplify range escapes
* json: support negative ranges in patterns
* Delete commit.txt
* json: custom regex parser, adds dot support & JS-portable
* json: rm trailing spaces
* Update json-schema-to-grammar.mjs
* json: updated server & chat `( cd examples/server && ./deps.sh )`
* json: port fixes from mjs to python
* Update ts-type-to-grammar.sh
* json: support prefixItems alongside array items
* json: add date format + fix uuid
* json: add date, time, date-time formats
* json: preserve order of props from TS defs
* json: port schema converter to C++, wire in ./server
* json: nits
* Update json-schema-to-grammar.cpp
* Update json-schema-to-grammar.cpp
* Update json-schema-to-grammar.cpp
* json: fix mjs implementation + align outputs
* Update json-schema-to-grammar.mjs.hpp
* json: test C++, JS & Python versions
* json: nits + regen deps
* json: cleanup test
* json: revert from c++17 to 11
* json: nit fixes
* json: dirty include for test
* json: fix zig build
* json: pass static command to std::system in tests (fixed temp files)
* json: fix top-level $refs
* json: don't use c++20 designated initializers
* nit
* json: basic support for reserved names `{number:{number:{root:number}}}`
* Revamp test cmake to allow args (WORKING_DIRECTORY needed for JSON test)
* json: re-ran server deps.sh
* json: simplify test
* json: support mix of additional props & required/optional
* json: add tests for some expected failures
* json: fix type=const in c++, add failure expectations for non-str const&enum
* json: test (& simplify output of) empty schema
* json: check parsing in test + fix value & string refs
* json: add server tests for OAI JSON response_format
* json: test/fix top-level anyOf
* json: improve grammar parsing failures
* json: test/fix additional props corner cases
* json: fix string patterns (was missing quotes)
* json: ws nit
* json: fix json handling in server when there's no response_format
* json: catch schema conversion errors in server
* json: don't complain about unknown format type in server if unset
* json: cleaner build of test
* json: create examples/json-schema-pydantic-example.py
* json: fix date pattern
* json: move json.hpp & json-schema-to-grammar.{cpp,h} to common
* json: indent 4 spaces
* json: fix naming of top-level c++ function (+ drop unused one)
* json: avoid using namespace std
* json: fix zig build
* Update server.feature
* json: iostream -> fprintf
* json: space before & refs for consistency
* json: nits
hodlen
pushed a commit
to hodlen/llama.cpp
that referenced
this pull request
Apr 3, 2024
* json: only attempt python & node schema conversion tests if their bins are present
  Tests introduced in ggml-org#5978 disabled in ggml-org#6198
* json: orange warnings when tests skipped
* json: ensure py/js schema conv tested on ubuntu-focal-make
* json: print env vars in test
This was referenced Apr 8, 2024
This was referenced Jun 23, 2024
Improved JSON schema → GBNF grammar conversion support
* `required`, `allOf`, `anyOf`
* … `items` syntax)
* `$ref` (in same schema, or over https if `--allow-fetch` is set), allowing recursive types
* `additionalProperties`
* `pattern` (most features, no greediness modifiers nor lookaheads; dot `.` can be made to match line breaks with `--dotall` flag)
* `date`, `time`, `date-time`, `uuid` string formats
* server: the API now handles `"response_format": {"type": "json_object", "schema": ...}` on its own (see examples below), bridging an important gap between the C++ server and llama-cpp-python
* … `$ref` (deemed too risky in server)
* … `{type: "string", pattern: "..."}` as a raw pattern (used by `examples/regex-to-grammar.py`, see example below)
* arrays: disallow `[, 1]`
* … `--prop-order` flag)

As a result, it can now consume the JSON schemas produced by Pydantic (used by almost all Python LLM frameworks), typescript-json-schema and its more recent fork ts-json-schema-generator, along with some more advanced schemas (tsconfig.json is the toughest I tested).
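The array fix mentioned above (disallowing `[, 1]`) comes down to where the grammar permits a comma. The sketch below is an illustration only: `array_rule` is a hypothetical helper, `space` is assumed to be a whitespace rule defined elsewhere in the grammar, and the exact rule text the converter emits may differ.

```cpp
#include <string>

// Hypothetical helper (not from the PR): builds a GBNF rule named `name` that
// matches a JSON array whose elements all match the rule `item_rule`.
// The comma sits inside the repeated group that follows the first item, so a
// leading comma such as `[, 1]` cannot be derived from the rule.
static std::string array_rule(const std::string & name, const std::string & item_rule) {
    return name + " ::= \"[\" space ( " + item_rule +
           " ( \",\" space " + item_rule + " )* )? \"]\" space";
}
```

For example, `array_rule("root", "number")` returns `root ::= "[" space ( number ( "," space number )* )? "]" space`, which accepts `[]`, `[1]` and `[1, 2]` but rejects `[, 1]`.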
Hopefully this PR hasn't grown too big; happy to send it in chunks if needed.
Examples
(outputs below are with Nous-Hermes-2-Mixtruct-v0.1-8x7B-DPO-DARE_TIES-Q6_K)
tsconfig.json
Pydantic w/ recursive types
Regular expressions
TypeScript types
TypeScript type including `Date`
Other special formats
JSON schema in server API (using instructor without llama-cpp-python)
TODOs (before undrafting this PR)
* … `.mjs` again and test server + chat.mjs (had to implement a custom regexp parser instead of using Python's builtin one, to allow porting to JS)
* Fix sanitizer tests

Possible followups:
* … `const` & `enum` → json-schema-to-grammar: fix order of props in C++, support non-string const/enum #6232
* … `examples/pydantic-models-to-grammar-examples.py`
* … `gbnf` Python package (similar to gguf) so `llama-cpp-python` needn't copy / inline this script