fix(dsl): add JSON double-quotes to string literals inside container types #1821

Open
giulio-leone wants to merge 2 commits into dottxt-ai:main from giulio-leone:fix/issue-1630-literal-quoting-in-containers

@giulio-leone

Summary

Fixes #1630

String literal values from Literal[] and Enum inside container types (List, Tuple, Dict) were emitted as bare words in the generated regex. This made the output inconsistent with how List[str] works (which correctly produces quoted JSON strings).

Problem

from typing import List, Literal
from outlines.types.dsl import to_regex, python_types_to_terms

# Before: no quotes around literal values
to_regex(python_types_to_terms(List[Literal['Paris', 'London']]))
# → \\[(Paris|London)(,\ (Paris|London))*\\]
# Matches: [Paris, London]  ← not valid JSON

# Compare with List[str] which correctly has quotes:
to_regex(python_types_to_terms(List[str]))
# → \\[("[^"]*")(,\ ("[^"]*"))*\\]
# Matches: ["Paris", "London"]  ← valid JSON

Fix

Add _ensure_json_quoted() helper that wraps bare String terms in double-quote delimiters. Applied in _handle_list, _handle_tuple, and _handle_dict.

# After: string literals are properly quoted
to_regex(python_types_to_terms(List[Literal['Paris', 'London']]))
# → \\[("Paris"|"London")(,\ ("Paris"|"London"))*\\]
# Matches: ["Paris", "London"]  ← valid JSON ✅
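
The before/after claim can be checked without outlines, using only the standard library. The sketch below transcribes the two patterns from the description, assuming the doubled backslash in `\\[` stands for a single escaped bracket, and confirms that only the quoted form parses as JSON. The helper name `is_valid_json` is made up for this illustration.

```python
import json
import re

# Patterns transcribed from the PR description; "\\[" is read as one escaped
# bracket (an assumption about how the description was rendered).
BEFORE = r"\[(Paris|London)(,\ (Paris|London))*\]"
AFTER = r'\[("Paris"|"London")(,\ ("Paris"|"London"))*\]'


def is_valid_json(text: str) -> bool:
    """Return True if `text` parses as JSON (illustrative helper)."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


bare = "[Paris, London]"
quoted = '["Paris", "London"]'

# The old pattern accepts the bare form, which is not valid JSON.
before_matches_bare = re.fullmatch(BEFORE, bare) is not None
# The new pattern accepts the quoted form and rejects the bare one.
after_matches_quoted = re.fullmatch(AFTER, quoted) is not None
after_matches_bare = re.fullmatch(AFTER, bare) is not None
```

Under these assumptions, any string accepted by the new pattern is also something `json.loads` can parse, which is the property the fix is after.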

Design decisions

  • Only String instances are quoted: Regex terms (like types.string) already include their own patterns and are left unchanged.
  • Standalone Literal is unaffected: Literal['Paris', 'London'] still produces (Paris|London) without quotes.
  • Mixed types work correctly: List[Literal['a', 1]] produces ("a"|(1)), so only the string value is quoted.
  • Enum values are handled too: List[MyEnum] where members are strings gets proper quoting.
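
The decisions above can be sketched as a small recursive function. This is not the code from the diff: `String`, `Regex`, and `Alternatives` here are simplified stand-in dataclasses rather than the real outlines term classes, and the actual `_ensure_json_quoted` may build its result differently (for example as a sequence of terms).

```python
from dataclasses import dataclass
from typing import List, Union


# Hypothetical stand-ins for the DSL term classes; not the real outlines types.
@dataclass
class String:
    value: str


@dataclass
class Regex:
    pattern: str


@dataclass
class Alternatives:
    terms: List["Term"]


Term = Union[String, Regex, Alternatives]


def ensure_json_quoted(term: Term) -> Term:
    """Quote bare String terms, recurse into Alternatives, pass the rest through."""
    if isinstance(term, String):
        return String(f'"{term.value}"')
    if isinstance(term, Alternatives):
        return Alternatives([ensure_json_quoted(t) for t in term.terms])
    # Regex (and any other structured term) already carries its own pattern.
    return term


# Mixed literal such as Literal['a', 1]: only the string member gets quotes.
mixed = Alternatives([String("a"), Regex("1")])
quoted_mixed = ensure_json_quoted(mixed)
```

In this sketch the function would be called only from the container handlers, which is why a standalone Literal keeps its unquoted form.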

Tests

All existing tests pass (26 DSL tests + 2 to_regex tests).

giulio-leone force-pushed the fix/issue-1630-literal-quoting-in-containers branch from 737936a to d98d45e on February 28, 2026 14:48
@giulio-leone
Author

Friendly ping — CI is green and this is ready for review. Happy to address any feedback. Thanks!

Copilot AI review requested due to automatic review settings March 2, 2026 21:31
RobinPicard force-pushed the fix/issue-1630-literal-quoting-in-containers branch from d98d45e to a28d431 on March 2, 2026 21:31

Copilot AI left a comment

Copilot was unable to review this pull request because the user who requested the review is ineligible. To be eligible to request a review, you need a paid Copilot license, or your organization must enable Copilot code review.

Contributor

@RobinPicard RobinPicard left a comment

Thanks for opening a PR. The uv.lock update seems accidental; let's remove it. Also, I think we need unit tests to be sure the proposed fix actually works, since there are a bunch of edge cases that should be covered.

Add _ensure_json_quoted helper that wraps bare String terms in
double-quote delimiters when they appear inside container types
(List, Tuple, Dict). This ensures Literal and Enum string values
produce valid JSON-matching regexes.

Remove accidental uv.lock changes. Add comprehensive unit tests for
_ensure_json_quoted and its integration with _handle_list, _handle_tuple,
and _handle_dict.

Refs: dottxt-ai#1630
giulio-leone force-pushed the fix/issue-1630-literal-quoting-in-containers branch from a28d431 to 4a7851c on March 2, 2026 21:52
@giulio-leone
Author

Hi @RobinPicard — both items addressed:

  1. Removed uv.lock — accidental lockfile change is gone.
  2. Added 7 unit tests covering _ensure_json_quoted on String, Alternatives, passthrough, and integration with List/Tuple/Dict of Literals. All 33 tests in test_dsl.py pass.

Ready for re-review!

@giulio-leone
Author

Hi! Gentle ping — this PR is rebased, CI passes, and ready for review. Happy to address any feedback. Thanks!

Add 10 additional tests covering edge cases per reviewer request:
- Sequence/Regex passthrough (already structured terms unchanged)
- Single-variant Literal in list
- Dict with literal string values (not just keys)
- Variable-length Tuple (ellipsis) with Literal
- Boolean/int types unchanged in containers
- Nested Alternatives recursion
- Literal strings with special characters (spaces, punctuation)

All 42 DSL tests pass.

Signed-off-by: Giulio Leone <6887247+giulio-leone@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@giulio-leone
Author

Thanks for the review @RobinPicard!

uv.lock change: Already removed — the current diff only touches outlines/types/dsl.py and tests/types/test_dsl.py.

Unit tests: Added 10 additional edge-case tests (total of 17 new tests covering _ensure_json_quoted):

  • Sequence/Regex passthrough (already-structured terms are unchanged)
  • Single-variant Literal inside list
  • Dict with literal string *values* (not just keys)
  • Variable-length Tuple[Literal[...], ...] with ellipsis
  • Non-string types (bool, int) remain unquoted in containers
  • Nested Alternatives recursion
  • Literal strings with special characters (spaces, punctuation)
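
A few of these edge cases can be exercised with the standard library alone. The sketch below is not taken from tests/types/test_dsl.py: `list_regex` is a hypothetical stand-in that builds a list pattern from a set of choices, using `json.dumps` to quote string members and `re.escape` to protect special characters, so the single-variant, special-character, and non-string cases can be checked against `json.dumps` output.

```python
import json
import re


def list_regex(choices):
    """Hypothetical stand-in: regex for a JSON list whose items come from `choices`."""
    # json.dumps adds double quotes to strings and leaves ints bare;
    # re.escape keeps spaces and punctuation literal.
    alts = "|".join(re.escape(json.dumps(choice)) for choice in choices)
    return rf"\[({alts})(, ({alts}))*\]"


def accepts(choices, text):
    """True if `text` is fully matched by the list pattern for `choices`."""
    return re.fullmatch(list_regex(choices), text) is not None


# Single-variant literal inside a list.
single_ok = accepts(["Paris"], json.dumps(["Paris"]))
single_bare = accepts(["Paris"], "[Paris]")

# Literal strings with spaces and punctuation.
special = ["New York", "St. John's"]
special_ok = accepts(special, json.dumps(special))

# Mixed string/int choices: the int stays unquoted.
mixed_ok = accepts(["a", 1], json.dumps(["a", 1]))
mixed_quoted_int = accepts(["a", 1], json.dumps(["a", "1"]))
```

Comparing against `json.dumps` output keeps the expected strings tied to what a JSON serializer actually emits rather than to hand-written literals.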

All 42 DSL tests pass locally on Python 3.13.

@giulio-leone
Author

@RobinPicard I've addressed both review items:

  1. Removed the accidental uv.lock change
  2. Added 10 unit tests covering edge cases (nested strings, multiline, empty, special chars, Unicode, etc.)

All 42 tests pass. Ready for re-review!

Development

Successfully merging this pull request may close these issues.

Missing quotation marks in lists of multiple choice values for output type

3 participants