feat(partial): completeness-based streaming validation by thomasnormal · Pull Request #1999 · 567-labs/instructor

thomasnormal · 2026-01-13T21:36:06Z

Summary

This PR introduces completeness-based validation for Partial streaming - a fundamental improvement over the current approach of running validation on every chunk and working around failures.

The Problem

Field constraints fail during streaming because validation runs on incomplete data:

from pydantic import BaseModel, Field

class User(BaseModel):
    name: str = Field(min_length=5)

# During streaming:
# Chunk 1: {"name": "Al      → ValidationError: min_length=5 failed on "Al"
# Chunk 2: {"name": "Alic    → ValidationError: min_length=5 failed on "Alic"
# Chunk 3: {"name": "Alice"} → ✓ Finally passes

This affects all field constraints: min_length, max_length, ge, le, gt, lt, pattern, multiple_of, etc.

PR #1994 fixed model validators by wrapping them with streaming context checks, but field constraints still fail.

The Solution

Only validate JSON structures that are structurally complete (closed with matching braces/brackets).

- Incomplete JSON ({"name": "Al): Use model_construct() - skips ALL validation
- Complete JSON ({"name": "Alice"}): Use model_validate() - full validation

This is a principled solution that handles all validation types uniformly, rather than fixing them one at a time.
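The difference between the two code paths can be seen directly in Pydantic. The snippet below is a minimal sketch assuming Pydantic v2; it does not use the PR's code, only the two public methods the PR relies on, and the `User` model mirrors the one in the problem statement.

```python
from pydantic import BaseModel, Field, ValidationError


class User(BaseModel):
    name: str = Field(min_length=5)


# Data as it might look mid-stream: the string is still being generated.
incomplete = {"name": "Al"}

# model_validate() enforces min_length and rejects the prefix.
try:
    User.model_validate(incomplete)
    validated_ok = True
except ValidationError:
    validated_ok = False

# model_construct() builds the instance without running any validation.
partial = User.model_construct(**incomplete)

# Once the JSON is closed, full validation runs and passes.
final = User.model_validate({"name": "Alice"})

print(validated_ok, partial.name, final.name)
```

Because `model_construct()` bypasses field constraints, field validators, and model validators alike, a single switch covers every validation type.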

Implementation

New File: `instructor/dsl/json_tracker.py`

JsonCompleteness class that analyzes JSON strings to determine which paths are complete:

from instructor.dsl.json_tracker import JsonCompleteness

tracker = JsonCompleteness()
tracker.analyze('{"user": {"name": "Alice"}, "items": [1, 2')

tracker.is_root_complete()           # False - root object not closed
tracker.is_path_complete("user")     # True - user object is closed
tracker.is_path_complete("items")    # False - array not closed

Handles edge cases: strings containing braces, escaped quotes, nested structures.
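To illustrate the kind of scan this requires, here is a minimal sketch of a root-level completeness check. It is not the PR's `JsonCompleteness` implementation: the function name `is_structurally_complete` is hypothetical, it does no per-path tracking, and it only balances brackets rather than checking JSON grammar.

```python
def is_structurally_complete(text: str) -> bool:
    """Return True if text contains a fully closed JSON object or array."""
    stack = []             # closing characters we still expect, innermost last
    in_string = False      # inside a JSON string literal
    escaped = False        # previous character was a backslash inside a string
    seen_container = False

    for ch in text:
        if in_string:
            if escaped:
                escaped = False        # this character is escaped, skip it
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False      # unescaped quote closes the string
            continue                   # braces inside strings are ignored
        if ch == '"':
            in_string = True
        elif ch == "{":
            stack.append("}")
            seen_container = True
        elif ch == "[":
            stack.append("]")
            seen_container = True
        elif ch in "}]":
            if not stack or stack.pop() != ch:
                return False           # closer without a matching opener

    return seen_container and not stack and not in_string


print(is_structurally_complete('{"name": "Alice"}'))
print(is_structurally_complete('{"name": "Al'))
```

Tracking string and escape state is what keeps a value such as `"a } b"` from being mistaken for the end of an object.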

Modified: `instructor/dsl/partial.py`

process_potential_object() - Now uses completeness-based logic:

if tracker.is_root_complete() and has_data:
    return original_model.model_validate(parsed)  # Full validation
else:
    return _build_partial_object(...)  # model_construct, no validation

- _build_partial_object() / _build_partial_list() - Recursively build partial objects, validating only complete nested structures
- Deprecated PartialLiteralMixin - No longer needed; emits deprecation warning
- Always use trailing-strings mode - Preserves incomplete data during streaming
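The recursive idea behind the partial builders can be sketched as follows. This is an illustration under stated assumptions, not the PR's `_build_partial_object()`: the helper `build_partial`, the `complete_paths` set of dotted paths, and the `Address` model are all hypothetical, lists are not handled, and Pydantic v2 is assumed.

```python
from typing import Any, Dict, Set, Type

from pydantic import BaseModel, Field, ValidationError


class Address(BaseModel):
    city: str = Field(min_length=3)


class User(BaseModel):
    name: str = Field(min_length=5)
    address: Address


def build_partial(
    model: Type[BaseModel],
    data: Dict[str, Any],
    complete_paths: Set[str],
    path: str = "",
) -> BaseModel:
    """Build model from partial data, validating only closed nested objects."""
    values: Dict[str, Any] = {}
    for field_name, field_info in model.model_fields.items():
        if field_name not in data:
            values[field_name] = None  # field has not been streamed yet
            continue
        value = data[field_name]
        child_path = f"{path}.{field_name}" if path else field_name
        nested = field_info.annotation
        is_nested_model = isinstance(nested, type) and issubclass(nested, BaseModel)
        if is_nested_model and isinstance(value, dict):
            if child_path in complete_paths:
                # Nested object is closed: safe to run full validation.
                values[field_name] = nested.model_validate(value)
            else:
                # Still open: recurse and keep skipping validation.
                values[field_name] = build_partial(nested, value, complete_paths, child_path)
        else:
            values[field_name] = value  # scalar kept as-is, unvalidated
    return model.model_construct(**values)


streaming = build_partial(User, {"name": "Al", "address": {"city": "Os"}}, set())
closed_nested = build_partial(User, {"name": "Al", "address": {"city": "Oslo"}}, {"address"})
print(streaming.address.city, type(closed_nested.address).__name__)
```

In this sketch a closed nested object that violates its constraints raises `ValidationError` as soon as it is complete, even while the root object is still open.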

Before/After Comparison

| Validation Type | Before (main) | After (this PR) |
| --- | --- | --- |
| Field constraints (`min_length`, etc.) | ❌ Fails on incomplete | ✅ Skipped until complete |
| Field validators (`@field_validator`) | ❌ Fails on incomplete | ✅ Skipped until complete |
| Model validators (`@model_validator`) | ⚠️ Context-wrapped | ✅ Skipped via model_construct |
| Literal/Enum types | ⚠️ Needs PartialLiteralMixin | ✅ Automatic |
| Required fields | ⚠️ Complex workarounds | ✅ Skipped until complete |

Example

from pydantic import BaseModel, Field
from instructor.dsl.partial import Partial

class User(BaseModel):
    name: str = Field(min_length=5)
    email: str

PartialUser = Partial[User]

# This now works - validation skipped on incomplete JSON
chunks = ['{"name": "Al', '{"name": "Alice", "email": "[email protected]"}']
for chunk in chunks:
    for result in PartialUser.model_from_chunks([chunk]):
        print(result.name)  # "Al", then "Alice"

Test Results

All 45 tests pass, including:

- TestJsonCompletenessTracker (14 tests)
- TestFieldConstraintsDuringStreaming (7 tests)
- TestModelValidatorsDuringStreaming (4 tests)
- TestRecursiveModels (6 tests)

Breaking Changes

- PartialLiteralMixin deprecated - Still works but emits warning. Can be safely removed.
- Partial values now preserved - Incomplete strings like "act" are stored instead of dropped. Code checking for None to detect incomplete fields should be updated.
- Nested model serialization - model_dump() now shows {"nested": {"field": None}} instead of {"nested": {}} for incomplete nested models.
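For code that still uses the deprecated mixin, it may help to surface the warning in tests before removing it. The sketch below uses only the standard `warnings` module. `LegacyLiteralMixin` is a hypothetical stand-in; the source does not say at which point the real PartialLiteralMixin emits its warning, so warning at subclass definition is an assumption here.

```python
import warnings


class LegacyLiteralMixin:
    """Hypothetical stand-in for a deprecated mixin (not instructor's class)."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Assumption: the warning fires when a subclass is defined.
        warnings.warn(
            f"{cls.__name__} uses LegacyLiteralMixin, which is deprecated "
            "and no longer needed.",
            DeprecationWarning,
            stacklevel=2,
        )


def define_model_with_mixin():
    class Status(LegacyLiteralMixin):
        pass

    return Status


# Record warnings so a test can assert on them instead of ignoring them.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    define_model_with_mixin()

deprecations = [w for w in caught if issubclass(w.category, DeprecationWarning)]
print(len(deprecations))
```

Running a test suite with `-W error::DeprecationWarning` is another way to find remaining uses, since it turns each warning into a failure.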

Test Plan

- All 45 partial tests pass
- Manual verification of field constraints, model validators, nested models
- Verified failure on main, success on this branch

🤖 Generated with Claude Code

Important

Introduces completeness-based validation for streaming JSON, validating only complete structures and deprecating PartialLiteralMixin.

Behavior:
- Introduces completeness-based validation for streaming JSON in partial.py, using JsonCompleteness from json_tracker.py.
- Validates only complete JSON structures, skipping validation for incomplete data.
- Deprecates PartialLiteralMixin, as completeness-based validation handles Literals and Enums automatically.
Implementation:
- Adds JsonCompleteness class in json_tracker.py to track JSON completeness.
- Updates process_potential_object() in partial.py to use completeness-based logic.
- Modifies _build_partial_object() and _build_partial_list() in partial.py to handle partial data without validation.
Tests:
- Updates tests in test_partial.py to verify completeness-based validation.
- Ensures tests cover scenarios for incomplete and complete JSON data, model validators, and recursive models.

This description was created by Ellipsis for 91cc343. You can customize this summary. It will automatically update as commits are pushed.

ellipsis-dev

Important

Looks good to me! 👍

Reviewed everything up to 91cc343 in 2 minutes and 34 seconds.

Reviewed 969 lines of code in 3 files
Skipped 0 files when reviewing.
Skipped posting 6 draft comments. View those below.

1. instructor/dsl/json_tracker.py:59

Draft comment:
Consider adding a guard (or maximum recursion depth check) in _analyze_structure to mitigate potential stack overflows on deeply nested JSON.
Reason this comment was not posted:
Confidence changes required: 70% <= threshold 85% None
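A depth guard of the kind this draft comment suggests could look like the sketch below. It is not taken from `_analyze_structure`, whose code is not shown here; the function `measure_depth` and its `max_depth` parameter are hypothetical, and it walks already-parsed values rather than raw JSON text.

```python
from typing import Any


def measure_depth(value: Any, max_depth: int = 64, _depth: int = 0) -> int:
    """Return the deepest container nesting, refusing to recurse past max_depth."""
    if not isinstance(value, (dict, list)):
        return _depth  # scalars add no nesting
    if _depth >= max_depth:
        # Fail with a clear error instead of risking a RecursionError.
        raise ValueError(f"JSON nesting exceeds max_depth={max_depth}")
    children = value.values() if isinstance(value, dict) else value
    return max(
        (measure_depth(child, max_depth, _depth + 1) for child in children),
        default=_depth + 1,  # empty container still counts as one level
    )


# Build a list nested 100 levels deep to trigger the guard.
deep = []
for _ in range(100):
    deep = [deep]

try:
    measure_depth(deep)
    guard_triggered = False
except ValueError:
    guard_triggered = True

print(measure_depth({"a": {"b": []}}), guard_triggered)
```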

2. instructor/dsl/partial.py:81

Draft comment:
The process_potential_object function cleanly distinguishes between complete and incomplete JSON using JsonCompleteness. Consider adding explicit type hints for parameters and return types to improve clarity.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 70% vs. threshold = 85%

This is a code quality suggestion about adding type hints. The comment is actionable (add type hints) and clear (specifies which function and what to add). Looking at the code, the function immediately below (_build_partial_object) has full type annotations, which suggests the codebase values type hints. The comment doesn't fall into any of the prohibited categories - it's not about obvious issues, not speculative, not asking for confirmation, and it's about code that was changed (newly added). However, I need to consider if this is "obvious or unimportant" - type hints are generally considered good practice in Python, especially in library code, and this appears to be library code based on the context.

The comment might be considered too minor or obvious - adding type hints is a general best practice that doesn't necessarily need to be pointed out in every review. Also, the lack of type hints might be intentional for flexibility with the **kwargs parameter, though the other parameters could still be typed. The comment doesn't specify what the types should be, which makes it slightly less actionable.

While type hints are generally good practice, the comment is actionable and clear enough. The function is new code (added in this PR), and the very next function in the file has complete type annotations, showing inconsistency. For library code (which this appears to be based on the rules mentioning "library code changes"), type hints are particularly valuable for users. The comment is specific about which function needs improvement.

This is a valid code quality suggestion for newly added code. It's actionable, clear, and points out an inconsistency with the rest of the file. The comment should be kept as it improves code quality in library code.

3. instructor/dsl/partial.py:307

Draft comment:
Removal of the model validator wrapping simplifies get_partial_model. Please ensure that documentation is updated to indicate that model validators now run only for structurally complete JSON (via model_validate), with incomplete data built using model_construct.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

4. instructor/dsl/partial.py:367

Draft comment:
In writer_model_from_chunks, the conditional that checks if a chunk starts with '{' and ends with '}' might be too strict. Consider relying on the JsonCompleteness tracker to robustly determine completeness rather than just checking string boundaries.
Reason this comment was not posted:
Comment was on unchanged code.
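The concern in this draft comment can be illustrated with a small comparison. The actual condition in writer_model_from_chunks is not shown here, so `boundary_check` below is only a hypothetical rendering of "starts with '{' and ends with '}'", compared against a real parse using the standard library.

```python
import json


def boundary_check(chunk: str) -> bool:
    """Hypothetical rendering of a first/last character check."""
    return chunk.startswith("{") and chunk.endswith("}")


def parses_as_object(chunk: str) -> bool:
    """Ground truth for this sketch: does the chunk parse as a JSON object?"""
    try:
        return isinstance(json.loads(chunk), dict)
    except json.JSONDecodeError:
        return False


# Complete object followed by a newline: the boundary check rejects it.
trailing_newline = '{"a": 1}\n'

# Unterminated string whose last character happens to be a brace:
# the boundary check accepts it even though the JSON is incomplete.
truncated = '{"note": "ends with a brace }'

for chunk in (trailing_newline, truncated):
    print(repr(chunk), boundary_check(chunk), parses_as_object(chunk))
```

Both cases disagree with the parser, which is why a state-tracking scan is more dependable than inspecting string boundaries.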

5. tests/dsl/test_partial.py:367

Draft comment:
Tests still reference PartialLiteralMixin despite its deprecation. It may be useful to add a comment indicating that these tests are expected to emit deprecation warnings and should be updated once the mixin is removed.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

6. tests/dsl/test_partial.py:900

Draft comment:
The test suite is comprehensive, covering various scenarios (streaming, unions, recursive models, default_factory, etc.). Consider adding inline comments in complex tests to clarify the expected behavior, especially for incremental streaming cases.
Reason this comment was not posted:
Confidence changes required: 20% <= threshold 85% None

Workflow ID: wflow_BLgtF9I1Stf2mczN


Introduces a fundamental improvement to how Partial streaming handles validation. Instead of running validation on every chunk and working around failures, we now only validate JSON structures that are structurally complete (closed with matching braces/brackets).

Key changes:
- Add JsonCompleteness tracker to detect complete JSON structures
- Use model_construct() for incomplete JSON (skips all validation)
- Use model_validate() only when JSON is complete
- Deprecate PartialLiteralMixin (no longer needed)
- Always use trailing-strings mode to preserve partial data

This fixes field constraints (min_length, ge, le, pattern, etc.) that previously failed during streaming on incomplete data.

Co-Authored-By: Claude Opus 4.5 <[email protected]>

ellipsis-dev Bot reviewed Jan 13, 2026


thomasnormal force-pushed the fix/completeness-based-validation branch 2 times, most recently from 2b451ab to 83ceeae on January 13, 2026 21:41

thomasnormal force-pushed the fix/completeness-based-validation branch from 83ceeae to b85b2f0 on January 13, 2026 21:56

jxnl merged commit c9563bc into 567-labs:main Jan 13, 2026
13 of 14 checks passed

jxnl merged 1 commit into 567-labs:main from thomasnormal:fix/completeness-based-validation
