feat(partial): completeness-based streaming validation#1999
Merged
jxnl merged 1 commit into567-labs:mainfrom Jan 13, 2026
Merged
feat(partial): completeness-based streaming validation#1999jxnl merged 1 commit into567-labs:mainfrom
jxnl merged 1 commit into567-labs:mainfrom
Conversation
Contributor
There was a problem hiding this comment.
Important
Looks good to me! 👍
Reviewed everything up to 91cc343 in 2 minutes and 34 seconds. Click for details.
- Reviewed
969lines of code in3files - Skipped
0files when reviewing. - Skipped posting
6draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. instructor/dsl/json_tracker.py:59
- Draft comment:
Consider adding a guard (or maximum recursion depth check) in _analyze_structure to mitigate potential stack overflows on deeply nested JSON. - Reason this comment was not posted:
Confidence changes required:70%<= threshold85%None
2. instructor/dsl/partial.py:81
- Draft comment:
The process_potential_object function cleanly distinguishes between complete and incomplete JSON using JsonCompleteness. Consider adding explicit type hints for parameters and return types to improve clarity. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 70% vs. threshold = 85% This is a code quality suggestion about adding type hints. The comment is actionable (add type hints) and clear (specifies which function and what to add). Looking at the code, the function immediately below (_build_partial_object) has full type annotations, which suggests the codebase values type hints. The comment doesn't fall into any of the prohibited categories - it's not about obvious issues, not speculative, not asking for confirmation, and it's about code that was changed (newly added). However, I need to consider if this is "obvious or unimportant" - type hints are generally considered good practice in Python, especially in library code, and this appears to be library code based on the context. The comment might be considered too minor or obvious - adding type hints is a general best practice that doesn't necessarily need to be pointed out in every review. Also, the lack of type hints might be intentional for flexibility with the **kwargs parameter, though the other parameters could still be typed. The comment doesn't specify what the types should be, which makes it slightly less actionable. While type hints are generally good practice, the comment is actionable and clear enough. The function is new code (added in this PR), and the very next function in the file has complete type annotations, showing inconsistency. For library code (which this appears to be based on the rules mentioning "library code changes"), type hints are particularly valuable for users. The comment is specific about which function needs improvement. This is a valid code quality suggestion for newly added code. It's actionable, clear, and points out an inconsistency with the rest of the file. The comment should be kept as it improves code quality in library code.
3. instructor/dsl/partial.py:307
- Draft comment:
Removal of the model validator wrapping simplifies get_partial_model. Please ensure that documentation is updated to indicate that model validators now run only for structurally complete JSON (via model_validate), with incomplete data built using model_construct. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
4. instructor/dsl/partial.py:367
- Draft comment:
In writer_model_from_chunks, the conditional that checks if a chunk starts with '{' and ends with '}' might be too strict. Consider relying on the JsonCompleteness tracker to robustly determine completeness rather than just checking string boundaries. - Reason this comment was not posted:
Comment was on unchanged code.
5. tests/dsl/test_partial.py:367
- Draft comment:
Tests still reference PartialLiteralMixin despite its deprecation. It may be useful to add a comment indicating that these tests are expected to emit deprecation warnings and should be updated once the mixin is removed. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
6. tests/dsl/test_partial.py:900
- Draft comment:
The test suite is comprehensive, covering various scenarios (streaming, unions, recursive models, default_factory, etc.). Consider adding inline comments in complex tests to clarify the expected behavior, especially for incremental streaming cases. - Reason this comment was not posted:
Confidence changes required:20%<= threshold85%None
Workflow ID: wflow_BLgtF9I1Stf2mczN
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
2b451ab to
83ceeae
Compare
Introduces a fundamental improvement to how Partial streaming handles validation. Instead of running validation on every chunk and working around failures, we now only validate JSON structures that are structurally complete (closed with matching braces/brackets). Key changes: - Add JsonCompleteness tracker to detect complete JSON structures - Use model_construct() for incomplete JSON (skips all validation) - Use model_validate() only when JSON is complete - Deprecate PartialLiteralMixin (no longer needed) - Always use trailing-strings mode to preserve partial data This fixes field constraints (min_length, ge, le, pattern, etc.) that previously failed during streaming on incomplete data. Co-Authored-By: Claude Opus 4.5 <[email protected]>
83ceeae to
b85b2f0
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
This PR introduces completeness-based validation for
Partialstreaming - a fundamental improvement over the current approach of running validation on every chunk and working around failures.The Problem
Field constraints fail during streaming because validation runs on incomplete data:
This affects all field constraints:
min_length,max_length,ge,le,gt,lt,pattern,multiple_of, etc.PR #1994 fixed model validators by wrapping them with streaming context checks, but field constraints still fail.
The Solution
Only validate JSON structures that are structurally complete (closed with matching braces/brackets).
{"name": "Al): Usemodel_construct()- skips ALL validation{"name": "Alice"}): Usemodel_validate()- full validationThis is a principled solution that handles all validation types uniformly, rather than fixing them one at a time.
Implementation
New File:
instructor/dsl/json_tracker.pyJsonCompletenessclass that analyzes JSON strings to determine which paths are complete:Handles edge cases: strings containing braces, escaped quotes, nested structures.
Modified:
instructor/dsl/partial.pyprocess_potential_object()- Now uses completeness-based logic:_build_partial_object()/_build_partial_list()- Recursively build partial objects, validating only complete nested structuresDeprecated
PartialLiteralMixin- No longer needed; emits deprecation warningAlways use
trailing-stringsmode - Preserves incomplete data during streamingBefore/After Comparison
min_length, etc.)@field_validator)@model_validator)Example
Test Results
All 45 tests pass, including:
TestJsonCompletenessTracker(14 tests)TestFieldConstraintsDuringStreaming(7 tests)TestModelValidatorsDuringStreaming(4 tests)TestRecursiveModels(6 tests)Breaking Changes
PartialLiteralMixindeprecated - Still works but emits warning. Can be safely removed.Partial values now preserved - Incomplete strings like
"act"are stored instead of dropped. Code checking forNoneto detect incomplete fields should be updated.Nested model serialization -
model_dump()now shows{"nested": {"field": None}}instead of{"nested": {}}for incomplete nested models.Test Plan
🤖 Generated with Claude Code
Important
Introduces completeness-based validation for streaming JSON, validating only complete structures and deprecating
PartialLiteralMixin.partial.py, usingJsonCompletenessfromjson_tracker.py.PartialLiteralMixin, as completeness-based validation handles Literals and Enums automatically.JsonCompletenessclass injson_tracker.pyto track JSON completeness.process_potential_object()inpartial.pyto use completeness-based logic._build_partial_object()and_build_partial_list()inpartial.pyto handle partial data without validation.test_partial.pyto verify completeness-based validation.This description was created by
for 91cc343. You can customize this summary. It will automatically update as commits are pushed.