
fix(streaming): skip model validators during partial streaming #1994

Merged

jxnl merged 4 commits into 567-labs:main from thomasnormal:fix/model-validator-streaming on Jan 13, 2026

Conversation

@thomasnormal
Contributor

@thomasnormal thomasnormal commented Jan 13, 2026

Summary

  • Fix model validators failing during partial streaming when referencing incomplete fields
  • Add PartialLiteralMixin for handling Literal/Enum types during streaming
  • Automatically wrap model validators to skip during streaming via context
  • NEW: Add final validation against original model after streaming completes to enforce required fields

Fixes #1993

Changes

Model Validator Skipping During Streaming

  • Pass context={"partial_streaming": True} during all streaming validation calls
  • Wrap @model_validator(mode="after") validators to check context and skip during streaming
  • Validators run normally during final validation (without streaming context)

Final Validation After Streaming

  • Store reference to original model in Partial model (_original_model)
  • After streaming completes, validate final object against original model
  • Enforces required fields that were made optional during streaming
  • Runs model validators without streaming context
  • If validation fails, triggers retry mechanism

PartialLiteralMixin

  • New mixin that switches partial_mode from "trailing-strings" to "on"
  • Incomplete Literal/Enum strings become None instead of failing validation
  • Documented with docstring and examples

Implementation Details

  • Uses Pydantic's mode="wrap" validator to intercept validation
  • Creates wrapper validators that check ValidationInfo.context
  • Handles multiple validators and inheritance correctly

Test plan

  • test_model_validator_skipped_during_streaming - validators skipped with streaming context
  • test_model_validator_runs_when_complete - validators run without streaming context
  • test_multiple_model_validators - multiple validators all wrapped correctly
  • test_validators_run_without_streaming_context - final validation behavior
  • test_final_validation_catches_missing_required_fields - required fields enforced
  • test_final_validation_runs_model_validators - validators run at end
  • test_streaming_yields_partial_objects_before_final_validation - streaming still works
  • All 39 partial tests pass
  • Live test with OpenAI API streaming

🤖 Generated with Claude Code

Model validators (mode="after") can fail during streaming when they
reference fields that haven't arrived yet. This commit adds automatic
wrapping of model validators to skip them during streaming.

Changes:
- Pass context={"partial_streaming": True} during streaming validation
- Wrap model validators to check context and skip during streaming
- Add PartialLiteralMixin for Literal/Enum types (uses partial_mode="on")
- Add comprehensive tests for validator behavior during streaming

The validators run normally during final validation (without streaming
context), ensuring data integrity while allowing smooth streaming.

Co-Authored-By: Claude Opus 4.5 <[email protected]>

@ellipsis-dev (bot) left a comment


Important

Looks good to me! 👍

Reviewed everything up to 19ba756 in 1 minute and 51 seconds.
  • Reviewed 751 lines of code in 2 files
  • Skipped 0 files when reviewing.
  • Skipped posting 5 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. instructor/dsl/partial.py:49
  • Draft comment:
    PartialLiteralMixin is currently empty and acts solely as a marker for switching partial_mode. Consider adding a comment or future extension note if custom behavior is intended.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 85% None
2. instructor/dsl/partial.py:75
  • Draft comment:
    The context {'partial_streaming': True} is passed repeatedly in multiple validator calls. Consider defining a constant or helper to DRY this pattern.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 75% vs. threshold = 85%.

    This comment is about code that was changed in the diff - the context parameter was added in multiple places. The suggestion to extract this into a constant is a reasonable DRY refactor. It's actionable (create a constant like PARTIAL_STREAMING_CONTEXT = {"partial_streaming": True} at module level). The additional rules explicitly state "Code should be DRY (Don't Repeat Yourself)" and "Comments that suggest code quality refactors are good! But only if they are actionable and clear." This comment is both actionable and clear. However, I need to consider if this is "obvious or unimportant" - a simple dict with one key-value pair being repeated 4 times is borderline, but given the explicit DRY rule, this seems worth keeping.

    On the other hand, this is a very minor refactor for a simple dictionary that's only repeated 4 times. The value of extracting it is marginal - it's just one key-value pair, and having it inline might actually be more readable. The comment might be considered too nitpicky for such a small pattern. Still, the additional rules explicitly prioritize DRY principles, this repetition was introduced in this PR, and if the context structure ever needs to change (e.g., adding more keys), a constant would make that easier. The comment is actionable and follows the stated priorities.

    Keep this comment. It's a valid, actionable DRY suggestion that aligns with the explicit rules provided. The pattern is repeated 4 times in the changes, and extracting it to a constant would improve maintainability.
3. instructor/dsl/partial.py:197
  • Draft comment:
    In create_streaming_safe_validator, the looping over original validators assumes order. While Python 3.7+ maintains dict insertion order, a comment clarifying assumptions re ordering of validators might improve maintainability.
  • Reason this comment was not posted:
    Confidence changes required: 70% <= threshold 85% None
4. instructor/dsl/partial.py:236
  • Draft comment:
    The approach of setting subsequent validators to no-op to ensure parent validators are overridden is clever. Adding an inline comment to explain why no-ops are inserted (to avoid duplicate execution) may benefit future maintainers.
  • Reason this comment was not posted:
    Confidence changes required: 80% <= threshold 85% None
5. tests/dsl/test_partial.py:104
  • Draft comment:
    Debug print statements in test_partial_with_whitespace add noise. Consider removing or guarding them since tests ideally should avoid extraneous output.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.

Workflow ID: wflow_gSLJagE5U8NOMlU2

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

…ming

After streaming completes, validate the final object against the original
model to enforce required fields. This ensures:

- Required fields are validated at the end of streaming
- Model validators run without streaming context
- Incomplete responses from LLMs trigger retry mechanism

Changes:
- Store original model reference in Partial model (_original_model)
- Add final validation at the end of all streaming methods
- Update tests to provide complete data for final validation
- Add comprehensive tests for final validation behavior

Co-Authored-By: Claude Opus 4.5 <[email protected]>
thomasahle and others added 2 commits January 13, 2026 15:02
…ory fields

Fields with default_factory (e.g., List[str] = Field(default_factory=list))
were failing final validation because the partial model sets them to None,
and None is not valid for the original model's field type.

Fix: Use exclude_none=True in model_dump() during final validation so
fields with None values are excluded and the original model uses its
default values instead.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Successfully merging this pull request may close these issues.

Model validators fail during partial streaming with incomplete fields