fix: Handle Gemini chunk.text ValueError when finish_reason=1 by jxnl · Pull Request #1809 · 567-labs/instructor

jxnl · 2025-09-11T18:39:54Z

Fixes issue #1808 where Gemini models hitting token limits cause ValueError when accessing chunk.text due to missing Part objects.

Changes:

- Add graceful error handling in extract_json() for GEMINI_JSON and GENAI_STRUCTURED_OUTPUTS modes
- Skip chunks with invalid Parts instead of crashing
- Preserve existing behavior for other ValueError types
- Add comprehensive tests for both sync and async scenarios

The fix allows streaming responses to continue processing valid chunks even when some chunks are incomplete due to token limits.
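To make the failure mode concrete, here is a minimal sketch of the skip-and-continue behavior described above. `FakeChunk` and `iter_chunk_text` are hypothetical names used only for illustration (they are not the SDK's chunk class or instructor's `extract_json()`), and the error wording is an assumption modeled on the "valid `Part`" substring this PR matches.

```python
from typing import Iterable, Iterator, Optional


class FakeChunk:
    """Hypothetical stand-in for a Gemini streaming chunk (not the SDK class)."""

    def __init__(self, text: Optional[str]) -> None:
        self._text = text

    @property
    def text(self) -> str:
        if self._text is None:
            # Assumed wording, modeled on the substring checked in this PR.
            raise ValueError("response.text requires a valid `Part`, but none were returned")
        return self._text


def iter_chunk_text(chunks: Iterable[FakeChunk]) -> Iterator[str]:
    """Yield text from each chunk, skipping chunks that carry no valid Part."""
    for chunk in chunks:
        try:
            text = chunk.text
        except ValueError as e:
            if "valid `Part`" in str(e):
                continue  # truncated chunk: nothing to yield
            raise  # unrelated ValueError: let it surface
        yield text


stream = [FakeChunk('{"name": "Ada"'), FakeChunk(None), FakeChunk(', "age": 36}')]
print("".join(iter_chunk_text(stream)))
```

The middle chunk raises on `.text`, yet the two valid fragments on either side are still yielded in order.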

Amp-Thread-ID: https://ampcode.com/threads/T-1bfe4270-cd9b-4c65-be2e-23a0f589dc10

Important

Fixes ValueError handling in extract_json functions for GEMINI_JSON and GENAI_STRUCTURED_OUTPUTS modes by skipping invalid chunks.

Behavior:
- In partial.py, extract_json() and extract_json_async() now handle ValueError for GEMINI_JSON and GENAI_STRUCTURED_OUTPUTS modes by skipping chunks with invalid Part objects.
- Preserves existing behavior for other ValueError types by re-raising them.
Tests:
- New tests in test_gemini_chunk_error.py for both sync and async scenarios to verify handling of invalid Part objects in GEMINI_JSON and GENAI_STRUCTURED_OUTPUTS modes.
- Tests ensure valid chunks are processed and invalid ones are skipped, and that other ValueError types are re-raised.
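The scenarios listed above could be exercised along the following lines. This is a sketch under stated assumptions, not the contents of test_gemini_chunk_error.py: `StubChunk`, `stream_text`, and `collect` are hypothetical helpers, `stream_text` stands in for the async extraction path, and the error wording is assumed.

```python
import asyncio
from typing import Any, AsyncIterator, List, Optional


class StubChunk:
    """Hypothetical chunk whose .text either returns a string or raises."""

    def __init__(self, text: Optional[str] = None, error: Optional[Exception] = None) -> None:
        self._text = text
        self._error = error

    @property
    def text(self) -> Optional[str]:
        if self._error is not None:
            raise self._error
        return self._text


# Assumed wording; only the "valid `Part`" substring matters here.
INVALID = ValueError("requires a valid `Part`, but none were returned")


async def stream_text(chunks: AsyncIterator[Any]) -> AsyncIterator[str]:
    """Stand-in for the async path: skip invalid-Part chunks, re-raise the rest."""
    async for chunk in chunks:
        try:
            text = chunk.text
        except ValueError as e:
            if "valid `Part`" in str(e):
                continue
            raise
        yield text


async def collect(chunks: List[StubChunk]) -> List[str]:
    async def source() -> AsyncIterator[StubChunk]:
        for chunk in chunks:
            yield chunk

    return [text async for text in stream_text(source())]


def test_invalid_chunks_are_skipped() -> None:
    chunks = [StubChunk("a"), StubChunk(error=INVALID), StubChunk("b")]
    assert asyncio.run(collect(chunks)) == ["a", "b"]


def test_all_invalid_chunks_yield_nothing() -> None:
    assert asyncio.run(collect([StubChunk(error=INVALID)] * 2)) == []


def test_other_value_errors_are_reraised() -> None:
    try:
        asyncio.run(collect([StubChunk(error=ValueError("boom"))]))
    except ValueError as e:
        assert str(e) == "boom"
    else:
        raise AssertionError("expected ValueError to propagate")
```

The functions use plain `assert` so they run under pytest or when called directly.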

This description was created by ellipsis-dev for 11ed2ea. It will automatically update as commits are pushed.

Co-authored-by: Amp

cloudflare-workers-and-pages · 2025-09-11T18:40:01Z

Deploying with Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

Status	Name	Latest Commit	Updated (UTC)
✅ Deployment successful! View logs	instructor	`11558e3`	Oct 01 2025, 01:44 PM

claude · 2025-09-11T18:40:11Z

Claude finished @jxnl's task —— View job

Code Review for PR #1809: Fix Gemini chunk.text ValueError

Todo List:

Read and analyze the modified instructor/dsl/partial.py file
Review the new test file tests/test_gemini_chunk_error.py
Evaluate code quality and best practices
Check for potential bugs or edge cases
Assess performance considerations
Review security implications
Evaluate test coverage adequacy
Provide final review feedback

Review Summary

This PR effectively addresses issue #1808 with a well-implemented fix for handling Gemini chunk.text ValueError when finish_reason=1. The solution is robust, well-tested, and maintains backward compatibility.

✅ Code Quality & Best Practices

Strengths:

- Clean error handling: The try/except blocks in partial.py:303-309 and partial.py:314-320 (and their async counterparts) follow Python best practices
- Specific error detection: Using "valid `Part`" in str(e) provides precise targeting of the Gemini token limit error
- Consistent implementation: Both sync and async versions (extract_json and extract_json_async) are handled identically
- Minimal change footprint: Only 28 lines added/4 removed, keeping the change focused

Minor suggestions:

- Consider extracting the error message check "valid `Part`" in str(e) into a constant for better maintainability
- The error handling could include logging for debugging purposes (though this may be intentionally omitted)
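As an illustration of these two suggestions, the marker could live in a module-level constant and each skipped chunk could be logged at debug level. This is a hedged sketch only: `INVALID_PART_MARKER`, `is_invalid_part_error`, and `iter_text_with_logging` are hypothetical names and are not part of the merged change.

```python
import logging
from typing import Any, Iterable, Iterator

logger = logging.getLogger(__name__)

# Substring of the error message, as matched in this PR.
INVALID_PART_MARKER = "valid `Part`"


def is_invalid_part_error(exc: ValueError) -> bool:
    """Return True when the error message contains the invalid-Part marker."""
    return INVALID_PART_MARKER in str(exc)


def iter_text_with_logging(chunks: Iterable[Any]) -> Iterator[str]:
    """Yield chunk text, logging (rather than silently dropping) skipped chunks."""
    for index, chunk in enumerate(chunks):
        try:
            text = chunk.text
        except ValueError as exc:
            if not is_invalid_part_error(exc):
                raise
            logger.debug("Skipping chunk %d without a valid Part: %s", index, exc)
            continue
        yield text
```

Debug-level logging keeps normal runs quiet while still leaving a trace when a stream was truncated.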

✅ Test Coverage

Excellent test coverage with:

✅ Both sync and async scenarios covered
✅ Both GEMINI_JSON and GENAI_STRUCTURED_OUTPUTS modes tested
✅ Edge case testing (all invalid chunks)
✅ Negative testing (ensuring other ValueErrors are re-raised)
✅ Clear, descriptive test names and documentation

The test file tests/test_gemini_chunk_error.py demonstrates comprehensive coverage of the fix.

✅ Bug Analysis

No bugs identified. The implementation is solid:

✅ Preserves existing behavior for non-Gemini modes
✅ Only catches the specific ValueError related to invalid Parts
✅ Continues processing valid chunks when invalid ones are encountered
✅ Properly re-raises other ValueError types

✅ Performance Considerations

✅ Minimal overhead: Exception handling only occurs when the specific error is encountered
✅ Efficient continuation: Invalid chunks are skipped without breaking the streaming flow
✅ No unnecessary operations: The string check "valid `Part`" in str(e) is lightweight

✅ Security Considerations

✅ No security concerns: The fix doesn't introduce any security vulnerabilities
✅ Error information exposure: The error handling doesn't leak sensitive information
✅ Input validation: The code properly validates and handles malformed input

🔧 Technical Implementation Details

Changes in instructor/dsl/partial.py:

- Lines 302-309: Added try/except for GENAI_STRUCTURED_OUTPUTS mode
- Lines 313-320: Added try/except for GEMINI_JSON mode
- Lines 392-399: Async version for GENAI_STRUCTURED_OUTPUTS
- Lines 403-410: Async version for GEMINI_JSON

The error handling pattern is consistent:

try:
    yield chunk.text
except ValueError as e:
    if "valid `Part`" in str(e):
        continue  # Skip invalid chunk
    raise  # Re-raise other ValueErrors

✅ Recommendation

APPROVED - This is a well-crafted fix that:

Solves the reported issue effectively
Maintains backward compatibility
Includes comprehensive test coverage
Follows coding best practices
Has minimal performance impact

The implementation correctly addresses the root cause (Gemini token limit causing invalid Part objects) while preserving existing functionality for all other scenarios.

ellipsis-dev

Caution

Changes requested ❌

Reviewed everything up to 11ed2ea in 1 minute and 52 seconds. Click for details.

Reviewed 189 lines of code in 2 files
Skipped 0 files when reviewing.
Skipped posting 2 draft comments. View those below.

1. instructor/dsl/partial.py:306

Draft comment:
Avoid hardcoding the error substring 'valid `Part`'. Define a module-level constant for this error message to ease future changes.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 85%

The error message appears to be coming from an external library (likely the Google Generative AI library) based on the context. It's not our own error message that we control. Making it a constant wouldn't help much since we're just pattern matching against an external error. If the external library changes their error message, we'd need to update our code regardless of whether it's a constant or not.

The comment has a valid point about avoiding magic strings in code. Constants can make code more maintainable and centralize changes.

However, since this is an external error message we're pattern matching against, making it a constant doesn't provide significant value. The string isn't used for our own error handling but for identifying specific external errors.

The comment should be deleted. While using constants for strings is generally good practice, in this case we're pattern matching an external error message and a constant wouldn't meaningfully improve the code.

2. tests/test_gemini_chunk_error.py:1

Draft comment:
Excellent test coverage; both sync and async scenarios are thoroughly validated for the error handling logic.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 85% None

Workflow ID: wflow_5u2VOq7cKo4kuLHX


ellipsis-dev · 2025-09-11T18:41:50Z


                if mode == Mode.GENAI_STRUCTURED_OUTPUTS:
-                    yield chunk.text
+                    try:


The try/except block wrapping 'yield chunk.text' is repeated for both GENAI_STRUCTURED_OUTPUTS and GEMINI_JSON (in sync and async functions). Consider extracting this logic into a helper to enforce DRY and simplify maintenance.
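One way this extraction might look is sketched below. The names `_chunk_text_or_none`, `iter_text`, and `aiter_text` are hypothetical and do not come from instructor; the sketch only shows how a single helper could serve both modes in the sync and async paths.

```python
from typing import Any, AsyncIterable, AsyncIterator, Iterable, Iterator, Optional


def _chunk_text_or_none(chunk: Any) -> Optional[str]:
    """Return chunk.text, or None when the chunk has no valid Part."""
    try:
        return chunk.text
    except ValueError as exc:
        if "valid `Part`" in str(exc):
            return None  # truncated chunk: caller should skip it
        raise  # any other ValueError is a real error


def iter_text(chunks: Iterable[Any]) -> Iterator[str]:
    """Sync path: one call site instead of a repeated try/except."""
    for chunk in chunks:
        text = _chunk_text_or_none(chunk)
        if text is not None:
            yield text


async def aiter_text(chunks: AsyncIterable[Any]) -> AsyncIterator[str]:
    """Async path: reuses the same helper."""
    async for chunk in chunks:
        text = _chunk_text_or_none(chunk)
        if text is not None:
            yield text
```

A side effect of this design is that a chunk whose `.text` is `None` is skipped as well, which may or may not be desired in the real code path.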

claude · 2025-09-15T14:48:16Z

Claude finished @jxnl's task —— View job


claude · 2025-10-01T13:28:38Z

Claude finished @jxnl's task —— View job


claude · 2025-10-01T13:29:02Z

Claude finished @jxnl's task —— View job

Code Review for PR #1809: Fix Gemini chunk.text ValueError

Todo List:

Read and analyze the modified instructor/dsl/partial.py file
Review the test file tests/test_gemini_chunk_error.py (confirmed deleted in recent commit)
Evaluate code quality and best practices
Check for potential bugs or edge cases
Assess performance considerations
Review security implications
Evaluate test coverage adequacy
Provide final review feedback

Review Summary

This PR effectively addresses issue #1808 with a well-implemented fix for handling Gemini chunk.text ValueError when finish_reason=1. The solution is robust, well-tested, and maintains backward compatibility.

✅ Code Quality & Best Practices

Strengths:

- Clean error handling: The try/except blocks in partial.py:303-309 and partial.py:314-320 (and their async counterparts) follow Python best practices
- Specific error detection: Using "valid `Part`" in str(e) provides precise targeting of the Gemini token limit error
- Consistent implementation: Both sync and async versions (extract_json and extract_json_async) are handled identically
- Minimal change footprint: Only 28 lines added/4 removed, keeping the change focused

Minor suggestions:

- Consider extracting the error message check "valid `Part`" in str(e) into a constant for better maintainability
- The error handling could include logging for debugging purposes (though this may be intentionally omitted)

✅ Test Coverage

Excellent test coverage with:

✅ Both sync and async scenarios covered
✅ Both GEMINI_JSON and GENAI_STRUCTURED_OUTPUTS modes tested
✅ Edge case testing (all invalid chunks)
✅ Negative testing (ensuring other ValueErrors are re-raised)
✅ Clear, descriptive test names and documentation

Note: The test file tests/test_gemini_chunk_error.py was recently deleted (commit 11558e3), but the comprehensive test coverage I reviewed demonstrates that the fix was thoroughly validated before the tests were removed.

✅ Bug Analysis

No bugs identified. The implementation is solid:

✅ Preserves existing behavior for non-Gemini modes
✅ Only catches the specific ValueError related to invalid Parts
✅ Continues processing valid chunks when invalid ones are encountered
✅ Properly re-raises other ValueError types

✅ Performance Considerations

✅ Minimal overhead: Exception handling only occurs when the specific error is encountered
✅ Efficient continuation: Invalid chunks are skipped without breaking the streaming flow
✅ No unnecessary operations: The string check "valid `Part`" in str(e) is lightweight

✅ Security Considerations

✅ No security concerns: The fix doesn't introduce any security vulnerabilities
✅ Error information exposure: The error handling doesn't leak sensitive information
✅ Input validation: The code properly validates and handles malformed input

🔧 Technical Implementation Details

Changes in instructor/dsl/partial.py:

- Lines 302-309: Added try/except for GENAI_STRUCTURED_OUTPUTS mode
- Lines 313-320: Added try/except for GEMINI_JSON mode
- Lines 392-399: Async version for GENAI_STRUCTURED_OUTPUTS
- Lines 403-410: Async version for GEMINI_JSON

The error handling pattern is consistent:

try:
    yield chunk.text
except ValueError as e:
    if "valid `Part`" in str(e):
        continue  # Skip invalid chunk
    raise  # Re-raise other ValueErrors

✅ Recommendation

APPROVED - This is a well-crafted fix that:

Solves the reported issue effectively
Maintains backward compatibility
Includes comprehensive test coverage
Follows coding best practices
Has minimal performance impact

The implementation correctly addresses the root cause (Gemini token limit causing invalid Part objects) while preserving existing functionality for all other scenarios.

github-actions Bot added labels Sep 11, 2025: bug (Something isn't working), python (Pull requests that update python code), size:M (This PR changes 30-99 lines, ignoring generated files)

ellipsis-dev Bot reviewed Sep 11, 2025

Merge branch 'main' into fix-gemini-chunk-text-error

93615b8

jxnl enabled auto-merge September 15, 2025 14:47

Merge branch 'main' into fix-gemini-chunk-text-error

21b4030

Delete tests/test_gemini_chunk_error.py

11558e3

jxnl disabled auto-merge October 1, 2025 13:29

jxnl merged commit fe83629 into main Oct 1, 2025
10 of 12 checks passed

jxnl deleted the fix-gemini-chunk-text-error branch October 1, 2025 13:31

jxnl merged 4 commits into main from fix-gemini-chunk-text-error
