feat(retry): add comprehensive tracking of all failed attempts and exceptions by jxnl · Pull Request #1802 · 567-labs/instructor

jxnl · 2025-09-04T19:41:00Z

Description

This PR enhances the retry mechanism in Instructor to track all failed completions and exceptions across the entire range of retries, not just the final failure. This provides much better debugging capabilities and insight into retry patterns.

Changes

Core Changes

New data structure: Tracks individual retry failures with attempt number, exception, and completion response
**Enhanced **: Now includes a list containing all retry failures
Updated retry functions: Both and now collect comprehensive failure information

Key Features

Track attempt number for each failure
Store the actual exception that occurred
Preserve completion responses (when available) for analysis
Maintain backward compatibility with existing exception handling

Benefits

Better debugging: See exactly what failed at each retry attempt
Pattern analysis: Identify if failures are consistent or changing across retries
Completion inspection: Access raw LLM responses that failed validation
Comprehensive error reporting: Full visibility into the retry process

Usage Example

try:
    response = client.chat.completions.create(
        response_model=MyModel,
        messages=messages,
        max_retries=3
    )
except InstructorRetryException as e:
    print(f"Failed after {e.n_attempts} attempts")
    
    # New: Access all failed attempts
    for attempt in e.failed_attempts:
        print(f"Attempt {attempt.attempt_number}: {attempt.exception}")
        if attempt.completion:
            # Analyze the raw completion that failed
            analyze_completion(attempt.completion)

Testing

All existing tests pass (backward compatibility maintained)
Linting and formatting checks pass
Example demonstrates the new functionality

Backward Compatibility

This change is fully backward compatible. The new field is optional and defaults to an empty list if not provided.

This PR was written by Cursor

Important

Enhances retry mechanism to track all failed attempts and exceptions for improved debugging and analysis.

Behavior:
- Introduces FailedAttempt in exceptions.py to track retry attempts with attempt number, exception, and completion.
- Updates InstructorRetryException to include failed_attempts list.
- Modifies retry_sync and retry_async in retry.py to populate failed_attempts with each failed attempt.
Benefits:
- Provides detailed tracking of all retry attempts for better debugging and analysis.
- Maintains backward compatibility by defaulting failed_attempts to an empty list if not provided.

^{This description was created by}^{for 69f2017. You can customize this summary. It will automatically update as commits are pushed.}

…ceptions - Add FailedAttempt data structure to track individual retry failures - Extend InstructorRetryException to include failed_attempts list - Modify retry_sync and retry_async to collect all failures across retries - Track attempt number, exception, and completion for each failed retry - Enables detailed analysis of retry patterns and failure modes

cloudflare-workers-and-pages · 2025-09-04T19:41:11Z

Deploying with Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

Status	Name	Latest Commit	Preview URL	Updated (UTC)
✅ Deployment successful! View logs	instructor	`69f2017`	Commit Preview URL Branch Preview URL	Sep 04 2025, 07:53 PM

claude · 2025-09-04T19:41:36Z

Claude finished @jxnl's task —— View job

Code Review Complete ✅

Reviewing the retry tracking enhancement PR. Here's my analysis checklist:

Read and analyze modified files
Review code quality and best practices
Check for potential bugs or issues
Evaluate performance considerations
Assess security concerns
Analyze test coverage
Provide comprehensive feedback

Code Review Results

This PR adds comprehensive retry tracking functionality to the Instructor library. Here's my detailed analysis:

✅ Code Quality & Best Practices

Positive aspects:

Clean, well-structured implementation following existing patterns
Good use of type hints including NamedTuple for FailedAttempt
Proper integration with existing error handling flow
Consistent code style matching the codebase
Clear docstring documentation

Areas for improvement:

The FailedAttempt.completion: Any | None = None could be more type-specific if possible
Consider adding a docstring to the FailedAttempt class explaining the purpose of each field

🐛 Potential Issues

Minor concerns:

Memory accumulation: In scenarios with many retries, the failed_attempts list could accumulate significant memory, especially if completions contain large responses. Consider documenting this behavior or adding optional size limits.
Duplicate code: The retry attempt tracking logic is nearly identical between retry_sync() and retry_async() functions (lines 207-213 and 247-253 in sync, 362-368 and 402-408 in async). Consider extracting this into a helper function to reduce duplication.

⚡ Performance Considerations

Good performance practices:

Minimal overhead - only adds list appends during failures
No impact on successful requests
Pre-allocates empty list avoiding repeated list reallocations

Potential optimization:

The failed_attempts list is always initialized even when retries may not be needed. This is acceptable overhead given the simplicity.

🔒 Security Concerns

No security issues identified:

The implementation doesn't expose sensitive data
Failed attempts are only stored in memory and not logged automatically
The completion data access is controlled by the calling code

🧪 Test Coverage

Concerns identified:

Missing test coverage for new functionality: The existing test files don't appear to test the new failed_attempts field
The test_exceptions.py doesn't test FailedAttempt at all
No tests verify that all failed attempts are properly tracked
No tests verify the attempt numbering is correct
No tests ensure backward compatibility of the new optional parameter

Recommended test additions:

# Test that failed_attempts are populated correctly
def test_instructor_retry_exception_with_failed_attempts():
    from instructor.core.exceptions import FailedAttempt
    
    attempt1 = FailedAttempt(1, ValueError("first error"), None)
    attempt2 = FailedAttempt(2, ValueError("second error"), {"some": "completion"})
    
    with pytest.raises(InstructorRetryException) as exc_info:
        raise InstructorRetryException(
            n_attempts=2,
            total_usage=100,
            failed_attempts=[attempt1, attempt2]
        )
    
    exception = exc_info.value
    assert len(exception.failed_attempts) == 2
    assert exception.failed_attempts[0].attempt_number == 1
    assert exception.failed_attempts[1].completion == {"some": "completion"}

# Test backward compatibility
def test_instructor_retry_exception_backward_compatibility():
    # Should work without failed_attempts parameter
    with pytest.raises(InstructorRetryException) as exc_info:
        raise InstructorRetryException(n_attempts=1, total_usage=50)
    
    assert exc_info.value.failed_attempts == []

📋 Additional Observations

Positive implementation details:

Backward compatibility is properly maintained with failed_attempts or []
The attempt numbers align with tenacity's retry state
Both validation errors and completion errors are tracked appropriately
Integration with hooks system is preserved

Minor suggestions:

Consider adding a class method to FailedAttempt for creating instances without completion data
The exception message could potentially include a count of failed attempts for better UX

Final Verdict

This is a solid implementation that adds valuable debugging capabilities without breaking existing functionality. The main concerns are:

Missing test coverage for the new functionality (high priority)
Code duplication between sync/async functions (medium priority)
Memory considerations for high-retry scenarios (low priority - document only)

The PR is functionally correct and ready for use, but I recommend adding comprehensive tests before merging to ensure the new functionality works as expected and maintains backward compatibility.

claude · 2025-09-04T19:41:52Z

Claude finished @jxnl's task —— View job

PR Review Complete ✅

Reviewing PR #1802: Enhanced retry mechanism with comprehensive failure tracking

Todo List:

Read and analyze changed files
Review code quality and best practices
Check for potential bugs or issues
Assess performance considerations
Identify security concerns
Evaluate test coverage
Provide comprehensive feedback

Code Review Feedback

✅ Strengths

Well-structured design: The FailedAttempt NamedTuple is clean and focused, capturing the essential information for each retry attempt.
Backward compatibility: The implementation properly defaults failed_attempts to an empty list, maintaining compatibility with existing code.
Consistent implementation: Both sync (retry_sync) and async (retry_async) functions are updated identically, maintaining API consistency.
Good typing: Proper type hints are used throughout, including list[FailedAttempt] and union types.
Comprehensive tracking: The implementation captures both validation errors and general exceptions, providing complete visibility into retry failures.

🔍 Code Quality & Best Practices

Good practices observed:

Clean separation of concerns with the FailedAttempt data structure
Consistent error handling patterns between sync/async variants
Proper logging and debug information maintained
Type annotations are comprehensive and accurate

Areas for consideration:

The logic for determining "last attempt" (lines 216-233 and 371-387) is complex and duplicated. Consider extracting this into a helper function.

⚠️ Potential Issues

Memory consumption:
- The failed_attempts list will grow with each retry attempt
- For high retry counts or large completion objects, this could consume significant memory
- Consider: Should there be a limit on tracked attempts or size-based truncation?
Completion object storage:
- completion can be Any | None, and storing full completion objects might be memory-intensive
- The completion object could contain large amounts of text or embeddings
- Consider: Could you store only essential debugging information instead of the full completion?
Exception chaining:
- Line 276 and 431: e.last_attempt._exception - accessing private _exception attribute
- This could be fragile if tenacity changes its internal implementation
- Consider: Use tenacity's public API or add defensive code

🏃♂️ Performance Considerations

List operations: Using list.append() for tracking attempts is O(1) and efficient.
Memory impact: Each FailedAttempt object stores:
- An integer (attempt number)
- An exception object
- Potentially large completion response
This could accumulate significant memory for operations with many retries.
Serialization concerns: If these exceptions are logged or serialized, the completion objects could cause performance issues.

🔒 Security Considerations

Low risk overall, but consider:

Information disclosure: Completion objects might contain sensitive data that gets stored longer than intended in the failed attempts list.
Memory exhaustion: In pathological cases with very high retry counts, the failed attempts list could grow very large.

🧪 Test Coverage

Missing test scenarios to consider:

Tests for the new failed_attempts field population
Verification that attempt numbers are correctly tracked
Memory usage tests with high retry counts
Serialization/deserialization of the enhanced exception
Edge cases like empty completions or None completions

💡 Suggestions for Improvement

Extract duplicate logic:

def _is_last_attempt(max_retries, attempt) -> bool:
    if isinstance(max_retries, (Retrying, AsyncRetrying)) and hasattr(max_retries, "stop"):
        # ... existing logic
    return False

Consider memory optimization:

class FailedAttempt(NamedTuple):
    attempt_number: int
    exception: Exception
    completion_summary: str | None = None  # Instead of full completion

Add completion size limits:

def _safe_completion_storage(completion, max_size=1000):
    # Truncate or summarize large completions

Consider adding configuration:

# Allow users to disable detailed tracking if needed
track_failed_attempts: bool = True

📋 Summary

This is a solid enhancement that significantly improves debugging capabilities. The implementation is clean, well-typed, and maintains backward compatibility. The main concerns are around memory usage and potential information disclosure, but these are manageable risks for the significant debugging benefits gained.

Recommendation: ✅ Approve with consideration for the memory optimization suggestions in future iterations.

The enhancement provides valuable debugging information while maintaining the library's existing patterns and conventions.

ellipsis-dev

Important

Looks good to me! 👍

Reviewed 69f2017 in 40 seconds. Click for details.

Reviewed 156 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 1 draft comments. View those below.
Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

1. examples/retry-tracking/example.py:1

Draft comment:
The entire example file has been removed. This file previously provided a valuable usage demonstration of the enhanced retry tracking feature. Please ensure that updated usage examples (in docs or tests) are provided so developers can understand and utilize the new feature.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

Workflow ID: wflow_5jWLZcAyrGi2UXHW

^{You can customize}^{by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.}

ellipsis-dev

Important

Looks good to me! 👍

Reviewed everything up to df8be82 in 1 minute and 54 seconds. Click for details.

Reviewed 312 lines of code in 3 files
Skipped 0 files when reviewing.
Skipped posting 5 draft comments. View those below.
Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

1. examples/retry-tracking/example.py:131

Draft comment:
Ensure the example code references a defined 'analyze_completion' function. The snippet uses 'analyze_completion(attempt.completion)' but this function isn’t defined/imported.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 85% This is a code example shown as a string inside triple quotes, meant to demonstrate API usage patterns. It's common in documentation to show example function calls that aren't fully implemented. The analyze_completion function is clearly a placeholder to show where custom analysis code would go. The code isn't meant to be run directly - it's demonstrative. The comment does identify a technically accurate issue - the function isn't defined. And someone copy-pasting this example directly would get an error. Documentation examples often use placeholder functions to demonstrate patterns. The comment is being overly pedantic about example code that isn't meant to be executed directly. Delete the comment. The undefined function is an intentional placeholder in example code meant to demonstrate usage patterns, not production code.

2. instructor/core/exceptions.py:53

Draft comment:
Consider adding a docstring note for the new 'failed_attempts' field in InstructorRetryException to document its purpose.
Reason this comment was not posted:
Confidence changes required: 66% <= threshold 85% None

3. instructor/core/retry.py:278

Draft comment:
When raising InstructorRetryException in the sync retry, use e.last_attempt.retry_state.attempt_number instead of the loop variable 'attempt.retry_state.attempt_number' to ensure correct attempt count.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

4. instructor/core/retry.py:336

Draft comment:
In the async retry loop, replace 'with attempt:' with 'async with attempt:' to properly handle the asynchronous context manager.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

5. instructor/core/retry.py:433

Draft comment:
Similarly, in the async function's RetryError block, use e.last_attempt.retry_state.attempt_number instead of 'attempt.retry_state.attempt_number' when setting 'n_attempts'.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

Workflow ID: wflow_PwyLKBX6RirvfgcW

^{You can customize}^{by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.}

jxnl requested a review from ivanleomk September 4, 2025 19:41

chore: remove example files after demonstration

69f2017

github-actions Bot added enhancement New feature or request python Pull requests that update python code size:M This PR changes 30-99 lines, ignoring generated files. labels Sep 4, 2025

ellipsis-dev Bot reviewed Sep 4, 2025

View reviewed changes

jxnl merged commit 9584ce7 into main Sep 4, 2025
15 checks passed

jxnl deleted the feat/retry-tracking branch September 4, 2025 19:56

coderabbitai Bot mentioned this pull request Apr 23, 2026

Refactor Productivity UI and Harden Backend Error Handling YKDBontekoe/miru#576

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(retry): add comprehensive tracking of all failed attempts and exceptions#1802

feat(retry): add comprehensive tracking of all failed attempts and exceptions#1802
jxnl merged 2 commits intomainfrom
feat/retry-tracking

jxnl commented Sep 4, 2025 •

edited by ellipsis-dev Bot

Loading

Uh oh!

cloudflare-workers-and-pages Bot commented Sep 4, 2025 •

edited

Loading

Uh oh!

claude Bot commented Sep 4, 2025 •

edited

Loading

Uh oh!

claude Bot commented Sep 4, 2025 •

edited

Loading

Uh oh!

ellipsis-dev Bot left a comment

Uh oh!

ellipsis-dev Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

jxnl commented Sep 4, 2025 • edited by ellipsis-dev Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Changes

Core Changes

Key Features

Benefits

Usage Example

Testing

Backward Compatibility

Uh oh!

cloudflare-workers-and-pages Bot commented Sep 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Deploying with Cloudflare Workers

Uh oh!

claude Bot commented Sep 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review Complete ✅

Code Review Results

✅ Code Quality & Best Practices

🐛 Potential Issues

⚡ Performance Considerations

🔒 Security Concerns

🧪 Test Coverage

📋 Additional Observations

Final Verdict

Uh oh!

claude Bot commented Sep 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review Complete ✅

Todo List:

Code Review Feedback

✅ Strengths

🔍 Code Quality & Best Practices

⚠️ Potential Issues

🏃♂️ Performance Considerations

🔒 Security Considerations

🧪 Test Coverage

💡 Suggestions for Improvement

📋 Summary

Uh oh!

ellipsis-dev Bot left a comment

Choose a reason for hiding this comment

Uh oh!

ellipsis-dev Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

jxnl commented Sep 4, 2025 •

edited by ellipsis-dev Bot

Loading

cloudflare-workers-and-pages Bot commented Sep 4, 2025 •

edited

Loading

claude Bot commented Sep 4, 2025 •

edited

Loading

claude Bot commented Sep 4, 2025 •

edited

Loading