Skip to content

feat(retry): add comprehensive tracking of all failed attempts and exceptions#1802

Merged
jxnl merged 2 commits intomainfrom
feat/retry-tracking
Sep 4, 2025
Merged

feat(retry): add comprehensive tracking of all failed attempts and exceptions#1802
jxnl merged 2 commits intomainfrom
feat/retry-tracking

Conversation

@jxnl
Copy link
Copy Markdown
Collaborator

@jxnl jxnl commented Sep 4, 2025

Description

This PR enhances the retry mechanism in Instructor to track all failed completions and exceptions across the entire range of retries, not just the final failure. This provides much better debugging capabilities and insight into retry patterns.

Changes

Core Changes

  • New data structure: Tracks individual retry failures with attempt number, exception, and completion response
  • **Enhanced **: Now includes a list containing all retry failures
  • Updated retry functions: Both and now collect comprehensive failure information

Key Features

  • Track attempt number for each failure
  • Store the actual exception that occurred
  • Preserve completion responses (when available) for analysis
  • Maintain backward compatibility with existing exception handling

Benefits

  • Better debugging: See exactly what failed at each retry attempt
  • Pattern analysis: Identify if failures are consistent or changing across retries
  • Completion inspection: Access raw LLM responses that failed validation
  • Comprehensive error reporting: Full visibility into the retry process

Usage Example

try:
    response = client.chat.completions.create(
        response_model=MyModel,
        messages=messages,
        max_retries=3
    )
except InstructorRetryException as e:
    print(f"Failed after {e.n_attempts} attempts")
    
    # New: Access all failed attempts
    for attempt in e.failed_attempts:
        print(f"Attempt {attempt.attempt_number}: {attempt.exception}")
        if attempt.completion:
            # Analyze the raw completion that failed
            analyze_completion(attempt.completion)

Testing

  • All existing tests pass (backward compatibility maintained)
  • Linting and formatting checks pass
  • Example demonstrates the new functionality

Backward Compatibility

This change is fully backward compatible. The new field is optional and defaults to an empty list if not provided.

This PR was written by Cursor


Important

Enhances retry mechanism to track all failed attempts and exceptions for improved debugging and analysis.

  • Behavior:
    • Introduces FailedAttempt in exceptions.py to track retry attempts with attempt number, exception, and completion.
    • Updates InstructorRetryException to include failed_attempts list.
    • Modifies retry_sync and retry_async in retry.py to populate failed_attempts with each failed attempt.
  • Benefits:
    • Provides detailed tracking of all retry attempts for better debugging and analysis.
    • Maintains backward compatibility by defaulting failed_attempts to an empty list if not provided.

This description was created by Ellipsis for 69f2017. You can customize this summary. It will automatically update as commits are pushed.

…ceptions

- Add FailedAttempt data structure to track individual retry failures
- Extend InstructorRetryException to include failed_attempts list
- Modify retry_sync and retry_async to collect all failures across retries
- Track attempt number, exception, and completion for each failed retry
- Enables detailed analysis of retry patterns and failure modes
@jxnl jxnl requested a review from ivanleomk September 4, 2025 19:41
@cloudflare-workers-and-pages
Copy link
Copy Markdown

cloudflare-workers-and-pages Bot commented Sep 4, 2025

Deploying with  Cloudflare Workers  Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

Status Name Latest Commit Preview URL Updated (UTC)
✅ Deployment successful!
View logs
instructor 69f2017 Commit Preview URL

Branch Preview URL
Sep 04 2025, 07:53 PM

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Sep 4, 2025

Claude finished @jxnl's task —— View job


Code Review Complete ✅

Reviewing the retry tracking enhancement PR. Here's my analysis checklist:

  • Read and analyze modified files
  • Review code quality and best practices
  • Check for potential bugs or issues
  • Evaluate performance considerations
  • Assess security concerns
  • Analyze test coverage
  • Provide comprehensive feedback

Code Review Results

This PR adds comprehensive retry tracking functionality to the Instructor library. Here's my detailed analysis:

✅ Code Quality & Best Practices

Positive aspects:

  • Clean, well-structured implementation following existing patterns
  • Good use of type hints including NamedTuple for FailedAttempt
  • Proper integration with existing error handling flow
  • Consistent code style matching the codebase
  • Clear docstring documentation

Areas for improvement:

  • The FailedAttempt.completion: Any | None = None could be more type-specific if possible
  • Consider adding a docstring to the FailedAttempt class explaining the purpose of each field

🐛 Potential Issues

Minor concerns:

  1. Memory accumulation: In scenarios with many retries, the failed_attempts list could accumulate significant memory, especially if completions contain large responses. Consider documenting this behavior or adding optional size limits.

  2. Duplicate code: The retry attempt tracking logic is nearly identical between retry_sync() and retry_async() functions (lines 207-213 and 247-253 in sync, 362-368 and 402-408 in async). Consider extracting this into a helper function to reduce duplication.

⚡ Performance Considerations

Good performance practices:

  • Minimal overhead - only adds list appends during failures
  • No impact on successful requests
  • Pre-allocates empty list avoiding repeated list reallocations

Potential optimization:

  • The failed_attempts list is always initialized even when retries may not be needed. This is acceptable overhead given the simplicity.

🔒 Security Concerns

No security issues identified:

  • The implementation doesn't expose sensitive data
  • Failed attempts are only stored in memory and not logged automatically
  • The completion data access is controlled by the calling code

🧪 Test Coverage

Concerns identified:

  • Missing test coverage for new functionality: The existing test files don't appear to test the new failed_attempts field
  • The test_exceptions.py doesn't test FailedAttempt at all
  • No tests verify that all failed attempts are properly tracked
  • No tests verify the attempt numbering is correct
  • No tests ensure backward compatibility of the new optional parameter

Recommended test additions:

# Test that failed_attempts are populated correctly
def test_instructor_retry_exception_with_failed_attempts():
    from instructor.core.exceptions import FailedAttempt
    
    attempt1 = FailedAttempt(1, ValueError("first error"), None)
    attempt2 = FailedAttempt(2, ValueError("second error"), {"some": "completion"})
    
    with pytest.raises(InstructorRetryException) as exc_info:
        raise InstructorRetryException(
            n_attempts=2,
            total_usage=100,
            failed_attempts=[attempt1, attempt2]
        )
    
    exception = exc_info.value
    assert len(exception.failed_attempts) == 2
    assert exception.failed_attempts[0].attempt_number == 1
    assert exception.failed_attempts[1].completion == {"some": "completion"}

# Test backward compatibility
def test_instructor_retry_exception_backward_compatibility():
    # Should work without failed_attempts parameter
    with pytest.raises(InstructorRetryException) as exc_info:
        raise InstructorRetryException(n_attempts=1, total_usage=50)
    
    assert exc_info.value.failed_attempts == []

📋 Additional Observations

Positive implementation details:

  • Backward compatibility is properly maintained with failed_attempts or []
  • The attempt numbers align with tenacity's retry state
  • Both validation errors and completion errors are tracked appropriately
  • Integration with hooks system is preserved

Minor suggestions:

  1. Consider adding a class method to FailedAttempt for creating instances without completion data
  2. The exception message could potentially include a count of failed attempts for better UX

Final Verdict

This is a solid implementation that adds valuable debugging capabilities without breaking existing functionality. The main concerns are:

  1. Missing test coverage for the new functionality (high priority)
  2. Code duplication between sync/async functions (medium priority)
  3. Memory considerations for high-retry scenarios (low priority - document only)

The PR is functionally correct and ready for use, but I recommend adding comprehensive tests before merging to ensure the new functionality works as expected and maintains backward compatibility.

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Sep 4, 2025

Claude finished @jxnl's task —— View job


PR Review Complete ✅

Reviewing PR #1802: Enhanced retry mechanism with comprehensive failure tracking

Todo List:

  • Read and analyze changed files
  • Review code quality and best practices
  • Check for potential bugs or issues
  • Assess performance considerations
  • Identify security concerns
  • Evaluate test coverage
  • Provide comprehensive feedback

Code Review Feedback

Strengths

  1. Well-structured design: The FailedAttempt NamedTuple is clean and focused, capturing the essential information for each retry attempt.

  2. Backward compatibility: The implementation properly defaults failed_attempts to an empty list, maintaining compatibility with existing code.

  3. Consistent implementation: Both sync (retry_sync) and async (retry_async) functions are updated identically, maintaining API consistency.

  4. Good typing: Proper type hints are used throughout, including list[FailedAttempt] and union types.

  5. Comprehensive tracking: The implementation captures both validation errors and general exceptions, providing complete visibility into retry failures.

🔍 Code Quality & Best Practices

Good practices observed:

  • Clean separation of concerns with the FailedAttempt data structure
  • Consistent error handling patterns between sync/async variants
  • Proper logging and debug information maintained
  • Type annotations are comprehensive and accurate

Areas for consideration:

  • The logic for determining "last attempt" (lines 216-233 and 371-387) is complex and duplicated. Consider extracting this into a helper function.

⚠️ Potential Issues

  1. Memory consumption:

    • The failed_attempts list will grow with each retry attempt
    • For high retry counts or large completion objects, this could consume significant memory
    • Consider: Should there be a limit on tracked attempts or size-based truncation?
  2. Completion object storage:

    • completion can be Any | None, and storing full completion objects might be memory-intensive
    • The completion object could contain large amounts of text or embeddings
    • Consider: Could you store only essential debugging information instead of the full completion?
  3. Exception chaining:

    • Line 276 and 431: e.last_attempt._exception - accessing private _exception attribute
    • This could be fragile if tenacity changes its internal implementation
    • Consider: Use tenacity's public API or add defensive code

🏃♂️ Performance Considerations

  1. List operations: Using list.append() for tracking attempts is O(1) and efficient.

  2. Memory impact: Each FailedAttempt object stores:

    • An integer (attempt number)
    • An exception object
    • Potentially large completion response

    This could accumulate significant memory for operations with many retries.

  3. Serialization concerns: If these exceptions are logged or serialized, the completion objects could cause performance issues.

🔒 Security Considerations

Low risk overall, but consider:

  1. Information disclosure: Completion objects might contain sensitive data that gets stored longer than intended in the failed attempts list.

  2. Memory exhaustion: In pathological cases with very high retry counts, the failed attempts list could grow very large.

🧪 Test Coverage

Missing test scenarios to consider:

  • Tests for the new failed_attempts field population
  • Verification that attempt numbers are correctly tracked
  • Memory usage tests with high retry counts
  • Serialization/deserialization of the enhanced exception
  • Edge cases like empty completions or None completions

💡 Suggestions for Improvement

  1. Extract duplicate logic:

    def _is_last_attempt(max_retries, attempt) -> bool:
        if isinstance(max_retries, (Retrying, AsyncRetrying)) and hasattr(max_retries, "stop"):
            # ... existing logic
        return False
  2. Consider memory optimization:

    class FailedAttempt(NamedTuple):
        attempt_number: int
        exception: Exception
        completion_summary: str | None = None  # Instead of full completion
  3. Add completion size limits:

    def _safe_completion_storage(completion, max_size=1000):
        # Truncate or summarize large completions
  4. Consider adding configuration:

    # Allow users to disable detailed tracking if needed
    track_failed_attempts: bool = True

📋 Summary

This is a solid enhancement that significantly improves debugging capabilities. The implementation is clean, well-typed, and maintains backward compatibility. The main concerns are around memory usage and potential information disclosure, but these are manageable risks for the significant debugging benefits gained.

Recommendation: ✅ Approve with consideration for the memory optimization suggestions in future iterations.

The enhancement provides valuable debugging information while maintaining the library's existing patterns and conventions.


@github-actions github-actions Bot added enhancement New feature or request python Pull requests that update python code size:M This PR changes 30-99 lines, ignoring generated files. labels Sep 4, 2025
Copy link
Copy Markdown
Contributor

@ellipsis-dev ellipsis-dev Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important

Looks good to me! 👍

Reviewed 69f2017 in 40 seconds. Click for details.
  • Reviewed 156 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 1 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. examples/retry-tracking/example.py:1
  • Draft comment:
    The entire example file has been removed. This file previously provided a valuable usage demonstration of the enhanced retry tracking feature. Please ensure that updated usage examples (in docs or tests) are provided so developers can understand and utilize the new feature.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.

Workflow ID: wflow_5jWLZcAyrGi2UXHW

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

Copy link
Copy Markdown
Contributor

@ellipsis-dev ellipsis-dev Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important

Looks good to me! 👍

Reviewed everything up to df8be82 in 1 minute and 54 seconds. Click for details.
  • Reviewed 312 lines of code in 3 files
  • Skipped 0 files when reviewing.
  • Skipped posting 5 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. examples/retry-tracking/example.py:131
  • Draft comment:
    Ensure the example code references a defined 'analyze_completion' function. The snippet uses 'analyze_completion(attempt.completion)' but this function isn’t defined/imported.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 85% This is a code example shown as a string inside triple quotes, meant to demonstrate API usage patterns. It's common in documentation to show example function calls that aren't fully implemented. The analyze_completion function is clearly a placeholder to show where custom analysis code would go. The code isn't meant to be run directly - it's demonstrative. The comment does identify a technically accurate issue - the function isn't defined. And someone copy-pasting this example directly would get an error. Documentation examples often use placeholder functions to demonstrate patterns. The comment is being overly pedantic about example code that isn't meant to be executed directly. Delete the comment. The undefined function is an intentional placeholder in example code meant to demonstrate usage patterns, not production code.
2. instructor/core/exceptions.py:53
  • Draft comment:
    Consider adding a docstring note for the new 'failed_attempts' field in InstructorRetryException to document its purpose.
  • Reason this comment was not posted:
    Confidence changes required: 66% <= threshold 85% None
3. instructor/core/retry.py:278
  • Draft comment:
    When raising InstructorRetryException in the sync retry, use e.last_attempt.retry_state.attempt_number instead of the loop variable 'attempt.retry_state.attempt_number' to ensure correct attempt count.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
4. instructor/core/retry.py:336
  • Draft comment:
    In the async retry loop, replace 'with attempt:' with 'async with attempt:' to properly handle the asynchronous context manager.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
5. instructor/core/retry.py:433
  • Draft comment:
    Similarly, in the async function's RetryError block, use e.last_attempt.retry_state.attempt_number instead of 'attempt.retry_state.attempt_number' when setting 'n_attempts'.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.

Workflow ID: wflow_PwyLKBX6RirvfgcW

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

@jxnl jxnl merged commit 9584ce7 into main Sep 4, 2025
15 checks passed
@jxnl jxnl deleted the feat/retry-tracking branch September 4, 2025 19:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request python Pull requests that update python code size:M This PR changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant