Fix/ci uv migration by jxnl · Pull Request #1886 · 567-labs/instructor

jxnl · 2025-11-03T19:32:42Z

Important

Migrate CI workflows to use uv for dependency management and testing, refactor code for async handling and error improvements, and update tests accordingly.

CI Workflows:
- Migrate from poetry to uv for dependency management and test execution in .github/workflows/evals.yml, pyright.yml, and scheduled-release.yml.
- Update Python setup to use uv in evals.yml and pyright.yml.
Code Improvements:
- Use typing_extensions.TypeAlias in models.py.
- Refactor from_streaming_response_async() in iterable.py and partial.py to yield items directly.
- Improve image type detection in multimodal.py using magic bytes.
- Add cast and AsyncGenerator imports in response.py.
- Remove unused imports in utils.py and client.py.
- Add type ignore comments for requests exceptions in utils.py.
Tests:
- Add pytest.mark.skipif for tests requiring OPENAI_API_KEY in test_partial.py.
- Update test_reasoning.py to use deterministic results with temperature=0.
- Add retries in test_json_schema.py and test_reasoning.py.
- Update test_genai.py to handle decimal parsing for Receipt and Invoice models.
- Add tests for update_genai_kwargs in test_utils.py.
- Add exception handling in test_auto_client.py for provider availability.

^{This description was created by}^{for 2843267. You can customize this summary. It will automatically update as commits are pushed.}

Fixed two failing CI workflows: - Scheduled Release: Added --all-extras flag to ensure ruff and dev dependencies are installed - Weekly Tests: Replaced Poetry setup with uv (project no longer uses poetry.lock) Both workflows now use uv sync --all-extras --dev for consistent dependency installation. Fixes workflow runs #19029300998 and #19004677288 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>

Added @pytest.mark.skipif decorators to test_summary_extraction and test_summary_extraction_async in tests/dsl/test_partial.py to skip these tests when OPENAI_API_KEY is not set. These tests were causing CI failures in the "Core Tests" job because they make real OpenAI API calls but weren't being filtered by the -k 'not openai' pattern (since they're in the dsl directory). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>

Applied ruff formatting to imports. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>

Fixed multiple issues with async streaming: 1. partial.py: Changed from_streaming_response_async to properly yield from async generators instead of returning them directly 2. response.py: Return async generator directly instead of collecting into a list, which was breaking async iteration 3. test_process_response.py: Updated bedrock test to expect ValueError instead of NotImplementedError (matches actual implementation) These fixes resolve the "name 'cast' is not defined" and "'async for' requires an object with __aiter__ method" errors in streaming tests. Fixes #236 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>

Added try/except wrapper to test_user_extraction_async to properly skip tests when API keys are missing, matching the sync test behavior. This prevents test failures when providers are unavailable locally. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>

Applied ruff auto-fixes for deprecated typing imports and formatting. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>

cloudflare-workers-and-pages · 2025-11-03T19:33:58Z

Deploying with Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

Status	Name	Latest Commit	Updated (UTC)
❌ Deployment failed View logs	instructor	`2843267`	Nov 03 2025, 08:10 PM

ellipsis-dev

Important

Looks good to me! 👍

Reviewed everything up to c54ac2d in 1 minute and 33 seconds. Click for details.

Reviewed 177 lines of code in 7 files
Skipped 0 files when reviewing.
Skipped posting 7 draft comments. View those below.
Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

1. .github/workflows/evals.yml:19

Draft comment:
Migrating from poetry to uv looks good. Verify that caching and dependency management are fully covered with 'enable-cache: true'.
Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 0% <= threshold 85% This comment is asking the PR author to verify something, which is against the rules. It doesn't provide a specific suggestion or point out a specific issue with the code. Therefore, it should be removed.

2. .github/workflows/scheduled-release.yml:31

Draft comment:
The scheduled-release workflow uses setup-uv@v3 while evals uses v4. Consider aligning versions for consistency.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

3. instructor/dsl/partial.py:181

Draft comment:
The async streaming response handling now yields items via 'async for', ensuring proper async iteration. Confirm downstream consumers handle the async generator.
Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 0% <= threshold 85% This comment is asking the PR author to confirm that downstream consumers handle the async generator, which violates the rule against asking the author to confirm their intention or ensure behavior. It does not provide a specific suggestion or point out a specific issue.

4. instructor/processing/response.py:232

Draft comment:
process_response_async now returns an async generator directly instead of collecting items into a list. Update documentation and downstream usage if necessary.
Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 50% <= threshold 85% The comment suggests updating documentation and downstream usage due to a change in the return type of a function. This aligns with the rule that if library code changes, documentation should be updated. However, it also suggests checking downstream usage, which is not allowed by the rules. The comment is partially useful because it highlights the need for documentation updates.

5. tests/dsl/test_partial.py:141

Draft comment:
Adding @pytest.mark.skipif for tests needing OPENAI_API_KEY is good practice to avoid failures in environments without keys.
Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 0% <= threshold 85% This comment is purely informative and does not provide any actionable feedback or suggestions for improvement. It simply acknowledges a good practice without offering any specific guidance or questions.

6. tests/test_auto_client.py:56

Draft comment:
Wrapping provider client calls in try/except blocks to skip tests when unavailable is pragmatic for CI environments, but ensure this behavior is intentional.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

7. tests/test_process_response.py:125

Draft comment:
The expected error type in the Bedrock invalid content test was updated from NotImplementedError to ValueError, along with a new error message. Ensure this change is consistent with provider implementation.
Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 0% <= threshold 85% This comment is asking the author to ensure consistency with the provider implementation, which is against the rules. It doesn't provide a specific suggestion or ask for a specific test to be written. It also doesn't address any of the additional rules provided.

Workflow ID: wflow_aEAiBuvp269Fgn6S

^{You can customize}^{by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.}

ellipsis-dev

Important

Looks good to me! 👍

Reviewed 309ca1a in 2 minutes and 9 seconds. Click for details.

Reviewed 87 lines of code in 5 files
Skipped 0 files when reviewing.
Skipped posting 4 draft comments. View those below.
Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

1. tests/llm/test_genai/test_utils.py:38

Draft comment:
Repeated logic for constructing 'excluded_categories' is used in multiple tests. Consider refactoring this setup into a helper to reduce duplication.
Reason this comment was not posted:
Confidence changes required: 80% <= threshold 85% None

2. tests/llm/test_genai/test_utils.py:141

Draft comment:
The test 'test_update_genai_kwargs_preserves_original' uses a shallow copy of kwargs. If nested dictionaries are mutated by update_genai_kwargs, consider using deepcopy to fully protect original inputs.
Reason this comment was not posted:
Confidence changes required: 70% <= threshold 85% None

3. tests/llm/test_genai/test_utils.py:165

Draft comment:
Test for 'thinking_config' pass-through is clear. Consider adding an edge case where nested keys inside thinking_config are present, to ensure they are retained as expected.
Reason this comment was not posted:
Confidence changes required: 50% <= threshold 85% None

4. tests/llm/test_genai/test_utils.py:112

Draft comment:
In 'test_update_genai_kwargs_none_values', the test verifies that keys with None values are omitted. Adding an inline comment explaining why these keys should be absent would improve clarity for future maintainers.
Reason this comment was not posted:
Confidence changes required: 30% <= threshold 85% None

Workflow ID: wflow_pdzqS73ONlGEyV5Y

^{You can customize}^{by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.}

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2025-11-03T19:38:55Z

        and stream
    ):
        # from_streaming_response_async returns an AsyncGenerator
-        # Collect all yielded values into a list
+        # Yield each item as it comes in
        # Note: response type varies by mode (ChatCompletion, AsyncGenerator, etc.)
-        tasks = []
-        async for task in response_model.from_streaming_response_async(  # type: ignore[arg-type]
+        return response_model.from_streaming_response_async(  # type: ignore[return-value]
            cast(AsyncGenerator[Any, None], response),  # type: ignore[arg-type]


Async iterable streaming returns coroutine instead of async generator

The streaming branch now returns response_model.from_streaming_response_async(...) directly, but IterableBase.from_streaming_response_async still returns a coroutine (it only returns tasks_from_chunks_async; see instructor/dsl/iterable.py). When AsyncInstructor.create_iterable(..., stream=True) awaits the retry wrapper and then executes async for item in await self.create_fn(...), the awaited value is this coroutine rather than an async iterable, so the call will raise TypeError: 'coroutine' object is not async iterable and streaming iterables no longer work. Either await the coroutine before returning or convert IterableBase.from_streaming_response_async into an async generator, similar to the partial path.

Useful? React with 👍 / 👎.

Fixed async streaming issues: 1. iterable.py: Changed from_streaming_response_async to properly yield from async generators instead of returning them 2. test_gemini: Fixed test modes from GEMINI_JSON to GENAI_JSON for google-genai provider compatibility This completes the async streaming fixes for both Partial and Iterable base classes, resolving all "async for requires __aiter__" errors. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>

ellipsis-dev

Important

Looks good to me! 👍

Reviewed 5163860 in 1 minute and 38 seconds. Click for details.

Reviewed 41 lines of code in 2 files
Skipped 0 files when reviewing.
Skipped posting 3 draft comments. View those below.
Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

1. instructor/dsl/iterable.py:56

Draft comment:
Good refactor for async iteration; using 'async for' with yield improves streaming. Ensure similar patterns elsewhere are consistent.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 85% None

2. tests/llm/test_gemini/test_multimodal_content.py:17

Draft comment:
Updated mode from GEMINI_JSON to GENAI_JSON aligns with new API naming.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 85% None

3. tests/llm/test_gemini/test_multimodal_content.py:38

Draft comment:
Consistent mode update to GENAI_JSON for multi-message test.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 85% None

Workflow ID: wflow_sqnOKQU1cKvwbbyV

^{You can customize}^{by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.}

1. test_gemini/test_multimodal_content.py: Changed GENAI_JSON (non-existent) to GENAI_TOOLS (correct mode for google-genai provider) 2. test_genai/test_decimal.py: Updated decimal validators to handle float/int inputs in addition to strings, since GenAI returns numeric values as floats These fixes resolve AttributeError and flaky decimal validation issues. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>

Changed ty check to only check instructor/ directory instead of the entire project. This excludes docs/tutorials and docs/hooks which contain example code and Jupyter notebooks that aren't meant to be strictly type-checked. Reduces diagnostics from 6413 to 68 (all in actual codebase). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>

ellipsis-dev

Important

Looks good to me! 👍

Reviewed c8aea5d in 1 minute and 26 seconds. Click for details.

Reviewed 10 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 1 draft comments. View those below.
Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

1. .github/workflows/pyright.yml:27

Draft comment:
Changed type check command to 'uv run ty check instructor/'. Confirm that targeting only 'instructor/' is intentional and covers all necessary code. Consider adding comments if this exclusion of other directories is deliberate.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 85% The comment is asking the PR author to "confirm" the intention and "consider adding comments" - both of these are explicitly prohibited by the rules. The rules state "Do NOT ask the PR author to confirm their intention, to explain, to double-check things, to ensure the behavior is intended" and "Do NOT comment unless there is clearly a code change required." This comment doesn't identify a clear bug or issue - it's purely asking for confirmation and suggesting documentation. The change appears deliberate (targeting a specific directory), and if the author made this change, they likely had a reason. There's no strong evidence of an actual problem here. Perhaps narrowing the type check scope could miss issues in other parts of the codebase like tests or examples, which could be a legitimate concern. The comment might be trying to prevent a real issue where type checking coverage is reduced. While reducing type check coverage could theoretically be an issue, the comment doesn't demonstrate that this is actually a problem - it just asks for confirmation. If there were other directories that clearly needed type checking, the comment should have stated that directly. The rules explicitly prohibit asking authors to confirm intentions or explain changes. This comment should be deleted because it violates the rule against asking the PR author to confirm their intention and doesn't identify a clear code issue. It's a verification request rather than pointing out a definite problem.

Workflow ID: wflow_wmMwkMnVUgYqgud5

^{You can customize}^{by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.}

Fixed Field() usage to properly pass description parameter instead of using it as a default value. This was causing the LLM to sometimes return the description string "User's first name" instead of extracting the actual name. Also added max_retries=3 to test_parse_validated_user_sync to handle occasional validation failures. Changes: - Field("User's first name") → Field(description="User's first name") - Added max_retries=3 to validated user test 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>

ellipsis-dev

Important

Looks good to me! 👍

Reviewed c4a83ff in 5 minutes and 25 seconds. Click for details.

Reviewed 59 lines of code in 2 files
Skipped 0 files when reviewing.
Skipped posting 3 draft comments. View those below.
Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

1. tests/llm/test_gemini/test_multimodal_content.py:17

Draft comment:
Changed provider mode to GENAI_TOOLS. Ensure that tests and API docs reflect the new mode. Also consider correcting 'compatability' to 'compatibility' for clarity.
Reason this comment was not posted:
Confidence changes required: 33% <= threshold 85% None

2. tests/llm/test_genai/test_decimal.py:17

Draft comment:
Enhanced validators to handle str, float, and int. Consider extracting a common conversion helper and watch for potential float precision issues.
Reason this comment was not posted:
Confidence changes required: 33% <= threshold 85% None

3. tests/llm/test_gemini/test_multimodal_content.py:15

Draft comment:
Typographical error: 'test_audio_compatability_list' appears to contain a typo. Consider renaming it to 'test_audio_compatibility_list'.
Reason this comment was not posted:
Comment was on unchanged code.

Workflow ID: wflow_uJcletBHcPP1nYpE

^{You can customize}^{by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.}

ellipsis-dev

Important

Looks good to me! 👍

Reviewed 2d713e4 in 56 seconds. Click for details.

Reviewed 30 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 3 draft comments. View those below.
Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

1. tests/llm/test_cohere/test_json_schema.py:11

Draft comment:
Good change: using Field(description=…) prevents misinterpretation as a default value.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 85% None

2. tests/llm/test_cohere/test_json_schema.py:54

Draft comment:
Updated Field usage for ValidatedUser improves schema documentation.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 85% None

3. tests/llm/test_cohere/test_json_schema.py:80

Draft comment:
Added max_retries=3 to test_parse_validated_user_sync to explicitly handle retries.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 85% None

Workflow ID: wflow_PME0Qx3UjJQGeGJ3

^{You can customize}^{by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.}

Fixed several type checking issues to reduce diagnostics: 1. batch/models.py: Import TypeAlias from typing_extensions for Python 3.9-3.11 compatibility 2. bedrock/client.py: Use BaseClient instead of boto3.client as type annotation 3. bedrock/utils.py: Add type ignore comments for requests.exceptions (type stub issue) 4. test_anthropic/test_reasoning.py: Make reasoning test more reliable with temperature=0 and better prompt Reduces type diagnostics from 6413 to 61 (remaining are in xai provider). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>

ellipsis-dev

Important

Looks good to me! 👍

Reviewed 156a4d8 in 2 minutes and 44 seconds. Click for details.

Reviewed 1260 lines of code in 11 files
Skipped 0 files when reviewing.
Skipped posting 7 draft comments. View those below.
Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

1. reproduce_1829.py:90

Draft comment:
Consider moving the import of 'anthropic' to the top-level. Currently, 'anthropic' is used in the function before it's imported in the main block, which may lead to a NameError in some contexts.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 85% This is a reproduction script meant to be run directly (as evidenced by the if __name__ == "__main__" pattern and the docstring). The comment is technically correct - there is a potential NameError if the function is called outside the main block. However, reproduction scripts are typically meant to be run as standalone scripts, not imported as modules. The rules say "Do NOT comment unless there is clearly a code change required" and to avoid comments on things that are "obvious or unimportant." For a reproduction script, this import pattern is common and acceptable. The comment is also somewhat speculative ("may lead to a NameError in some contexts") rather than definitively identifying a problem in the intended use case. While the comment is technically correct about the potential NameError, reproduction scripts are typically run as standalone files where the if __name__ == "__main__" block executes. The author likely structured it this way intentionally for a reproduction script. This might be an overly pedantic comment for this type of file. Even though this is a reproduction script, having the import at the top level would be better practice and wouldn't hurt anything. However, the comment uses hedging language ("may lead to") and mentions "some contexts" which makes it speculative. For the intended use case (running as a script), this works fine and would be caught immediately if there were an issue. This comment is technically correct but is about a non-issue for a reproduction script that's meant to be run directly. The speculative nature ("may lead to... in some contexts") and the fact that this is a reproduction script (not production code) make this comment not particularly useful. It would be caught immediately if run incorrectly.

2. tests/llm/test_anthropic/test_reasoning.py:29

Draft comment:
Add an assertion error message for clarity on failure and consider using a tolerance (e.g. math.isclose) for float comparisons.
Reason this comment was not posted:
Confidence changes required: 80% <= threshold 85% None

3. SOLUTION_1303_1305.md:1

Draft comment:
If this markdown file is intended as documentation, ensure that it is added to the mkdocs.yml configuration to keep documentation up-to-date.
Reason this comment was not posted:
Confidence changes required: 80% <= threshold 85% None

4. examples/syptom-tracker/README.md:1

Draft comment:
The folder is named 'syptom-tracker' but the content refers to 'Symptom Tracker'. Confirm whether this is a typo; if so, consider renaming the folder to 'symptom-tracker' for consistency.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 85% This comment is asking the author to "confirm" something and "consider" renaming, which violates the rule "Do NOT ask the PR author to confirm their intention, to explain, to double-check things, to ensure the behavior is intended." The comment is phrased as a question/request for confirmation rather than a clear statement that a change is needed. However, there IS a clear typo in the folder name ('syptom' vs 'symptom'), which is objectively wrong and would require a file/folder rename to fix. This is a legitimate issue, but the comment violates the rules by asking for confirmation rather than stating the issue directly. Additionally, the rules state "Do NOT comment on anything that would be obviously caught by the build, such as variable renames, file renames" - though this is about suggesting a rename, not catching one that already happened. The folder name typo is a real issue that should be fixed, and it's not something that would be caught by the build since it's just a folder name. The comment identifies a legitimate problem even if it's phrased as a question. Maybe the phrasing issue isn't severe enough to warrant deletion if the underlying issue is valid. While the typo is real, the comment explicitly asks the author to "confirm" something, which directly violates the rules. The comment should either state definitively that the folder should be renamed or not be made at all. The tentative phrasing ("Confirm whether this is a typo; if so, consider...") makes it speculative and asks for confirmation, both of which are against the rules. The comment should be deleted because it asks the author to confirm their intention and uses tentative language ("consider") rather than making a clear, actionable statement. Even though there is a real typo, the comment violates the rule against asking for confirmation.

5. examples/syptom-tracker/extract_from_transcript.py:47

Draft comment:
Typographical note: The model name 'gpt-4o' appears here. Is this a typo for 'gpt-4'?
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 0% vs. threshold = 85% 'gpt-4o' is indeed a real OpenAI model (GPT-4 Omni), released by OpenAI. This is not a typo. The comment is incorrect because it assumes 'gpt-4o' is a mistake when it's actually a valid and intentional model choice. The consistent use of 'gpt-4o' across all three function calls in the file further suggests this is intentional. This comment should be deleted as it's factually incorrect and would mislead the PR author. Could 'gpt-4o' be a custom or internal model name that I'm not aware of? Perhaps in some contexts or older versions of the OpenAI API, this wasn't a valid model name, and the comment might have been correct at the time it was made. While it's theoretically possible this could be a custom model, 'gpt-4o' is a well-known OpenAI model (GPT-4 Omni) that has been publicly available. The consistent usage across the file and the fact that this is example code using the OpenAI API strongly indicates this is the legitimate model name. The comment is incorrect. This comment should be deleted. 'gpt-4o' is a legitimate OpenAI model name (GPT-4 Omni), not a typo. The comment is factually incorrect and would confuse the PR author by suggesting they fix something that isn't broken.

6. examples/syptom-tracker/extract_from_transcript.py:88

Draft comment:
Typographical note: The model name 'gpt-4o' is used again. Please verify if it should be 'gpt-4'.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 0% vs. threshold = 85% This comment is asking the author to "verify" if the model name is correct. This directly violates the rule "Do NOT ask the PR author to confirm their intention, to explain, to double-check things, to ensure the behavior is intended". Additionally, 'gpt-4o' is a legitimate model name (GPT-4 Omni) from OpenAI, so this isn't even identifying a real issue. The author consistently used 'gpt-4o' three times in the file, which suggests it's intentional. This comment should be deleted. Could 'gpt-4o' be a typo that the author made consistently? Perhaps the automated tool has knowledge that 'gpt-4' is more commonly used or appropriate for this use case? Even if 'gpt-4o' were a typo, the comment is phrased as "please verify", which explicitly asks the author to confirm their intention - a clear violation of the rules. Additionally, 'gpt-4o' is a real, valid model name, so there's no evidence this is actually wrong. The consistent usage across the file suggests intentionality. This comment should be deleted because it asks the author to verify/confirm their intention, which violates the review rules. Additionally, 'gpt-4o' is a valid OpenAI model name, so there's no evidence of an actual issue.

7. examples/syptom-tracker/extract_from_transcript.py:126

Draft comment:
Typographical note: The model name 'gpt-4o' appears here as well. Consider confirming the correct model name.
Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 0% <= threshold 85% This comment is asking the PR author to confirm the correct model name, which violates the rule against asking for confirmation or verification. It does not provide a specific suggestion or improvement to the code itself.

Workflow ID: wflow_QKKyPvjucV7WVRrd

^{You can customize}^{by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.}

ellipsis-dev

Important

Looks good to me! 👍

Reviewed 2843267 in 4 minutes and 26 seconds. Click for details.

Reviewed 90 lines of code in 4 files
Skipped 0 files when reviewing.
Skipped posting 4 draft comments. View those below.
Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

1. instructor/batch/models.py:9

Draft comment:
Prefer importing TypeAlias from typing_extensions to support older Python versions.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 85% None

2. instructor/providers/bedrock/client.py:14

Draft comment:
Updating the client type hint to BaseClient improves generality and type safety over boto3.client.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 85% None

3. instructor/providers/bedrock/utils.py:189

Draft comment:
Type ignores added for requests exceptions; consider checking if updated stubs or explicit imports can remove these ignores in future.
Reason this comment was not posted:
Confidence changes required: 33% <= threshold 85% None

4. tests/llm/test_anthropic/test_reasoning.py:29

Draft comment:
Consider adding explicit error messages to the assertions to improve debugging when tests fail.
Reason this comment was not posted:
Confidence changes required: 33% <= threshold 85% None

Workflow ID: wflow_hLgEMCHKl1KSw7nY

^{You can customize}^{by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.}

jxnl and others added 6 commits November 3, 2025 14:14

style: auto-format response.py imports

e114fff

Applied ruff formatting to imports. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>

style: apply ruff formatting and linting fixes

309ca1a

Applied ruff auto-fixes for deprecated typing imports and formatting. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>

github-actions Bot added dependencies Pull requests that update a dependency file python Pull requests that update python code size:L This PR changes 100-499 lines, ignoring generated files. labels Nov 3, 2025