Standardize provider imports in documentation by jxnl · Pull Request #1896 · 567-labs/instructor

jxnl · 2025-11-06T05:26:07Z

Important

Standardizes client initialization in documentation by replacing various methods with from_provider for consistency.

Client Initialization:
- Replaces from_openai, from_anthropic, etc., with from_provider in all examples.
- Standardizes client initialization across docs/prompting, docs/start-here.md, and other documentation files.
Examples:
- Updates examples to use from_provider for client setup, ensuring consistency.
- Affects various prompting techniques, such as emotion_prompting, role_prompting, and self_ask.
Documentation:
- Ensures all documentation reflects the new standard for client initialization.
- Improves clarity and consistency in how clients are instantiated in examples.

This description was created by Ellipsis for a9bcf17. You can customize this summary. It will automatically update as commits are pushed.

- Update index.md to use from_provider() instead of provider-specific functions
- Update getting-started.md to use from_provider()
- Update start-here.md to use from_provider()
- Use gpt-4o-mini as default OpenAI model
- Use claude-3-haiku-20240307 as default Anthropic model
- Simplify imports by removing provider SDK imports where possible
- Demonstrate async_client=True pattern for async examples

This standardizes the documentation to use the unified from_provider() API across all examples, making it easier for users to switch between providers.

Automated migration of 131 documentation files to standardize on the unified from_provider() API instead of provider-specific functions.

Changes:
- Replace from_openai() → from_provider("openai/gpt-5-nano")
- Replace from_anthropic() → from_provider("anthropic/claude-3-haiku-20240307")
- Replace from_gemini() → from_provider("google/gemini-2.5-flash")
- Replace from_groq() → from_provider("groq/llama3-70b-8192")
- Replace from_cohere() → from_provider("cohere/command-r-plus")
- Replace from_cerebras() → from_provider("cerebras/llama3.1-70b")
- Replace from_fireworks() → from_provider("fireworks/llama-v3p2-1b-instruct")
- Handle async_client=True for AsyncOpenAI/AsyncAnthropic
- Clean up unnecessary provider SDK imports
- Standardize on latest model versions

Default models:
- OpenAI: gpt-5-nano (from gpt-4o-mini, gpt-3.5-turbo, etc.)
- Anthropic: claude-3-haiku-20240307
- Google: gemini-2.5-flash (from gemini-1.5-flash/pro)

Files affected:
- Main docs: index.md, getting-started.md, start-here.md
- Learning patterns: simple_object, list_extraction, validation, etc.
- Examples: classification, knowledge graphs, multimodal, etc.
- Blog posts: 28 posts updated
- Prompting guides: decomposition, few-shot, self-criticism, etc.
- Integration guides: architecture, debugging, faq, etc.
- Concepts: caching, hooks, validation, templating, etc.

Benefits:
- Simpler, more consistent API across all examples
- Easier for users to switch between providers
- Cleaner code with fewer imports
- Future-proof provider management
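The migration script itself is not shown in this thread (and was later removed), so the following is only a minimal sketch of how a table-driven regex rewrite of this kind might look. `DEFAULT_MODELS` is a name mentioned in the review comments; `migrate_source` and the `CALL` pattern are hypothetical, and the pattern only handles the simplest call shape, a client constructed inline with no arguments.

```python
import re

# Provider -> default model string, taken from the commit message above.
DEFAULT_MODELS = {
    "openai": "openai/gpt-5-nano",
    "anthropic": "anthropic/claude-3-haiku-20240307",
    "gemini": "google/gemini-2.5-flash",
    "groq": "groq/llama3-70b-8192",
    "cohere": "cohere/command-r-plus",
    "cerebras": "cerebras/llama3.1-70b",
    "fireworks": "fireworks/llama-v3p2-1b-instruct",
}

# Hypothetical pattern: instructor.from_<provider>(<Client>()) with an
# optional "Async" prefix on the client class and empty constructor args.
CALL = re.compile(
    r"instructor\.from_(?P<provider>[a-z]+)\(\s*(?P<is_async>Async)?\w+\(\)\s*\)"
)


def migrate_source(text: str) -> str:
    """Rewrite provider-specific calls to the unified from_provider() form."""

    def replace(match: re.Match) -> str:
        provider = match.group("provider")
        if provider not in DEFAULT_MODELS:
            return match.group(0)  # leave unknown providers untouched
        args = f'"{DEFAULT_MODELS[provider]}"'
        if match.group("is_async"):
            # Async clients map to the async_client=True keyword.
            args += ", async_client=True"
        return f"instructor.from_provider({args})"

    return CALL.sub(replace, text)
```

Keeping the provider-to-model mapping in one table means a later model change, such as the Anthropic update further down this thread, would touch a single entry rather than several patterns.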

ellipsis-dev

Caution

Changes requested ❌

Reviewed everything up to 1dbdcfd in 2 minutes and 41 seconds. Click for details.

Reviewed 4261 lines of code in 131 files
Skipped 0 files when reviewing.
Skipped posting 22 draft comments. View those below.
Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

1. migrate_to_from_provider.py:129

Draft comment:
The 'changes' calculation as the difference in counts of 'from_provider' may be misleading; consider a more accurate metric for the number of substitutions made.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 85% The comment raises a valid concern about the accuracy of the metric. However, looking at the rules, I need to consider: 1) Is this clearly a code change that's required? The current implementation is simple and likely works fine for the intended use case (a migration script). 2) Is this speculative? The comment says "may be misleading" which is somewhat speculative - it's not definitively wrong. 3) Is this important? For a one-time migration script, having a perfectly accurate count is less critical than for production code. 4) The comment doesn't provide a concrete alternative solution. Given that this is a utility script (not library code), and the metric is only used for informational output to the user, the current approach is probably sufficient. The comment does identify a real edge case where the calculation could be inaccurate (if 'from_provider' appears in strings, comments, or already exists). For a code quality tool, pointing out potential inaccuracies in metrics could be valuable. The script is new code, so this is a good time to fix it if it's actually a problem. While the edge case is real, the comment is somewhat speculative ("may be misleading") and doesn't provide a clear, actionable alternative. For a migration script that's likely to be run once, the current simple approach is probably adequate. The comment also doesn't demonstrate that this is actually a problem in practice - it's more of a theoretical concern. According to the rules, speculative comments should be removed. This comment should be deleted. It's speculative ("may be misleading"), doesn't provide a concrete alternative solution, and for a one-time migration utility script, the current simple metric is likely sufficient. The comment doesn't show strong evidence of an actual problem.
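One way to obtain a substitution count that does not depend on occurrences of a string in the file is `re.subn`, which returns the number of replacements it made. The sketch below assumes a single hypothetical pattern; the real script's patterns are not shown in this thread.

```python
import re

# Hypothetical pattern for one provider-specific call form.
PATTERN = re.compile(r"instructor\.from_openai\(\s*OpenAI\(\)\s*\)")
REPLACEMENT = 'instructor.from_provider("openai/gpt-5-nano")'


def migrate_and_count(text: str) -> tuple[str, int]:
    """Return the rewritten text and the exact number of substitutions."""
    return PATTERN.subn(REPLACEMENT, text)
```

Summing the counts returned for each pattern would give a per-file total that is unaffected by `from_provider` already appearing in comments or prose.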

2. migrate_to_from_provider.py:30

Draft comment:
Multiple regex patterns for provider-specific replacements are repeated. Consider refactoring these into a helper function to adhere to DRY principles.
Reason this comment was not posted:
Confidence changes required: 80% <= threshold 85% None

3. migrate_to_from_provider.py:99

Draft comment:
The logic for cleaning up unused import lines using 'skip_next_blank' is somewhat brittle; a more robust parsing mechanism for Python import statements might avoid accidental removal of necessary blanks.
Reason this comment was not posted:
Confidence changes required: 80% <= threshold 85% None

4. migrate_to_from_provider.py:107

Draft comment:
The negative lookahead used in the regex for checking provider usage (e.g., for 'OpenAI(') could be fragile. Consider documenting the assumptions or using a more explicit parser.
Reason this comment was not posted:
Confidence changes required: 80% <= threshold 85% None
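Draft comments 3 and 4 both suggest a parser-based approach. As a rough illustration, and not what the script did, Python's `ast` module can report which imported names are never referenced in a snippet. The function name `unused_imports` is hypothetical, and the input is assumed to be a Python code block already extracted from the Markdown file.

```python
import ast


def unused_imports(source: str) -> set[str]:
    """Return names bound by import statements that are never referenced."""
    tree = ast.parse(source)
    imported: set[str] = set()
    used: set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                if alias.name == "*":
                    continue  # star imports bind unknown names; skip them
                # "import a.b" binds "a"; "import a as x" binds "x".
                imported.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.Name):
            # Attribute access such as instructor.from_provider still
            # surfaces "instructor" as a Name node.
            used.add(node.id)
    return imported - used
```

For a snippet that imports `OpenAI` but only calls `instructor.from_provider(...)`, this returns `{"OpenAI"}`, which marks that import line as safe to drop.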

5. docs/blog/posts/extracting-model-metadata.md:176

Draft comment:
It looks like there may be a typographical error in the model version: 'gpt-4o' appears in the model parameter. Did you mean 'gpt-4'? Please check.
Reason this comment was not posted:
Comment was on unchanged code.

6. docs/blog/posts/generating-pdf-citations.md:70

Draft comment:
There appears to be an extra closing parenthesis on this line. Remove the standalone ")" after the provider call to avoid syntax issues.
Reason this comment was not posted:
Marked as duplicate.

7. docs/blog/posts/multimodal-gemini.md:56

Draft comment:
Typo: The closing parenthesis on this line seems unmatched. It should be removed to reflect the correct function call syntax.
Reason this comment was not posted:
Marked as duplicate.

8. docs/blog/posts/rag-timelines.md:62

Draft comment:
Typo check: The model name "gpt-4o" on this line might be a typographical error. Please verify if it should be "gpt-4" or if "gpt-4o" is intended.
Reason this comment was not posted:
Comment was on unchanged code.

9. docs/blog/posts/validation-part1.md:39

Draft comment:
Typographical note: Consider capitalizing 'openai' to 'OpenAI' and 'api' to 'API' in 'using openai's function call api' for consistency with standard naming conventions.
Reason this comment was not posted:
Comment was on unchanged code.

10. docs/blog/posts/validation-part1.md:433

Draft comment:
Typographical note: consider hyphenating "self correct" to "self-correct" for clarity.
Reason this comment was not posted:
Comment was on unchanged code.

11. docs/blog/posts/validation-part1.md:466

Draft comment:
Typo alert: the word 'jason' in "Extract jason is 25 years old" may be a misspelling of 'JSON' unless it is intentional.
Reason this comment was not posted:
Comment was on unchanged code.

12. docs/blog/posts/version-1.md:45

Draft comment:
Typo detected: There is an extraneous period in the sentence "...passed to the client. via kwargs." Consider removing the period so it reads "...passed to the client via kwargs."
Reason this comment was not posted:
Comment was on unchanged code.

13. docs/examples/multi_modal_gemini.md:40

Draft comment:
There seems to be an extraneous comma at the end of the instructor.from_provider("google/gemini-2.5-flash") call on line 40. This might be a typographical error; please remove the comma if it is unintended.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 30% vs. threshold = 85% The comment is pointing out a legitimate syntax error. The code shows instructor.from_provider("google/gemini-2.5-flash"), with a comma, followed by mode=instructor.Mode.GEMINI_JSON, and then a closing ). This is invalid Python syntax. The comma after the closing parenthesis of from_provider() shouldn't be there - it should be an opening parenthesis instead, like instructor.from_provider("google/gemini-2.5-flash", (without the closing paren). This is clearly a code change issue introduced in the diff, as the old code had proper syntax with instructor.from_gemini(client=genai.GenerativeModel(...),. The comment is about a change made in the diff and identifies a real bug that would cause the code to fail. However, one of the rules states "Do NOT comment on anything that would be obviously caught by the build, such as variable renames, file renames, or changed imports." A syntax error like this would definitely be caught immediately when trying to run the code - Python would throw a SyntaxError. This falls under things that would be "obviously caught by the build." While syntax errors are typically caught by builds, this is documentation/example code that might not be run through automated testing. The rule about build-time catches is more about things like variable renames in actual source code. Documentation examples are often not validated automatically, so syntax errors in docs could slip through. However, the rule is still pretty clear about not commenting on things caught by builds. This comment identifies a legitimate syntax error in the code. However, per the rules, we should not comment on things that would be obviously caught by the build, and a syntax error would immediately fail when the code is run. Despite this being documentation, the rule seems to apply broadly.

14. docs/examples/multi_modal_gemini.md:95

Draft comment:
Typographical issue: There is an extra trailing comma on line 95 after the call to instructor.from_provider("google/gemini-2.5-flash"). This creates a tuple instead of passing the mode parameter to the function, and the parentheses no longer align as intended. Please fix this so that the call includes the mode parameter correctly in the function arguments.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 85% This is a legitimate syntax error. The original code had instructor.from_gemini(client=..., mode=...) which was correct. The new code has instructor.from_provider("google/gemini-2.5-flash"), with a trailing comma, followed by mode=instructor.Mode.GEMINI_JSON, and then ). This creates invalid syntax - you can't have mode= as a parameter when there's no function call to attach it to. The same issue appears on lines 40, 95, and 161 based on the diff. This would definitely be caught by running the code, but it's a clear error introduced by the change. The rules state "Do NOT comment on anything that would be obviously caught by the build, such as variable renames, file renames, or changed imports." This syntax error would be caught immediately when trying to run this Python code, so according to the rules, I should not keep this comment. While this would be caught by running the code, it's a clear typo in documentation/example code that was introduced by the change. However, the rules are explicit that we should not comment on things that would be caught by the build. Since this is a syntax error that would fail immediately, it falls under that category. Despite this being a legitimate syntax error introduced by the changes, the rules explicitly state not to comment on things that would be caught by the build. A Python syntax error would be immediately caught when running the code, so this comment should be deleted according to the guidelines.

15. docs/examples/multi_modal_gemini.md:161

Draft comment:
There seems to be an extraneous comma at the end of the call to instructor.from_provider. It might be a typo—please remove the trailing comma from line 161 to ensure the multi-line function call is properly formatted.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 0% vs. threshold = 85% The comment is pointing out a trailing comma issue. Looking at the code, line 161 shows client = instructor.from_provider("google/gemini-2.5-flash"), with a comma at the end. The next line has mode=instructor.Mode.GEMINI_JSON, and then line 163 has ). This is actually a syntax error - the opening parenthesis is on line 161 after from_provider, but there's a comma after the closing quote which shouldn't be there. Wait, actually looking more carefully, the structure seems to be trying to call from_provider with one argument, then have a comma, then mode= on the next line. This would be invalid syntax. However, one of the rules states "Do NOT comment on anything that would be obviously caught by the build, such as variable renames, file renames, or changed imports." A syntax error would definitely be caught by the build when the code is run or tested. This is a clear syntax error that Python would immediately catch. The rule about not commenting on things caught by the build is very clear. A syntax error like this trailing comma would immediately cause a Python syntax error when the file is imported or run, making it something that would be "obviously caught by the build." You're right - this is a syntax error that would be immediately caught when trying to run this code. Python would throw a SyntaxError before any tests could even run. According to the rules, I should not keep comments about issues that would be obviously caught by the build. This comment should be deleted because it points out a syntax error that would be obviously caught by the build. The rules explicitly state not to comment on anything that would be obviously caught by the build.
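Since documentation snippets may not be executed by the build, a syntax-only check is one way such errors could be caught. The sketch below reconstructs the broken shape from the description in draft comments 13 to 15 and one plausible corrected form; `parses` is a hypothetical helper, and only syntax is checked, not whether the arguments are accepted at runtime.

```python
import ast

# The broken shape described above: the call is closed before the
# keyword argument, leaving `mode=` and `)` dangling.
BROKEN = '''\
client = instructor.from_provider("google/gemini-2.5-flash"),
    mode=instructor.Mode.GEMINI_JSON,
)
'''

# One plausible fix: keep `mode` inside the call's argument list.
FIXED = '''\
client = instructor.from_provider(
    "google/gemini-2.5-flash",
    mode=instructor.Mode.GEMINI_JSON,
)
'''


def parses(snippet: str) -> bool:
    """Return True if the snippet is syntactically valid Python."""
    try:
        ast.parse(snippet)
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False
    return True
```

Because `ast.parse` never imports or runs the snippet, a check like this needs neither the `instructor` package nor API keys.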

16. docs/examples/tracing_with_langfuse.md:35

Draft comment:
Typographical error: The environment variable key has a mismatched quote/bracket. It should likely be fixed to os.environ["OPENAI_API_KEY"] = "sk-...".
Reason this comment was not posted:
Comment was on unchanged code.

17. docs/index.md:280

Draft comment:
It looks like the word 'Jason' in the string "Extract Jason is 25 years old." might be a typographical error. If the intent is to reference JSON data, please correct 'Jason' to 'JSON'.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

18. docs/index.md:842

Draft comment:
Typo: Consider changing "to simply the type inference" to "to simplify the type inference".
Reason this comment was not posted:
Comment was on unchanged code.

19. docs/integrations/google.md:322

Draft comment:
There appears to be an extraneous comma after the call to instructor.from_provider("google/gemini-2.5-flash") which results in a misplaced comma before the subsequent mode argument. Please review and remove the comma if it was not intended.
Reason this comment was not posted:
Marked as duplicate.

20. docs/integrations/google.md:352

Draft comment:
Typo: There's an extraneous trailing comma at the end of line 352 which causes the mode parameter on the next line to fall outside of the function call's argument list. It should be removed so that the code correctly passes mode=instructor.Mode.VERTEXAI_TOOLS to from_provider.
Reason this comment was not posted:
Comment looked like it was already resolved.

21. docs/learning/getting_started/client_setup.md:140

Draft comment:
There's an unexpected sequence on line 140: ):. It doesn't seem to belong and might be a leftover artifact. Please confirm if this should be removed.
Reason this comment was not posted:
Comment was on unchanged code.

22. docs/prompting/ensembling/dense.md:27

Draft comment:
Typographical error: The variable name 'concetenated_examples' appears to be misspelled. It should be 'concatenated_examples'.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

Workflow ID: wflow_TbWUV0opojOYkQST

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

Updates Anthropic default model from claude-3-haiku-20240307 to claude-3-5-haiku-latest (the latest available Haiku model).

Changes:
- Anthropic: claude-3-haiku-20240307 → claude-3-5-haiku-latest
- Google: google/gemini-2.5-flash (confirmed correct format)
- OpenAI: openai/gpt-5-nano (already correct)

Updated migration script with correct default models for future use.

cloudflare-workers-and-pages · 2025-11-06T05:29:24Z

Deploying with Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

| Status | Name | Latest Commit | Updated (UTC) |
| --- | --- | --- | --- |
| ❌ Deployment failed (View logs) | instructor | `eae77ac` | Nov 06 2025, 05:50 AM |

ellipsis-dev

Important

Looks good to me! 👍

Reviewed 19f4a0e in 1 minute and 26 seconds. Click for details.

Reviewed 242 lines of code in 9 files
Skipped 0 files when reviewing.
Skipped posting 7 draft comments. View those below.

1. migrate_to_from_provider.py:87

Draft comment:
Pattern 10 replacement uses DEFAULT_MODELS["google"] for vertexai; please verify if this is intended or if a dedicated vertexai default should be used.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

2. migrate_to_from_provider.py:31

Draft comment:
Consider using named regex groups for clarity in pattern definitions (e.g. for capturing mode) to improve readability and maintainability.
Reason this comment was not posted:
Confidence changes required: 50% <= threshold 85% None
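As a sketch of what the named-group suggestion could look like, the pattern below captures an optional `mode=` keyword under the name `mode`. The pattern and the `rewrite` helper are hypothetical and cover only one simple call shape; the script's actual patterns are not shown in this thread.

```python
import re

# Hypothetical pattern: capture an optional `mode=...` keyword by name.
CALL = re.compile(
    r"instructor\.from_anthropic\(\s*Anthropic\(\)\s*"
    r"(?:,\s*mode=(?P<mode>[\w.]+)\s*)?,?\s*\)"
)


def rewrite(match: re.Match) -> str:
    """Build the from_provider() call, carrying over `mode` when present."""
    args = '"anthropic/claude-3-5-haiku-latest"'
    mode = match.group("mode")  # None when the keyword was absent
    if mode is not None:
        args += f", mode={mode}"
    return f"instructor.from_provider({args})"
```

Reading `match.group("mode")` instead of a positional index keeps the replacement function correct if more groups are added to the pattern later.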

3. migrate_to_from_provider.py:129

Draft comment:
The change counter logic (comparing counts of 'from_provider') may not robustly reflect actual migration changes; consider a more reliable approach.
Reason this comment was not posted:
Confidence changes required: 50% <= threshold 85% None

4. migrate_to_from_provider.py:101

Draft comment:
The import cleanup regex only checks for openai, anthropic, groq, and cohere; consider extending it to include providers like cerebras and fireworks that are present in DEFAULT_MODELS.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
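One way to keep the import cleanup in step with the provider list is to build the pattern from a single collection of package names. The sketch below is an assumption-heavy illustration: `PROVIDER_PACKAGES` and `is_provider_import` are hypothetical names, and the top-level package names for Cerebras and Fireworks are assumed rather than confirmed by this thread.

```python
import re

# Assumed top-level SDK package names, one per provider to clean up.
PROVIDER_PACKAGES = ["openai", "anthropic", "groq", "cohere", "cerebras", "fireworks"]

# Match "import <pkg>" or "from <pkg>[.sub] import ..." at line start.
IMPORT_LINE = re.compile(
    r"^\s*(?:import|from)\s+(?:"
    + "|".join(re.escape(name) for name in PROVIDER_PACKAGES)
    + r")\b"
)


def is_provider_import(line: str) -> bool:
    """Return True if the line imports one of the listed provider SDKs."""
    return IMPORT_LINE.match(line) is not None
```

A line matching this pattern is only a removal candidate; whether the imported name is still used has to be checked separately.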

5. migrate_to_from_provider.py:29

Draft comment:
Consider pre-compiling regex patterns if processing many files to improve performance, especially in a migration script.
Reason this comment was not posted:
Confidence changes required: 30% <= threshold 85% None

6. docs/blog/posts/string-based-init.md:49

Draft comment:
Typo notice: The Anthropic provider string has changed to "anthropic/claude-3-5-haiku-latest". Please confirm that the inclusion of "3-5" is intentional and correct (it may be a typographical error compared to the previous "claude-3-haiku-20240307").
Reason this comment was not posted:
Comment looked like it was already resolved.

7. docs/integrations/anthropic.md:244

Draft comment:
Typographical suggestion: Consider changing "an PDF" to "a PDF" in the comment (line 244) for grammatical accuracy.
Reason this comment was not posted:
Comment was on unchanged code.

Workflow ID: wflow_JMjAQpu1bT6pPc0D

- Remove unused typing imports
- Update Tuple → tuple (modern Python syntax)
- Remove unused filename parameter
- Run ruff format for consistent style

All checks now pass.

The migration script has served its purpose (migrated 131 files). Removing it to keep the repo clean.

Add explicit guidance to run ruff and type checking before commits:
- Ruff linting (uv run ruff check .)
- Ruff formatting (uv run ruff format .)
- Type checking (uv run ty check)
- Relevant tests

This ensures consistent code quality and catches issues early.
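The checks listed above could be chained in a small helper, sketched below. The three commands come from the commit message; `CHECKS`, `run_checks`, and the injectable `runner` parameter are hypothetical names, and the "relevant tests" step is left out because no command is given for it in this thread.

```python
import subprocess
from typing import Callable, Sequence

# Commands listed in the commit message above.
CHECKS: list[list[str]] = [
    ["uv", "run", "ruff", "check", "."],
    ["uv", "run", "ruff", "format", "."],
    ["uv", "run", "ty", "check"],
]


def _run(cmd: Sequence[str]) -> int:
    """Run one command and return its exit code."""
    return subprocess.run(cmd).returncode


def run_checks(runner: Callable[[Sequence[str]], int] = _run) -> bool:
    """Run each check in order and stop at the first failure."""
    for cmd in CHECKS:
        if runner(cmd) != 0:
            print("failed:", " ".join(cmd))
            return False
    return True
```

Calling `run_checks()` from the repository root runs the real commands, while passing a stub `runner` allows the ordering and early exit to be exercised without `uv` installed.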

ellipsis-dev

Important

Looks good to me! 👍

Reviewed a9bcf17 in 1 minute and 51 seconds. Click for details.

Reviewed 18 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 1 draft comments. View those below.

1. CLAUDE.md:169

Draft comment:
The new 'Code Quality Checks Before Committing' section is clear, but it duplicates the type checking instruction ('uv run ty check') that's already mentioned above. Consider consolidating to avoid redundancy.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 30% vs. threshold = 85% The comment identifies a real duplication - the same command appears twice. However, I need to consider whether this is problematic. The first mention (line 166) is in a subsection specifically about type checking with ty, providing context about the tool. The new section (lines 168-173) is a comprehensive pre-commit checklist that includes type checking along with other quality checks. This could be seen as intentional organization - one is educational/contextual, the other is a practical checklist. The comment suggests "consolidating" but doesn't specify how, which makes it less actionable. According to the rules, I should look for "code quality refactors" that are "actionable and clear." This comment is about documentation structure, not code, and the suggestion to "consolidate" is vague. The duplication might be intentional for different purposes - one section teaches about the type checking tool specifically, while the other provides a complete pre-commit workflow. The comment doesn't provide a clear, actionable suggestion for how to consolidate, making it more of an observation than a concrete improvement. While the critique about vagueness is valid, there is genuine redundancy here that could confuse readers. However, the comment lacks a specific proposal for how to fix it, and the duplication might serve different audiences (someone learning about ty vs someone looking for a quick pre-commit checklist). The rules state comments should be actionable and clear - this is somewhat vague. This comment identifies real duplication but lacks a clear, actionable suggestion for improvement. The duplication may serve different purposes (educational context vs practical checklist), and without a specific proposal for consolidation, this comment is more observational than actionable. 
Given the rules emphasize actionable and clear suggestions, this comment should be deleted.

Workflow ID: wflow_HwpmWT99GcsYtTSS

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>

claude added 2 commits November 6, 2025 05:19

github-actions Bot added the documentation, enhancement, and python labels Nov 6, 2025

ellipsis-dev Bot reviewed Nov 6, 2025

Comment thread migrate_to_from_provider.py Outdated

Comment thread docs/blog/posts/announcing-gemini-tool-calling-support.md Outdated

Comment thread docs/blog/posts/chat-with-your-pdf-with-gemini.md Outdated

Comment thread docs/blog/posts/multimodal-gemini.md Outdated

ellipsis-dev Bot reviewed Nov 6, 2025

claude added 3 commits November 6, 2025 05:31

style: fix ruff linting issues in migration script

5904168

- Remove unused typing imports
- Update Tuple → tuple (modern Python syntax)
- Remove unused filename parameter
- Run ruff format for consistent style

All checks now pass.

chore: remove migration script after completion

6fab405

The migration script has served its purpose (migrated 131 files). Removing it to keep the repo clean.

ellipsis-dev Bot reviewed Nov 6, 2025

jxnl and others added 3 commits November 5, 2025 21:44

Update docs/blog/posts/announcing-gemini-tool-calling-support.md

545caeb

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>

Update docs/blog/posts/chat-with-your-pdf-with-gemini.md

12faed0

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>

Update docs/blog/posts/multimodal-gemini.md

eae77ac

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>

jxnl merged commit 1480675 into main Nov 6, 2025
8 of 12 checks passed

jxnl deleted the claude/standardize-provider-imports-011CUr4M4qjLfJq1LxpGuHzJ branch November 6, 2025 05:47

jxnl merged 9 commits into main from claude/standardize-provider-imports-011CUr4M4qjLfJq1LxpGuHzJ
