Merged

Commits (25)
f3fcb9e
test(core): standardize provider tests with from_provider() parameter…
claude Nov 6, 2025
6fd61f5
feat(tests): consolidate providers into core test suite
claude Nov 6, 2025
74e2e56
docs(tests): add workflow update instructions for maintainers
claude Nov 6, 2025
c44fffa
fix(tests): update models to claude-haiku-4-5-latest and gemini-2.5-f…
claude Nov 6, 2025
c796c1c
fix(tests): complete model updates in util.py and README
claude Nov 6, 2025
7c1fc73
docs(tests): add comprehensive parameterization and provider-specific…
claude Nov 6, 2025
3866388
docs(tests): answer key questions about parameterization and provider…
claude Nov 6, 2025
c7cd45e
feat(tests): add unified multimodal tests to core suite
claude Nov 6, 2025
7f26778
refactor(tests): massive cleanup - delete all duplicate tests
claude Nov 6, 2025
2fbf2fc
Refactor: Update instructor modes for Fireworks and Perplexity
cursoragent Nov 6, 2025
eaf5a05
feat(tests): add unified multimodal tests to core suite
claude Nov 6, 2025
e6c6cf3
docs(tests): remove temporary analysis markdown files
claude Nov 6, 2025
04c8017
Refactor: Separate core provider tests and update test matrix
cursoragent Nov 6, 2025
afe8c14
refactor(tests): delete more duplicate test files
claude Nov 6, 2025
4f15c89
feat(xai): enhance tool handling and add capability definitions for p…
jxnl Nov 6, 2025
e5ce61a
fix(tests): stabilize core provider response modes
jxnl Nov 6, 2025
a3d0fc0
fix(ci): fix ruff linting errors and type check issues
jxnl Nov 6, 2025
515ac81
fix(types): add type ignores for xAI SDK method calls
jxnl Nov 6, 2025
8209d5a
fix(anthropic): respect strict JSON control character handling
jxnl Nov 6, 2025
de36d2b
Merge remote-tracking branch 'origin/main' into claude/standardize-fr…
jxnl Nov 12, 2025
5a6b0b2
refactor(tests): remove provider-specific tests and utility configura…
jxnl Nov 12, 2025
9ff3df0
fix(tests): update test commands to use asyncio mode
jxnl Nov 12, 2025
ad165b5
feat(tests): expand core provider tests for OpenAI, Anthropic, Google…
jxnl Nov 12, 2025
6da9110
fix(tests): skip unsupported provider capabilities for Google Gemini
jxnl Nov 12, 2025
ef2af12
docs(google): add known limitations as of Nov 12, 2024
jxnl Nov 12, 2025
54 changes: 45 additions & 9 deletions .github/workflows/test.yml
@@ -22,7 +22,11 @@ jobs:
- name: Install the project
run: uv sync --all-extras
- name: Run core tests
run: uv run pytest tests/ -k 'not llm and not openai and not gemini and not anthropic and not cohere and not vertexai and not mistral and not xai and not docs'
run: >-
uv run pytest tests/ -n auto
-k 'not test_core_providers and not test_openai and not test_anthropic
and not test_gemini and not test_genai and not test_writer and not
test_vertexai and not docs'
env:
INSTRUCTOR_ENV: CI
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
@@ -31,15 +35,46 @@
XAI_API_KEY: ${{ secrets.XAI_API_KEY }}
GOOGLE_API_KEY: ${{ secrets.GOOGLE_API_KEY }}

core-provider-tests:
name: Core Provider Tests
runs-on: ubuntu-latest
needs: core-tests

steps:
- uses: actions/checkout@v2
- name: Install uv
uses: astral-sh/setup-uv@v4
with:
enable-cache: true
- name: Set up Python
run: uv python install 3.11
- name: Install the project
run: uv sync --all-extras
- name: Run core provider tests
run: uv run pytest tests/llm/test_core_providers -v -n auto
env:
INSTRUCTOR_ENV: CI
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
GOOGLE_API_KEY: ${{ secrets.GOOGLE_API_KEY }}
COHERE_API_KEY: ${{ secrets.COHERE_API_KEY }}
XAI_API_KEY: ${{ secrets.XAI_API_KEY }}
MISTRAL_API_KEY: ${{ secrets.MISTRAL_API_KEY }}
CEREBRAS_API_KEY: ${{ secrets.CEREBRAS_API_KEY }}
FIREWORKS_API_KEY: ${{ secrets.FIREWORKS_API_KEY }}
WRITER_API_KEY: ${{ secrets.WRITER_API_KEY }}
PERPLEXITY_API_KEY: ${{ secrets.PERPLEXITY_API_KEY }}

# Provider tests run in parallel
provider-tests:
name: ${{ matrix.provider.name }} Tests
runs-on: ubuntu-latest
needs: core-provider-tests
strategy:
fail-fast: false
matrix:
provider:
- name: Openai
- name: OpenAI
env_key: OPENAI_API_KEY
test_path: tests/llm/test_openai
- name: Anthropic
@@ -51,12 +86,12 @@
- name: Google GenAI
env_key: GOOGLE_API_KEY
test_path: tests/llm/test_genai
- name: Cohere
env_key: COHERE_API_KEY
test_path: tests/llm/test_cohere
- name: XAI
env_key: XAI_API_KEY
test_path: tests/llm/test_xai
- name: Vertex AI
env_key: GOOGLE_API_KEY
test_path: tests/llm/test_vertexai
- name: Writer
env_key: WRITER_API_KEY
test_path: tests/llm/test_writer

steps:
- uses: actions/checkout@v2
@@ -78,6 +113,7 @@ jobs:
auto-client-test:
name: Auto Client Tests
runs-on: ubuntu-latest
needs: provider-tests

steps:
- uses: actions/checkout@v2
@@ -90,7 +126,7 @@ jobs:
- name: Install the project
run: uv sync --all-extras
- name: Run Auto Client tests
run: uv run pytest tests/test_auto_client.py
run: uv run pytest tests/test_auto_client.py -n auto
env:
INSTRUCTOR_ENV: CI
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
2 changes: 1 addition & 1 deletion CLAUDE.md
@@ -6,7 +6,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co

## Commands
- Install deps: `uv pip install -e ".[dev,anthropic]"` or `poetry install --with dev,anthropic`
- Run tests: `uv run pytest tests/`
- Run tests: `uv run pytest tests/ -n auto`
- Run specific test: `uv run pytest tests/path_to_test.py::test_name`
- Skip LLM tests: `uv run pytest tests/ -k 'not llm and not openai'`
- Type check: `uv run ty check`
7 changes: 2 additions & 5 deletions instructor/processing/function_calls.py
@@ -399,10 +399,7 @@ def parse_anthropic_json(
# read: https://docs.anthropic.com/en/docs/build-with-claude/tool-use/web-search-tool#response
text_blocks = [c for c in completion.content if c.type == "text"]
last_block = text_blocks[-1]
# Strip raw control characters (0x00-0x1F) that would cause json.loads to fail
# Note: This preserves escaped sequences like \n in JSON strings, which are handled
# correctly by the JSON parser. Only raw, unescaped control bytes are removed.
text = re.sub(r"[\u0000-\u001F]", "", last_block.text)
text = last_block.text

extra_text = extract_json_from_codeblock(text)

@@ -411,7 +408,7 @@
extra_text, context=validation_context, strict=True
)
else:
# Allow control characters.
# Allow control characters to pass through by using the non-strict JSON parser.
parsed = json.loads(extra_text, strict=False)
# Pydantic non-strict: https://docs.pydantic.dev/latest/concepts/strict_mode/
model = cls.model_validate(parsed, context=validation_context, strict=False)
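The change above drops the control-character-stripping regex because `json.loads(..., strict=False)` already tolerates raw (unescaped) control bytes inside JSON strings, while the default strict parser rejects them. A minimal stdlib sketch of that difference — the payload string here is illustrative, not taken from the test suite:

```python
import json

# A JSON payload whose string value contains a raw (unescaped) newline byte,
# similar to what Anthropic text blocks can emit.
payload = '{"note": "line one\nline two"}'

# Default strict parsing rejects raw control characters inside strings.
try:
    json.loads(payload)
except json.JSONDecodeError as exc:
    print(f"strict parse failed: {exc.msg}")

# strict=False lets the raw control byte through unchanged, which is why
# pre-stripping with a regex is no longer needed on the non-strict path.
parsed = json.loads(payload, strict=False)
print(parsed["note"])  # contains a literal newline
```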
106 changes: 84 additions & 22 deletions instructor/providers/xai/client.py
@@ -118,7 +118,7 @@ async def acreate(
chat = client.chat.create(model=model, messages=x_messages, **call_kwargs)

if response_model is None:
resp = await chat.sample()
resp = await chat.sample() # type: ignore[misc]
return resp

assert response_model is not None
@@ -135,7 +135,7 @@ async def acreate(
schema=json.dumps(_get_model_schema(response_model)),
)
)
json_chunks = (chunk.content async for _, chunk in chat.stream())
json_chunks = (chunk.content async for _, chunk in chat.stream()) # type: ignore[misc]
# response_model is guaranteed to be a type[BaseModel] at this point due to earlier assertion
rm = cast(type[BaseModel], response_model)
if issubclass(rm, IterableBase):
@@ -147,22 +147,24 @@
f"Unsupported response model type for streaming: {_get_model_name(response_model)}"
)
else:
raw, parsed = await chat.parse(response_model)
raw, parsed = await chat.parse(response_model) # type: ignore[misc]
parsed._raw_response = raw
return parsed
else:
tool = xchat.tool(
tool_obj = xchat.tool(
name=_get_model_name(response_model),
description=response_model.__doc__ or "",
parameters=_get_model_schema(response_model),
)
chat.proto.tools.append(tool)
chat.proto.tool_choice.mode = xchat.chat_pb2.ToolMode.TOOL_MODE_AUTO
chat.proto.tools.append(tool_obj) # type: ignore[arg-type]
tool_name = tool_obj.function.name # type: ignore[attr-defined]
chat.proto.tool_choice.CopyFrom(xchat.required_tool(tool_name))
if is_stream:
stream_iter = chat.stream() # type: ignore[misc]
args = (
resp.tool_calls[0].function.arguments
async for resp, _ in chat.stream()
if resp.tool_calls and resp.finish_reason == "REASON_INVALID"
resp.tool_calls[0].function.arguments # type: ignore[index,attr-defined]
async for resp, _ in stream_iter # type: ignore[assignment]
if resp.tool_calls and resp.finish_reason == "REASON_INVALID" # type: ignore[attr-defined]
)
rm = cast(type[BaseModel], response_model)
if issubclass(rm, IterableBase):
@@ -174,8 +176,37 @@ async def acreate(
f"Unsupported response model type for streaming: {_get_model_name(response_model)}"
)
else:
resp = await chat.sample()
args = resp.tool_calls[0].function.arguments
resp = await chat.sample() # type: ignore[misc]
if not resp.tool_calls: # type: ignore[attr-defined]
# If no tool calls, try to extract from text content
from ...processing.function_calls import _validate_model_from_json
from ...utils import extract_json_from_codeblock

# Try to extract JSON from text content
text_content: str = ""
if hasattr(resp, "text") and resp.text: # type: ignore[attr-defined]
text_content = str(resp.text) # type: ignore[attr-defined]
elif hasattr(resp, "content") and resp.content: # type: ignore[attr-defined]
content = resp.content # type: ignore[attr-defined]
if isinstance(content, str):
text_content = content
elif isinstance(content, list) and content:
text_content = str(content[0])

if text_content:
json_str = extract_json_from_codeblock(text_content)
parsed = _validate_model_from_json(
response_model, json_str, None, strict
)
parsed._raw_response = resp
return parsed

raise ValueError(
f"No tool calls returned from xAI and no text content available. "
f"Response: {resp}"
)

args = resp.tool_calls[0].function.arguments # type: ignore[index,attr-defined]
from ...processing.function_calls import _validate_model_from_json

parsed = _validate_model_from_json(response_model, args, None, strict)
@@ -201,7 +232,7 @@ def create(
chat = client.chat.create(model=model, messages=x_messages, **call_kwargs)

if response_model is None:
resp = chat.sample()
resp = chat.sample() # type: ignore[misc]
return resp

assert response_model is not None
@@ -218,7 +249,7 @@ def create(
schema=json.dumps(_get_model_schema(response_model)),
)
)
json_chunks = (chunk.content for _, chunk in chat.stream())
json_chunks = (chunk.content for _, chunk in chat.stream()) # type: ignore[misc]
rm = cast(type[BaseModel], response_model)
if issubclass(rm, IterableBase):
return rm.tasks_from_chunks(json_chunks)
@@ -229,24 +260,26 @@
f"Unsupported response model type for streaming: {_get_model_name(response_model)}"
)
else:
raw, parsed = chat.parse(response_model)
raw, parsed = chat.parse(response_model) # type: ignore[misc]
parsed._raw_response = raw
return parsed
else:
tool = xchat.tool(
tool_obj = xchat.tool(
name=_get_model_name(response_model),
description=response_model.__doc__ or "",
parameters=_get_model_schema(response_model),
)
chat.proto.tools.append(tool)
chat.proto.tool_choice.mode = xchat.chat_pb2.ToolMode.TOOL_MODE_AUTO
chat.proto.tools.append(tool_obj) # type: ignore[arg-type]
tool_name = tool_obj.function.name # type: ignore[attr-defined]
chat.proto.tool_choice.CopyFrom(xchat.required_tool(tool_name))
if is_stream:
for resp, _ in chat.stream():
stream_iter = chat.stream() # type: ignore[misc]
for resp, _ in stream_iter: # type: ignore[assignment]
# For xAI, tool_calls are returned at the end of the response.
# Effectively, it is not a streaming response.
# See: https://docs.x.ai/docs/guides/function-calling
if resp.tool_calls:
args = resp.tool_calls[0].function.arguments
if resp.tool_calls: # type: ignore[attr-defined]
args = resp.tool_calls[0].function.arguments # type: ignore[index,attr-defined]
rm = cast(type[BaseModel], response_model)
if issubclass(rm, IterableBase):
return rm.tasks_from_chunks(args)
@@ -257,8 +290,37 @@
f"Unsupported response model type for streaming: {_get_model_name(response_model)}"
)
else:
resp = chat.sample()
args = resp.tool_calls[0].function.arguments
resp = chat.sample() # type: ignore[misc]
if not resp.tool_calls: # type: ignore[attr-defined]
# If no tool calls, try to extract from text content
from ...processing.function_calls import _validate_model_from_json
from ...utils import extract_json_from_codeblock

# Try to extract JSON from text content
text_content: str = ""
if hasattr(resp, "text") and resp.text: # type: ignore[attr-defined]
text_content = str(resp.text) # type: ignore[attr-defined]
elif hasattr(resp, "content") and resp.content: # type: ignore[attr-defined]
content = resp.content # type: ignore[attr-defined]
if isinstance(content, str):
text_content = content
elif isinstance(content, list) and content:
text_content = str(content[0])

if text_content:
json_str = extract_json_from_codeblock(text_content)
parsed = _validate_model_from_json(
response_model, json_str, None, strict
)
parsed._raw_response = resp
return parsed

raise ValueError(
f"No tool calls returned from xAI and no text content available. "
f"Response: {resp}"
)

args = resp.tool_calls[0].function.arguments # type: ignore[index,attr-defined]
from ...processing.function_calls import _validate_model_from_json

parsed = _validate_model_from_json(response_model, args, None, strict)
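The fallback path added above extracts JSON from the response text when xAI returns no tool calls. A simplified, self-contained sketch of that flow — the `extract_json_from_codeblock` below is a hypothetical regex stand-in for instructor's actual helper, and the response text is invented:

```python
import json
import re


def extract_json_from_codeblock(text: str) -> str:
    """Simplified stand-in: pull the JSON body out of a fenced code block,
    or return the text unchanged when no fence is found."""
    match = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", text, re.DOTALL)
    return match.group(1) if match else text


# Simulated xAI response that answered in prose instead of a tool call.
resp_text = 'Here is the result:\n```json\n{"name": "Ada", "age": 36}\n```'

# Mirror the fallback: recover the JSON payload and parse it.
data = json.loads(extract_json_from_codeblock(resp_text))
print(data)
```

In the real client the parsed dict is then validated against the `response_model` via `_validate_model_from_json`, and a `ValueError` is raised only when neither tool calls nor usable text content are present.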