Merged · 25 commits (the diff below shows changes from 5 commits)

Commits:
- `f3fcb9e` test(core): standardize provider tests with from_provider() parameter… (claude, Nov 6, 2025)
- `6fd61f5` feat(tests): consolidate providers into core test suite (claude, Nov 6, 2025)
- `74e2e56` docs(tests): add workflow update instructions for maintainers (claude, Nov 6, 2025)
- `c44fffa` fix(tests): update models to claude-haiku-4-5-latest and gemini-2.5-f… (claude, Nov 6, 2025)
- `c796c1c` fix(tests): complete model updates in util.py and README (claude, Nov 6, 2025)
- `7c1fc73` docs(tests): add comprehensive parameterization and provider-specific… (claude, Nov 6, 2025)
- `3866388` docs(tests): answer key questions about parameterization and provider… (claude, Nov 6, 2025)
- `c7cd45e` feat(tests): add unified multimodal tests to core suite (claude, Nov 6, 2025)
- `7f26778` refactor(tests): massive cleanup - delete all duplicate tests (claude, Nov 6, 2025)
- `2fbf2fc` Refactor: Update instructor modes for Fireworks and Perplexity (cursoragent, Nov 6, 2025)
- `eaf5a05` feat(tests): add unified multimodal tests to core suite (claude, Nov 6, 2025)
- `e6c6cf3` docs(tests): remove temporary analysis markdown files (claude, Nov 6, 2025)
- `04c8017` Refactor: Separate core provider tests and update test matrix (cursoragent, Nov 6, 2025)
- `afe8c14` refactor(tests): delete more duplicate test files (claude, Nov 6, 2025)
- `4f15c89` feat(xai): enhance tool handling and add capability definitions for p… (jxnl, Nov 6, 2025)
- `e5ce61a` fix(tests): stabilize core provider response modes (jxnl, Nov 6, 2025)
- `a3d0fc0` fix(ci): fix ruff linting errors and type check issues (jxnl, Nov 6, 2025)
- `515ac81` fix(types): add type ignores for xAI SDK method calls (jxnl, Nov 6, 2025)
- `8209d5a` fix(anthropic): respect strict JSON control character handling (jxnl, Nov 6, 2025)
- `de36d2b` Merge remote-tracking branch 'origin/main' into claude/standardize-fr… (jxnl, Nov 12, 2025)
- `5a6b0b2` refactor(tests): remove provider-specific tests and utility configura… (jxnl, Nov 12, 2025)
- `9ff3df0` fix(tests): update test commands to use asyncio mode (jxnl, Nov 12, 2025)
- `ad165b5` feat(tests): expand core provider tests for OpenAI, Anthropic, Google… (jxnl, Nov 12, 2025)
- `6da9110` fix(tests): skip unsupported provider capabilities for Google Gemini (jxnl, Nov 12, 2025)
- `ef2af12` docs(google): add known limitations as of Nov 12, 2024 (jxnl, Nov 12, 2025)

`tests/llm/PROVIDER_TEST_REVIEW.md` (new file: 228 additions, 0 deletions)
# Provider Test Review - Consolidation Analysis

## Objective
Identify which provider tests can be consolidated into `test_core_providers/` and which should remain provider-specific.

## Analysis by Provider

### ✅ test_openai (13 test files)
**Test files:**
- test_attr.py
- test_hooks.py - **PROVIDER-SPECIFIC** (OpenAI hooks)
- test_modes.py
- test_multimodal.py - **PROVIDER-SPECIFIC** (OpenAI multimodal API)
- test_multitask.py
- test_openai.py
- test_parallel.py
- test_patch.py
- test_retries.py
- test_simple_types.py
- test_stream.py
- test_validation_context.py - **PROVIDER-SPECIFIC**
- test_validators.py

**Uses:** mostly `from_openai()`

**Recommendation:**
- Keep: hooks, multimodal, validation_context (provider-specific)
- Can delete: basic extraction, streaming, and retry tests (attr, modes, multitask, openai, parallel, patch, retries, simple_types, stream), now in core

---

### ✅ test_anthropic (5 test files)
**Test files:**
- test_multimodal.py - **PROVIDER-SPECIFIC** (Anthropic multimodal API)
- test_parallel.py
- test_reasoning.py - **PROVIDER-SPECIFIC** (extended thinking)
- test_stream.py
- test_system.py - **PROVIDER-SPECIFIC** (system prompt handling)

**Uses:** `from_provider()` ✅

**Recommendation:**
- Keep: multimodal, reasoning, system (provider-specific features)
- Can delete: parallel, stream (now in core)

---

### ✅ test_genai (10 test files)
**Test files:**
- test_basics.py
- test_decimal.py - **PROVIDER-SPECIFIC** (decimal handling)
- test_format.py - **PROVIDER-SPECIFIC** (format handling)
- test_invalid_schema.py - **PROVIDER-SPECIFIC** (schema validation)
- test_multimodal.py - **PROVIDER-SPECIFIC** (Google multimodal API)
- test_response_model_none.py
- test_schema_conversion.py - **PROVIDER-SPECIFIC** (schema conversion)
- test_simple.py
- test_stream.py
- test_utils.py - **PROVIDER-SPECIFIC** (utilities)

**Uses:** `from_provider()` ✅

**Recommendation:**
- Keep: decimal, format, invalid_schema, multimodal, schema_conversion, utils
- Can delete: basics, simple, stream, response_model_none (now in core)

---

### ✅ test_gemini (6 test files + evals/)
**Test files:**
- test_list_content.py - **PROVIDER-SPECIFIC** (content format)
- test_multimodal_content.py - **PROVIDER-SPECIFIC** (multimodal)
- test_patch.py
- test_retries.py
- test_simple_types.py
- test_stream.py
- evals/ - **KEEP** (evaluation tests)

**Uses:** `from_provider()` ✅

**Recommendation:**
- Keep: list_content, multimodal_content, evals
- Can delete: patch, retries, simple_types, stream (now in core)

---

### ✅ test_cohere (3 test files)
**Test files:**
- test_json_schema.py - **PROVIDER-SPECIFIC** (JSON schema mode)
- test_none_response.py
- test_retries.py

**Uses:** `from_provider()` ✅

**Recommendation:**
- Keep: json_schema (provider-specific mode)
- Can delete: none_response, retries (now in core)

---

### ✅ test_xai (3 test files)
**Test files:**
- test_basics.py
- test_raw_response.py - **MAYBE KEEP** (raw response testing)
- test_stream.py

**Uses:** `from_provider()` ✅

**Recommendation:**
- Keep: raw_response (if it exercises provider-specific behavior)
- Can delete: basics, stream (now in core)

---

### ⚠️ test_mistral (4 test files)
**Test files:**
- test_modes.py
- test_multimodal.py - **PROVIDER-SPECIFIC** (Mistral multimodal)
- test_retries.py
- test_stream.py

**Uses:** `from_mistral()` ❌

**Recommendation:**
- ADD to core providers
- Keep: multimodal
- Can delete: modes, retries, stream after migration

---

### ⚠️ test_cerebras (1 test file)
**Test files:**
- modes.py (contains tests, even though the name lacks the `test_` prefix that pytest collects by default)

**Uses:** `from_cerebras()` ❌

**Recommendation:**
- ADD to core providers
- Tests are generic; all of them can move to core after migration

---

### ⚠️ test_fireworks (3 test files)
**Test files:**
- test_format.py
- test_simple.py
- test_stream.py

**Uses:** `from_fireworks()` ❌

**Recommendation:**
- ADD to core providers
- All tests are generic

---

### ⚠️ test_writer (4 test files + evals/)
**Test files:**
- test_format_common_models.py
- test_format_difficult_models.py
- test_retries.py
- test_streaming.py
- evals/ - **KEEP**

**Uses:** `from_writer()` ❌

**Recommendation:**
- ADD to core providers
- Keep: evals/
- Can delete: all test files after migration

---

### ⚠️ test_perplexity (1 test file)
**Test files:**
- test_modes.py

**Uses:** Unknown

**Recommendation:**
- ADD to core providers
- Test is generic
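
Where the factory in use is unknown (as for test_perplexity above), a quick scan of the directory can settle it. The sketch below is a heuristic helper, not part of the test suite: `factory_usage` is a hypothetical name, and because it simply counts `from_*(` call sites in every `.py` file under a directory, it will also count unrelated functions that happen to share the prefix.

```python
from __future__ import annotations

import re
from collections import Counter
from pathlib import Path

# Heuristic pattern for factory call sites such as instructor.from_openai(...)
# or from_provider(...). Any other function named from_* will match too.
FACTORY_CALL = re.compile(r"\b(from_[a-z0-9_]+)\s*\(")


def factory_usage(test_dir: str | Path) -> Counter[str]:
    """Count from_*() call sites across all .py files under test_dir."""
    counts: Counter[str] = Counter()
    for path in sorted(Path(test_dir).rglob("*.py")):
        counts.update(FACTORY_CALL.findall(path.read_text(encoding="utf-8")))
    return counts
```

For example, `factory_usage("tests/llm/test_perplexity")` returns a `Counter` keyed by factory name; a directory whose only key is `from_provider` has presumably already been migrated.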

---

### ⚠️ test_bedrock (unknown)
**Recommendation:**
- Review separately (AWS complexity)

---

### ⚠️ test_vertexai (unknown)
**Recommendation:**
- Review separately (may be deprecated in favor of test_genai)

---

## Summary

### Can Add to Core (Need Migration)
- ✅ Mistral - change from_mistral() to from_provider()
- ✅ Cerebras - change from_cerebras() to from_provider()
- ✅ Fireworks - change from_fireworks() to from_provider()
- ✅ Writer - change from_writer() to from_provider()
- ✅ Perplexity - change from_perplexity() to from_provider()
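
Each migration listed above swaps a provider-specific factory for a single `from_provider()` call that takes a `"provider/model"` string. A minimal sketch of that mapping follows. It assumes the prefixes shown are the ones `from_provider()` accepts for these providers (worth checking against the instructor documentation), and `LEGACY_FACTORY_PREFIX` and `migrated_provider_string` are hypothetical helpers, not part of instructor.

```python
# Hypothetical helper: maps each legacy factory named in this review to the
# provider prefix assumed for instructor.from_provider("<provider>/<model>").
LEGACY_FACTORY_PREFIX = {
    "from_mistral": "mistral",
    "from_cerebras": "cerebras",
    "from_fireworks": "fireworks",
    "from_writer": "writer",
    "from_perplexity": "perplexity",
}


def migrated_provider_string(factory: str, model: str) -> str:
    """Return the "provider/model" string that replaces a legacy factory call."""
    if factory not in LEGACY_FACTORY_PREFIX:
        raise ValueError(f"no migration recorded for {factory!r}")
    if not model:
        raise ValueError("model name must be non-empty")
    return f"{LEGACY_FACTORY_PREFIX[factory]}/{model}"


# Sketch of the change in a test module (model names are left to the suite):
#   before: client = instructor.from_mistral(Mistral(api_key=...))
#   after:  client = instructor.from_provider(migrated_provider_string("from_mistral", model))
```

Keeping the model choice outside the helper lets the core suite parameterize one list of provider strings instead of maintaining one fixture per factory.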

### Provider-Specific to Keep
- **OpenAI:** hooks, multimodal, validation_context
- **Anthropic:** multimodal, reasoning, system
- **Google (genai):** decimal, format, invalid_schema, multimodal, schema_conversion, utils
- **Gemini:** list_content, multimodal_content, evals/
- **Cohere:** json_schema
- **xAI:** raw_response (maybe)
- **Mistral:** multimodal
- **Writer:** evals/

### Can Delete After Consolidation
- test_openai: attr, modes, multitask, openai, parallel, patch, retries, simple_types, stream
- test_anthropic: parallel, stream
- test_genai: basics, simple, stream, response_model_none
- test_gemini: patch, retries, simple_types, stream
- test_cohere: none_response, retries
- test_xai: basics, stream
- test_mistral: modes, retries, stream (after migration)
- test_cerebras: modes.py (after migration)
- test_fireworks: all (after migration)
- test_writer: all except evals/ (after migration)
- test_perplexity: all (after migration)

## Directories to Completely Remove
After migration, these can be deleted entirely:
- ❌ test_cerebras (move to core)
- ❌ test_fireworks (move to core)
- ❌ test_perplexity (move to core)

## Estimated Impact
- **Before:** ~72 test files across 14 provider directories
- **After:** ~25-30 provider-specific test files + shared core tests
- **Reduction:** ~40-50 test files eliminated (deduplicated)
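
As a cross-check on these estimates, the per-provider lists above can be tallied directly. The sketch below transcribes only the files this review names (test_bedrock and test_vertexai are not enumerated, and evals/ directories are not counted), so it is not expected to reproduce the ~72 total. It also reports any listed file that appears in neither the keep list nor the delete list. The helpers `entry` and `tally` are illustrative only.

```python
def entry(files: str, keep: str = "", delete: str = "") -> dict[str, set[str]]:
    """Build one provider record from space-separated file stems."""
    return {"files": set(files.split()), "keep": set(keep.split()), "delete": set(delete.split())}


# Transcribed from the "Analysis by Provider" and "Summary" sections above.
REVIEW = {
    "openai": entry(
        "attr hooks modes multimodal multitask openai parallel patch retries "
        "simple_types stream validation_context validators",
        keep="hooks multimodal validation_context",
        delete="attr modes multitask openai parallel patch retries simple_types stream",
    ),
    "anthropic": entry("multimodal parallel reasoning stream system",
                       keep="multimodal reasoning system", delete="parallel stream"),
    "genai": entry(
        "basics decimal format invalid_schema multimodal response_model_none "
        "schema_conversion simple stream utils",
        keep="decimal format invalid_schema multimodal schema_conversion utils",
        delete="basics simple stream response_model_none",
    ),
    "gemini": entry("list_content multimodal_content patch retries simple_types stream",
                    keep="list_content multimodal_content",
                    delete="patch retries simple_types stream"),
    "cohere": entry("json_schema none_response retries", keep="json_schema",
                    delete="none_response retries"),
    "xai": entry("basics raw_response stream", keep="raw_response", delete="basics stream"),
    "mistral": entry("modes multimodal retries stream", keep="multimodal",
                     delete="modes retries stream"),
    "cerebras": entry("modes", delete="modes"),
    "fireworks": entry("format simple stream", delete="format simple stream"),
    "writer": entry("format_common_models format_difficult_models retries streaming",
                    delete="format_common_models format_difficult_models retries streaming"),
    "perplexity": entry("modes", delete="modes"),
}


def tally(review: dict[str, dict[str, set[str]]]) -> tuple[dict[str, int], dict[str, list[str]]]:
    """Sum files per category and list files that are neither kept nor deleted."""
    totals = {"files": 0, "keep": 0, "delete": 0}
    unaccounted: dict[str, list[str]] = {}
    for provider, record in review.items():
        for key in totals:
            totals[key] += len(record[key])
        missing = record["files"] - record["keep"] - record["delete"]
        if missing:
            unaccounted[provider] = sorted(missing)
    return totals, unaccounted


totals, unaccounted = tally(REVIEW)
print(totals)       # {'files': 53, 'keep': 17, 'delete': 35}
print(unaccounted)  # {'openai': ['validators']}
```

With the lists as written, the enumerated files come to 53 (17 to keep, 35 to delete), and `test_validators.py` in test_openai is the one listed file not assigned to either group.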

`tests/llm/WORKFLOW_UPDATE_NEEDED.md` (new file: 105 additions, 0 deletions)
# GitHub Actions Workflow Update Needed

The test consolidation requires updates to `.github/workflows/test.yml` that couldn't be pushed automatically due to permission restrictions.

## Required Changes to `.github/workflows/test.yml`

### 1. Add New Core Provider Tests Job

Add this new job after the `core-tests` job:

```yaml
# Core provider tests (unified tests across all providers)
core-provider-tests:
  name: Core Provider Tests (All Providers)
  runs-on: ubuntu-latest

  steps:
    - uses: actions/checkout@v2
    - name: Install uv
      uses: astral-sh/setup-uv@v4
      with:
        enable-cache: true
    - name: Set up Python
      run: uv python install 3.11
    - name: Install the project
      run: uv sync --all-extras
    - name: Run core provider tests
      run: uv run pytest tests/llm/test_core_providers/ -n auto
      env:
        INSTRUCTOR_ENV: CI
        OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        GOOGLE_API_KEY: ${{ secrets.GOOGLE_API_KEY }}
        COHERE_API_KEY: ${{ secrets.COHERE_API_KEY }}
        XAI_API_KEY: ${{ secrets.XAI_API_KEY }}
        MISTRAL_API_KEY: ${{ secrets.MISTRAL_API_KEY }}
        CEREBRAS_API_KEY: ${{ secrets.CEREBRAS_API_KEY }}
        FIREWORKS_API_KEY: ${{ secrets.FIREWORKS_API_KEY }}
        WRITER_API_KEY: ${{ secrets.WRITER_API_KEY }}
        PERPLEXITY_API_KEY: ${{ secrets.PERPLEXITY_API_KEY }}
```

### 2. Update Provider-Specific Tests Job

Rename the `provider-tests` job to `provider-specific-tests` and update the matrix:

```yaml
# Provider-specific tests (features unique to each provider)
provider-specific-tests:
  name: ${{ matrix.provider.name }} Specific Tests
  runs-on: ubuntu-latest
  strategy:
    fail-fast: false
    matrix:
      provider:
        - name: OpenAI
          env_key: OPENAI_API_KEY
          test_path: tests/llm/test_openai
        - name: Anthropic
          env_key: ANTHROPIC_API_KEY
          test_path: tests/llm/test_anthropic
        - name: Gemini
          env_key: GOOGLE_API_KEY
          test_path: tests/llm/test_gemini
        - name: Google GenAI
          env_key: GOOGLE_API_KEY
          test_path: tests/llm/test_genai
        - name: Cohere
          env_key: COHERE_API_KEY
          test_path: tests/llm/test_cohere
        - name: XAI
          env_key: XAI_API_KEY
          test_path: tests/llm/test_xai
        - name: Mistral
          env_key: MISTRAL_API_KEY
          test_path: tests/llm/test_mistral
        - name: Writer
          env_key: WRITER_API_KEY
          test_path: tests/llm/test_writer
```

Note: Cerebras, Fireworks, and Perplexity are no longer in the matrix because their test directories were deleted.

## Why These Changes Are Needed

1. **New core-provider-tests job**: Runs the unified test suite in `tests/llm/test_core_providers/` against all 10 providers in a single job, with tests spread across parallel workers by `-n auto` (pytest-xdist)

2. **Updated provider-specific-tests**: Now runs only the provider-specific feature tests (e.g. multimodal, reasoning) for providers that have unique features

3. **Deleted providers**: The Cerebras, Fireworks, and Perplexity test directories were removed because their tests now live in the core test suite

## Required GitHub Secrets

Ensure these secrets are configured in the repository (tests will skip gracefully if missing):

- `OPENAI_API_KEY`
- `ANTHROPIC_API_KEY`
- `GOOGLE_API_KEY`
- `COHERE_API_KEY`
- `XAI_API_KEY`
- `MISTRAL_API_KEY`
- `CEREBRAS_API_KEY`
- `FIREWORKS_API_KEY`
- `WRITER_API_KEY`
- `PERPLEXITY_API_KEY`
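
The skip-if-missing behaviour can be sketched as a per-provider key check. This is an illustrative, stdlib-only helper, not the suite's actual implementation: `split_by_key` and the provider labels are hypothetical, while the environment variable names are the ones listed above. It checks for a non-empty value rather than mere presence, because GitHub Actions expands a reference to an unset secret to an empty string, so the variable exists in CI even when the secret does not.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# Env var per provider, copied from the secrets list above. The dictionary keys
# are illustrative labels, not necessarily the IDs the core suite uses.
PROVIDER_ENV_KEYS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "google": "GOOGLE_API_KEY",
    "cohere": "COHERE_API_KEY",
    "xai": "XAI_API_KEY",
    "mistral": "MISTRAL_API_KEY",
    "cerebras": "CEREBRAS_API_KEY",
    "fireworks": "FIREWORKS_API_KEY",
    "writer": "WRITER_API_KEY",
    "perplexity": "PERPLEXITY_API_KEY",
}


def split_by_key(env: Mapping[str, str] | None = None) -> tuple[list[str], list[str]]:
    """Return (runnable, skipped) providers; an empty value counts as missing."""
    env = os.environ if env is None else env
    runnable: list[str] = []
    skipped: list[str] = []
    for provider, var in PROVIDER_ENV_KEYS.items():
        (runnable if env.get(var) else skipped).append(provider)
    return runnable, skipped
```

In pytest, the same check could be attached to a provider parameter with `pytest.param(..., marks=pytest.mark.skipif(not os.environ.get("MISTRAL_API_KEY"), reason="MISTRAL_API_KEY not set"))`, so a missing key is reported as a skip rather than a failure.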