@samdickson22 samdickson22 commented Oct 30, 2025

What

Adds Cerebras as a new AI provider to Jan with support for 8 models, including Llama 4 Scout, Llama 3.3 70B, GPT OSS 120B, and Qwen variants.

Why

Cerebras offers ultra-fast AI inference (2000-3000 tokens/s) with OpenAI-compatible endpoints, making it an excellent addition to Jan's provider ecosystem. This integration enables users to leverage Cerebras's high-performance models directly through Jan's interface.

How

Provider Configuration:

  • Added Cerebras to predefinedProviders array in web-app/src/consts/providers.ts
  • Configured OpenAI-compatible endpoint: https://api.cerebras.ai/v1
  • Added 8 models with proper capability flags:
    • Production models: Llama 4 Scout (109B), Llama 3.1 8B, Llama 3.3 70B, GPT OSS 120B, Qwen 3 32B
    • Preview models: Qwen 3 235B Instruct, Qwen 3 235B Thinking, Qwen 3 Coder 480B
  • Tool calling support
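Since the integration is purely configuration, the Cerebras entry in `predefinedProviders` is just a data object. A minimal sketch of what it might look like — field and type names here are illustrative, not Jan's actual provider schema (model ids are taken from this PR; capability flags reflect the final state where only three models keep tool calling):

```typescript
// Hypothetical shape; Jan's real provider type may use different field names.
interface ProviderModel {
  id: string;
  name: string;
  capabilities: string[]; // e.g. ['completion', 'tools']
}

interface ModelProvider {
  provider: string;
  base_url: string;
  api_key: string;
  models: ProviderModel[];
}

const cerebrasProvider: ModelProvider = {
  provider: 'cerebras',
  base_url: 'https://api.cerebras.ai/v1',
  api_key: '',
  models: [
    { id: 'llama-4-scout-17b-16e-instruct', name: 'Llama 4 Scout', capabilities: ['completion'] },
    { id: 'llama-3.3-70b', name: 'Llama 3.3 70B', capabilities: ['completion', 'tools'] },
    { id: 'gpt-oss-120b', name: 'GPT OSS 120B', capabilities: ['completion', 'tools'] },
    // ...the remaining Llama and Qwen models follow the same pattern
  ],
};
```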

Visual Assets:

  • Added Cerebras logo (PNG) from LobeHub icon collection
  • Updated getProviderLogo() function in web-app/src/lib/utils.ts
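The logo change amounts to one new mapping entry. A sketch of the lookup — the real `getProviderLogo()` in `web-app/src/lib/utils.ts` covers more providers and may use a different structure:

```typescript
// Illustrative logo lookup; the actual implementation in utils.ts may differ.
const providerLogos: Record<string, string> = {
  openai: '/images/model-provider/openai.png',
  cerebras: '/images/model-provider/cerebras.png', // entry added by this PR
};

function getProviderLogo(provider: string): string | undefined {
  return providerLogos[provider.toLowerCase()];
}
```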

Documentation:

  • Created comprehensive setup guide at docs/src/pages/docs/desktop/remote-models/cerebras.mdx
  • Includes model descriptions with performance specs
  • Documents features (streaming, tool calling)
  • Lists limitations (unsupported OpenAI parameters)
  • Provides troubleshooting section
  • Added navigation entry in _meta.json

Technical Approach:
This is a configuration-driven integration that leverages Jan's existing OpenAI-compatible provider infrastructure. No custom code or API handlers are needed; everything works through the standard token.js fallback mechanism.
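Because the endpoint is OpenAI-compatible, requests need no provider-specific handling — any OpenAI-style client works as-is. A hedged sketch of what a chat completion request looks like on the wire (the payload builder and env var name are illustrative, not Jan code):

```typescript
// Builds an OpenAI-style chat completion request for the Cerebras endpoint.
// Nothing here is Cerebras-specific except the URL.
function buildChatRequest(model: string, prompt: string) {
  return {
    url: 'https://api.cerebras.ai/v1/chat/completions',
    init: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${process.env.CEREBRAS_API_KEY ?? ''}`,
      },
      body: JSON.stringify({
        model,
        messages: [{ role: 'user', content: prompt }],
        stream: true, // Cerebras supports OpenAI-style SSE streaming
      }),
    },
  };
}

// Usage: const { url, init } = buildChatRequest('llama-3.3-70b', 'Hello');
// await fetch(url, init);
```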

Testing

Manual Testing Required:

  • Provider appears in Settings > Providers
  • Toggle functionality works
  • API key configuration saves correctly
  • Models can be fetched from Cerebras API
  • Chat completions work with streaming
  • Tool calling works
  • Documentation renders correctly
  • All documentation links work

Automated Testing:
TypeScript compilation verified; no errors in the modified files.

Breaking Changes

None. This is a purely additive change that doesn't modify existing provider behavior.

Files Changed

  • web-app/src/consts/providers.ts - Provider configuration (+91 lines)
  • web-app/src/lib/utils.ts - Logo reference (+2 lines)
  • web-app/public/images/model-provider/cerebras.png - Provider logo (53KB PNG)
  • docs/src/pages/docs/desktop/remote-models/cerebras.mdx - Documentation (new file)
  • docs/src/pages/docs/desktop/remote-models/_meta.json - Navigation metadata (+3 lines)

Additional Notes

Model performance specs based on Cerebras documentation:

  • Llama 4 Scout: ~2600 tokens/s (deprecating Nov 3, 2025)
  • GPT OSS 120B: ~3000 tokens/s (fastest)
  • Preview models: 1400-2000 tokens/s (evaluation only, scheduled deprecation)

API compatibility: OpenAI-compatible but does not support frequency_penalty, logit_bias, presence_penalty, parallel_tool_calls, or service_tier.
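A client that reuses generic OpenAI request options would need to drop those fields before sending. A purely illustrative helper (not part of this PR) showing the idea:

```typescript
// OpenAI request parameters that Cerebras rejects, per the limitations above.
const UNSUPPORTED_PARAMS = [
  'frequency_penalty',
  'logit_bias',
  'presence_penalty',
  'parallel_tool_calls',
  'service_tier',
] as const;

// Returns a copy of an OpenAI-style request body with unsupported fields removed.
function stripUnsupportedParams(body: Record<string, unknown>): Record<string, unknown> {
  const cleaned = { ...body };
  for (const key of UNSUPPORTED_PARAMS) delete cleaned[key];
  return cleaned;
}
```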

Add Cerebras as a new AI provider with:
- OpenAI-compatible API endpoint (https://api.cerebras.ai/v1)
- 8 models including Llama 4 Scout, Llama 3.3 70B, GPT OSS 120B, and Qwen variants
- Tool calling support for gpt-oss-120b and llama-3.3-70b
- Ultra-fast inference speeds (2000-3000 tokens/s)
- Complete documentation with setup guide and troubleshooting
@samdickson22 samdickson22 changed the base branch from main to dev October 30, 2025 00:45
All 8 Cerebras models support tool calling according to their official
documentation. Updated capabilities to include 'tools' for:
- llama-4-scout-17b-16e-instruct
- llama3.1-8b
- qwen-3-32b
- qwen-3-235b-a22b-instruct-2507
- qwen-3-235b-a22b-thinking-2507
- qwen-3-coder-480b

Also corrected Llama 4 Scout parameter count from 109B to 17B.
- Corrected Llama 4 Scout parameter count from 109B to 17B
- Added tool calling support notation for all 8 models
- Updated Features section to list all models with tool calling capability
Disable tool calling for 5 Cerebras models that reject JSON schema
validation fields (minimum, maximum, default). Only gpt-oss-120b,
llama-3.3-70b, and qwen-3-coder-480b support tools with Jan's RAG
tool schemas.

Root cause: Models have inconsistent JSON schema validation strictness.
Most models reject requests containing unsupported fields like minimum/maximum
in tool parameter schemas, while 3 models are more lenient.

Error returned by strict models:
"Unsupported JSON schema fields: {'maximum', 'minimum'}"

Models with tools disabled:
- llama-4-scout-17b-16e-instruct
- llama3.1-8b
- qwen-3-32b
- qwen-3-235b-a22b-instruct-2507
- qwen-3-235b-a22b-thinking-2507
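One possible workaround (not implemented in this PR, purely illustrative) would be to strip the rejected fields from tool parameter schemas before sending them to the strict models:

```typescript
// JSON schema fields the stricter Cerebras models reject in tool parameters.
const REJECTED_SCHEMA_FIELDS = new Set(['minimum', 'maximum', 'default']);

// Recursively removes rejected fields from a JSON schema object.
// Naive sketch: it would also strip a property literally named 'default'
// under 'properties'; a production version would track schema context.
function sanitizeToolSchema(schema: unknown): unknown {
  if (Array.isArray(schema)) return schema.map(sanitizeToolSchema);
  if (schema !== null && typeof schema === 'object') {
    const out: Record<string, unknown> = {};
    for (const [key, value] of Object.entries(schema)) {
      if (REJECTED_SCHEMA_FIELDS.has(key)) continue;
      out[key] = sanitizeToolSchema(value);
    }
    return out;
  }
  return schema;
}
```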