A Spring Boot-based proxy system for routing requests to multiple Large Language Models (LLMs) including OpenAI, Gemini, Mistral, and Claude.
- Dynamic routing to multiple LLM providers
- Model selection based on task type and availability
- Comprehensive error handling with retries and fallbacks
- Detailed structured logging for requests, responses, and errors
- Caching for frequently requested queries
- Simple web UI for testing and interaction
- Docker support for containerized deployment
- Timeout handling with automatic retries
- Rate-limiting detection and handling
- Fallback to alternative models when errors occur
- Graceful handling of API errors with user-friendly messages
- Exponential backoff with jitter for retries (see the sketch after this list)
- Structured JSON logging for easy analysis
- Detailed request logging (model, timestamp, query)
- Comprehensive response logging (model, response time, tokens, status code)
- Error logging with error types and details
- Request ID tracking across the system
- Token usage tracking and logging
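The retry implementation itself is not shown in this README. The sketch below illustrates exponential backoff with full jitter; the attempt count and delay bounds are illustrative values, not the project's actual configuration:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;

public final class RetryWithJitter {

    /**
     * Retries the given call with exponential backoff and full jitter.
     * maxAttempts, baseDelayMs, and maxDelayMs are illustrative defaults.
     */
    public static <T> T call(Callable<T> task, int maxAttempts,
                             long baseDelayMs, long maxDelayMs) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                last = e;
                if (attempt == maxAttempts - 1) {
                    break; // out of attempts, rethrow below
                }
                // Exponential backoff: base * 2^attempt, capped at maxDelayMs.
                long exponential = Math.min(maxDelayMs, baseDelayMs * (1L << attempt));
                // Full jitter: sleep a random duration in [0, exponential].
                long sleepMs = ThreadLocalRandom.current().nextLong(exponential + 1);
                Thread.sleep(sleepMs);
            }
        }
        throw last;
    }
}
```

An LLM call could then be wrapped as, for example, `RetryWithJitter.call(() -> client.complete(prompt), 3, 500, 8_000)`.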
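The logging setup is likewise not included here. As a rough illustration of request-ID tracking, the sketch below binds a request ID to SLF4J's MDC so that a JSON log encoder (for example Logback with the logstash encoder) can attach it to every log line for that request; the class and field names are illustrative:

```java
import java.util.UUID;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;

public final class RequestLogging {

    private static final Logger log = LoggerFactory.getLogger(RequestLogging.class);

    /** Runs the given work with a request ID bound to the logging context. */
    public static void withRequestId(String requestId, Runnable work) {
        String id = (requestId != null && !requestId.isBlank())
                ? requestId
                : UUID.randomUUID().toString();
        MDC.put("requestId", id);    // picked up by a structured/JSON log encoder
        try {
            work.run();
        } finally {
            MDC.remove("requestId"); // avoid leaking the ID onto unrelated requests
        }
    }

    public static void main(String[] args) {
        withRequestId(null, () ->
                log.info("model={} tokens={} status={}", "gpt-4o", 128, 200));
    }
}
```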
- Java 17 or higher
- Maven
- Docker (optional, for containerized deployment)
Create a .env file in the root directory with the following variables:
# API Keys for LLM providers
OPENAI_API_KEY=your_openai_api_key
GEMINI_API_KEY=your_gemini_api_key
MISTRAL_API_KEY=your_mistral_api_key
CLAUDE_API_KEY=your_claude_api_key
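How these keys reach the application is not shown in this README. One minimal sketch, assuming the keys are exposed as environment variables that Spring resolves as properties (the class and method names below are hypothetical, not the project's actual bindings):

```java
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Component;

// Hypothetical holder for provider credentials; the real project may bind these differently.
@Component
public class LlmApiKeys {

    @Value("${OPENAI_API_KEY:}")
    private String openAiKey;

    @Value("${GEMINI_API_KEY:}")
    private String geminiKey;

    @Value("${MISTRAL_API_KEY:}")
    private String mistralKey;

    @Value("${CLAUDE_API_KEY:}")
    private String claudeKey;

    /** A provider is considered configured only when its key is present. */
    public boolean isOpenAiConfigured() {
        return openAiKey != null && !openAiKey.isBlank();
    }

    public String openAiKey()  { return openAiKey; }
    public String geminiKey()  { return geminiKey; }
    public String mistralKey() { return mistralKey; }
    public String claudeKey()  { return claudeKey; }
}
```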
# Build and run
mvn clean install
mvn spring-boot:run

# Or build and run with Docker
docker build -t llmproxy-java .
docker run -p 8080:8080 \
-e OPENAI_API_KEY=your_openai_api_key \
-e GEMINI_API_KEY=your_gemini_api_key \
-e MISTRAL_API_KEY=your_mistral_api_key \
-e CLAUDE_API_KEY=your_claude_api_key \
llmproxy-java

# Or use Docker Compose (create a .env file with your API keys first)
docker-compose up -d

- POST /api/query: Send a query to an LLM
  - Request body:
    {
      "query": "Your query text",
      "model": "OPENAI|GEMINI|MISTRAL|CLAUDE",                                                    // Optional
      "modelVersion": "gpt-4o|gemini-1.5-pro|mistral-large-latest|claude-3-sonnet-20240229|...",  // Optional
      "taskType": "TEXT_GENERATION|SUMMARIZATION|SENTIMENT_ANALYSIS|QUESTION_ANSWERING",          // Optional
      "requestId": "optional-request-id-for-tracking"                                             // Optional
    }
- GET /api/status: Check the status of all LLM providers
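For example, a query can be sent from Java with the JDK's built-in HTTP client; the payload fields mirror the request body above, and the URL assumes the default local port:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class QueryExample {
    public static void main(String[] args) throws Exception {
        String body = """
                {
                  "query": "Summarize the plot of Hamlet in two sentences.",
                  "model": "OPENAI",
                  "taskType": "SUMMARIZATION"
                }
                """;

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/api/query"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```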
Access the web UI at http://localhost:8080
The LLM Proxy System tracks token usage for all LLM providers:
- Detailed Token Breakdown: Tracks input tokens, output tokens, and total tokens for each request
- Provider-Specific Implementation:
  - OpenAI: Uses the detailed token information provided in the API response (supports gpt-4o, gpt-4-turbo, etc.)
  - Mistral: Uses the detailed token information provided in the API response (supports mistral-large-latest, codestral-latest, etc.)
  - Claude: Uses the input and output token counts from the API response (supports claude-3-opus-20240229, claude-3-sonnet-20240229, etc.)
  - Gemini: Uses token information when available, falling back to estimation (supports gemini-1.5-pro, gemini-2.0-flash, etc.)
- Token Estimation: For providers with limited token information, the system estimates token usage based on input/output text length (see the sketch after this list)
- UI Display: Token usage is displayed in a dedicated section in the web UI
- Logging: Token usage is included in structured logs for monitoring and analysis
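The exact estimation heuristic is not specified above. A common rough approximation for English text is about four characters per token; the sketch below is illustrative only and not the project's actual estimator:

```java
public final class TokenEstimator {

    // Rough average of ~4 characters per token for English text; an assumption,
    // not the project's actual heuristic.
    private static final double CHARS_PER_TOKEN = 4.0;

    public static int estimateTokens(String text) {
        if (text == null || text.isEmpty()) {
            return 0;
        }
        return (int) Math.ceil(text.length() / CHARS_PER_TOKEN);
    }

    /** Total usage when a provider reports neither input nor output counts. */
    public static int estimateTotal(String prompt, String completion) {
        return estimateTokens(prompt) + estimateTokens(completion);
    }
}
```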
The LLM Proxy System is built with a modular architecture:
- Configuration: Environment variables for API keys and settings
- Models: Data structures for requests and responses
- Exceptions: Standardized error types and handling
- Retry: Configurable retry mechanism with exponential backoff
- Caching: In-memory caching for frequently requested queries
- Logging: Structured logging for requests, responses, and errors
- LLM Clients: Separate clients for each LLM provider with error handling
- Router: Dynamic routing based on task type and availability, with fallbacks (see the sketch after this list)
- API Controllers: RESTful API endpoints for queries and status
- Web UI: Simple interface for testing and interaction
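As a rough illustration of the routing-with-fallback idea, the sketch below tries providers in preference order and moves on when one is unavailable or fails; the client interface and names are placeholders, not the project's actual types:

```java
import java.util.List;

public class FallbackRouter {

    /** Minimal stand-in for a provider client; the real project defines its own interface. */
    interface LlmClient {
        String name();
        boolean isAvailable();
        String complete(String prompt) throws Exception;
    }

    private final List<LlmClient> clientsInPreferenceOrder;

    public FallbackRouter(List<LlmClient> clientsInPreferenceOrder) {
        this.clientsInPreferenceOrder = clientsInPreferenceOrder;
    }

    /** Tries each available provider in order and falls back to the next on failure. */
    public String route(String prompt) {
        for (LlmClient client : clientsInPreferenceOrder) {
            if (!client.isAvailable()) {
                continue; // skip providers that are down or rate limited
            }
            try {
                return client.complete(prompt);
            } catch (Exception e) {
                // Log the failure and fall through to the next provider.
            }
        }
        throw new IllegalStateException("All configured LLM providers failed or were unavailable");
    }
}
```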
The system includes comprehensive unit and functional tests:
- Unit Tests: Test individual components in isolation
  - LLM clients
  - Router service
  - Cache service
  - Rate limiter
  - Token estimator
- Functional Tests: Test the integration between components
  - API endpoints
  - End-to-end flow
- Integration Tests: Test interactions with external services
  - LLM API interactions (using WireMock)
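A minimal illustration of stubbing a provider endpoint with WireMock follows; the stubbed path, port, and JSON are placeholders rather than the project's real test fixtures:

```java
import static com.github.tomakehurst.wiremock.client.WireMock.aResponse;
import static com.github.tomakehurst.wiremock.client.WireMock.post;
import static com.github.tomakehurst.wiremock.client.WireMock.urlEqualTo;

import com.github.tomakehurst.wiremock.WireMockServer;

public class WireMockSketch {
    public static void main(String[] args) {
        // Start a local mock server standing in for a provider API.
        WireMockServer server = new WireMockServer(8089);
        server.start();

        // Stub a hypothetical completion endpoint.
        server.stubFor(post(urlEqualTo("/v1/chat/completions"))
                .willReturn(aResponse()
                        .withStatus(200)
                        .withHeader("Content-Type", "application/json")
                        .withBody("{\"choices\":[{\"message\":{\"content\":\"stubbed\"}}]}")));

        System.out.println("Mock provider listening on port " + server.port());
        // ... point the client under test at http://localhost:8089 and run assertions ...

        server.stop();
    }
}
```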
Run the tests with:
mvn test

To contribute to this project:
- Fork the repository
- Create a feature branch
- Make your changes
- Run tests
- Submit a pull request
For Docker-based development, see the Docker Guide for detailed instructions.
The system supports the following models:
- gpt-4o
- gpt-4o-mini
- gpt-4-turbo
- gpt-4
- gpt-4-vision-preview
- gpt-3.5-turbo
- gpt-3.5-turbo-16k
- gemini-2.5-flash-preview-04-17
- gemini-2.5-pro-preview-03-25
- gemini-2.0-flash
- gemini-2.0-flash-lite
- gemini-1.5-flash
- gemini-1.5-flash-8b
- gemini-1.5-pro
- gemini-pro
- gemini-pro-vision
- codestral-latest
- mistral-large-latest
- mistral-saba-latest
- mistral-tiny
- mistral-small
- mistral-medium
- mistral-large
- claude-3-opus-20240229
- claude-3-sonnet-20240229
- claude-3-haiku-20240307
- claude-3-opus
- claude-3-sonnet
- claude-3-haiku
- claude-2.1
- claude-2.0
This project is licensed under the MIT License.