A Spring Boot-based proxy system for routing requests to multiple Large Language Models (LLMs) including OpenAI, Gemini, Mistral, and Claude.
- Dynamic routing to multiple LLM providers
- Model selection based on task type and availability
- Comprehensive error handling with retries and fallbacks
- Detailed structured logging for requests, responses, and errors
- Caching for frequently requested queries
- Simple web UI for testing and interaction
- Docker support for containerized deployment
- Timeout handling with automatic retries
- Rate-limiting detection and handling
- Fallback to alternative models when errors occur
- Graceful handling of API errors with user-friendly messages
- Exponential backoff with jitter for retries (see the sketch after this list)
- Structured JSON logging for easy analysis
- Detailed request logging (model, timestamp, query)
- Comprehensive response logging (model, response time, tokens, status code)
- Error logging with error types and details
- Request ID tracking across the system
- Token usage tracking and logging
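The retry implementation itself is not shown in this README. The sketch below illustrates exponential backoff with full jitter; the attempt count and delay bounds are illustrative values, not the project's actual configuration:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;

public final class RetryWithJitter {

    /**
     * Retries the given call with exponential backoff and full jitter.
     * maxAttempts, baseDelayMs, and maxDelayMs are illustrative defaults.
     */
    public static <T> T call(Callable<T> task, int maxAttempts,
                             long baseDelayMs, long maxDelayMs) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                last = e;
                if (attempt == maxAttempts - 1) {
                    break; // out of attempts, rethrow below
                }
                // Exponential backoff: base * 2^attempt, capped at maxDelayMs.
                long exponential = Math.min(maxDelayMs, baseDelayMs * (1L << attempt));
                // Full jitter: sleep a random duration in [0, exponential].
                long sleepMs = ThreadLocalRandom.current().nextLong(exponential + 1);
                Thread.sleep(sleepMs);
            }
        }
        throw last;
    }
}
```

An LLM call could then be wrapped as, for example, `RetryWithJitter.call(() -> client.complete(prompt), 3, 500, 8_000)`.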
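The logging setup is likewise not included here. As a rough illustration of request-ID tracking, the sketch below binds a request ID to SLF4J's MDC so that a JSON log encoder (for example Logback with the logstash encoder) can attach it to every log line for that request; the class and field names are illustrative:

```java
import java.util.UUID;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;

public final class RequestLogging {

    private static final Logger log = LoggerFactory.getLogger(RequestLogging.class);

    /** Runs the given work with a request ID bound to the logging context. */
    public static void withRequestId(String requestId, Runnable work) {
        String id = (requestId != null && !requestId.isBlank())
                ? requestId
                : UUID.randomUUID().toString();
        MDC.put("requestId", id);    // picked up by a structured/JSON log encoder
        try {
            work.run();
        } finally {
            MDC.remove("requestId"); // avoid leaking the ID onto unrelated requests
        }
    }

    public static void main(String[] args) {
        withRequestId(null, () ->
                log.info("model={} tokens={} status={}", "gpt-4o", 128, 200));
    }
}
```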
- Java 17 or higher
- Maven
- Docker (optional, for containerized deployment)
Create a .env file in the root directory with the following variables:
# API Keys for LLM providers
OPENAI_API_KEY=your_openai_api_key
GEMINI_API_KEY=your_gemini_api_key
MISTRAL_API_KEY=your_mistral_api_key
CLAUDE_API_KEY=your_claude_api_key
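How these keys reach the application is not shown in this README. One minimal sketch, assuming the keys are exposed as environment variables that Spring resolves as properties (the class and method names below are hypothetical, not the project's actual bindings):

```java
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Component;

// Hypothetical holder for provider credentials; the real project may bind these differently.
@Component
public class LlmApiKeys {

    @Value("${OPENAI_API_KEY:}")
    private String openAiKey;

    @Value("${GEMINI_API_KEY:}")
    private String geminiKey;

    @Value("${MISTRAL_API_KEY:}")
    private String mistralKey;

    @Value("${CLAUDE_API_KEY:}")
    private String claudeKey;

    /** A provider is considered configured only when its key is present. */
    public boolean isOpenAiConfigured() {
        return openAiKey != null && !openAiKey.isBlank();
    }

    public String openAiKey()  { return openAiKey; }
    public String geminiKey()  { return geminiKey; }
    public String mistralKey() { return mistralKey; }
    public String claudeKey()  { return claudeKey; }
}
```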
# Build and run
mvn clean install
mvn spring-boot:run

# Or build and run with Docker
docker build -t llmproxy-java .
docker run -p 8080:8080 \
-e OPENAI_API_KEY=your_openai_api_key \
-e GEMINI_API_KEY=your_gemini_api_key \
-e MISTRAL_API_KEY=your_mistral_api_key \
-e CLAUDE_API_KEY=your_claude_api_key \
llmproxy-java

# Or use Docker Compose (create a .env file with your API keys first)
docker-compose up -d

- POST /api/query: Send a query to an LLM
  - Request body:
    {
      "query": "Your query text",
      "model": "OPENAI|GEMINI|MISTRAL|CLAUDE",                                                    // Optional
      "modelVersion": "gpt-4o|gemini-1.5-pro|mistral-large-latest|claude-3-sonnet-20240229|...",  // Optional
      "taskType": "TEXT_GENERATION|SUMMARIZATION|SENTIMENT_ANALYSIS|QUESTION_ANSWERING",          // Optional
      "requestId": "optional-request-id-for-tracking"                                             // Optional
    }
- GET /api/status: Check the status of all LLM providers
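For example, a query can be sent from Java with the JDK's built-in HTTP client; the payload fields mirror the request body above, and the URL assumes the default local port:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class QueryExample {
    public static void main(String[] args) throws Exception {
        String body = """
                {
                  "query": "Summarize the plot of Hamlet in two sentences.",
                  "model": "OPENAI",
                  "taskType": "SUMMARIZATION"
                }
                """;

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/api/query"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```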
Access the web UI at http://localhost:8080
The LLM Proxy System tracks token usage for all LLM providers:
- Detailed Token Breakdown: Tracks input tokens, output tokens, and total tokens for each request
- Provider-Specific Implementation:
  - OpenAI: Uses the detailed token information provided in the API response (supports gpt-4o, gpt-4-turbo, etc.)
  - Mistral: Uses the detailed token information provided in the API response (supports mistral-large-latest, codestral-latest, etc.)
  - Claude: Uses the input and output token counts from the API response (supports claude-3-opus-20240229, claude-3-sonnet-20240229, etc.)
  - Gemini: Uses token information when available, falling back to estimation (supports gemini-1.5-pro, gemini-2.0-flash, etc.)
- Token Estimation: For providers with limited token information, the system estimates token usage based on input/output text length (see the sketch after this list)
- UI Display: Token usage is displayed in a dedicated section in the web UI
- Logging: Token usage is included in structured logs for monitoring and analysis
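The exact estimation heuristic is not specified above. A common rough approximation for English text is about four characters per token; the sketch below is illustrative only and not the project's actual estimator:

```java
public final class TokenEstimator {

    // Rough average of ~4 characters per token for English text; an assumption,
    // not the project's actual heuristic.
    private static final double CHARS_PER_TOKEN = 4.0;

    public static int estimateTokens(String text) {
        if (text == null || text.isEmpty()) {
            return 0;
        }
        return (int) Math.ceil(text.length() / CHARS_PER_TOKEN);
    }

    /** Total usage when a provider reports neither input nor output counts. */
    public static int estimateTotal(String prompt, String completion) {
        return estimateTokens(prompt) + estimateTokens(completion);
    }
}
```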
The LLM Proxy System is built with a modular architecture:
- Configuration: Environment variables for API keys and settings
- Models: Data structures for requests and responses
- Exceptions: Standardized error types and handling
- Retry: Configurable retry mechanism with exponential backoff
- Caching: In-memory caching for frequently requested queries
- Logging: Structured logging for requests, responses, and errors
- LLM Clients: Separate clients for each LLM provider with error handling
- Router: Dynamic routing based on task type and availability, with fallbacks (see the sketch after this list)
- API Controllers: RESTful API endpoints for queries and status
- Web UI: Simple interface for testing and interaction
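As a rough illustration of the routing-with-fallback idea, the sketch below tries providers in preference order and moves on when one is unavailable or fails; the client interface and names are placeholders, not the project's actual types:

```java
import java.util.List;

public class FallbackRouter {

    /** Minimal stand-in for a provider client; the real project defines its own interface. */
    interface LlmClient {
        String name();
        boolean isAvailable();
        String complete(String prompt) throws Exception;
    }

    private final List<LlmClient> clientsInPreferenceOrder;

    public FallbackRouter(List<LlmClient> clientsInPreferenceOrder) {
        this.clientsInPreferenceOrder = clientsInPreferenceOrder;
    }

    /** Tries each available provider in order and falls back to the next on failure. */
    public String route(String prompt) {
        for (LlmClient client : clientsInPreferenceOrder) {
            if (!client.isAvailable()) {
                continue; // skip providers that are down or rate limited
            }
            try {
                return client.complete(prompt);
            } catch (Exception e) {
                // Log the failure and fall through to the next provider.
            }
        }
        throw new IllegalStateException("All configured LLM providers failed or were unavailable");
    }
}
```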
The system includes comprehensive unit and functional tests:
- Unit Tests: Test individual components in isolation
  - LLM clients
  - Router service
  - Cache service
  - Rate limiter
  - Token estimator
- Functional Tests: Test the integration between components
  - API endpoints
  - End-to-end flow
- Integration Tests: Test interactions with external services
  - LLM API interactions (using WireMock)
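A minimal illustration of stubbing a provider endpoint with WireMock follows; the stubbed path, port, and JSON are placeholders rather than the project's real test fixtures:

```java
import static com.github.tomakehurst.wiremock.client.WireMock.aResponse;
import static com.github.tomakehurst.wiremock.client.WireMock.post;
import static com.github.tomakehurst.wiremock.client.WireMock.urlEqualTo;

import com.github.tomakehurst.wiremock.WireMockServer;

public class WireMockSketch {
    public static void main(String[] args) {
        // Start a local mock server standing in for a provider API.
        WireMockServer server = new WireMockServer(8089);
        server.start();

        // Stub a hypothetical completion endpoint.
        server.stubFor(post(urlEqualTo("/v1/chat/completions"))
                .willReturn(aResponse()
                        .withStatus(200)
                        .withHeader("Content-Type", "application/json")
                        .withBody("{\"choices\":[{\"message\":{\"content\":\"stubbed\"}}]}")));

        System.out.println("Mock provider listening on port " + server.port());
        // ... point the client under test at http://localhost:8089 and run assertions ...

        server.stop();
    }
}
```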
Run the tests with:
mvn test

To contribute to this project:
- Fork the repository
- Create a feature branch
- Make your changes
- Run tests
- Submit a pull request
For Docker-based development, see the Docker Guide for detailed instructions.
The system supports the following models:
- gpt-4o
- gpt-4o-mini
- gpt-4-turbo
- gpt-4
- gpt-4-vision-preview
- gpt-3.5-turbo
- gpt-3.5-turbo-16k
- gemini-2.5-flash-preview-04-17
- gemini-2.5-pro-preview-03-25
- gemini-2.0-flash
- gemini-2.0-flash-lite
- gemini-1.5-flash
- gemini-1.5-flash-8b
- gemini-1.5-pro
- gemini-pro
- gemini-pro-vision
- codestral-latest
- mistral-large-latest
- mistral-saba-latest
- mistral-tiny
- mistral-small
- mistral-medium
- mistral-large
- claude-3-opus-20240229
- claude-3-sonnet-20240229
- claude-3-haiku-20240307
- claude-3-opus
- claude-3-sonnet
- claude-3-haiku
- claude-2.1
- claude-2.0
This project is licensed under the MIT License.