This FastAPI-based proxy allows applications designed for Ollama to communicate seamlessly with LM Studio. It intercepts Ollama-style API requests and transforms them into OpenAI-compatible requests for LM Studio.
This was designed specifically for the Goose AI agent, but can be adapted to work with other applications.
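The core of that translation is mapping an Ollama-style `/api/generate` body onto an OpenAI-style `/v1/completions` payload. A minimal sketch of that mapping (the helper name `ollama_to_openai` and the exact defaults are illustrative, not the proxy's actual code):

```python
def ollama_to_openai(body: dict) -> dict:
    """Map an Ollama /api/generate request body onto an
    OpenAI-compatible /v1/completions payload.

    Ollama nests sampling settings under "options"
    (e.g. "temperature", "num_predict"); OpenAI-style
    completions expect them at the top level.
    """
    options = body.get("options", {})
    return {
        "model": body.get("model", "local-model"),
        "prompt": body.get("prompt", ""),
        "stream": body.get("stream", False),
        "temperature": options.get("temperature", 0.7),
        "max_tokens": options.get("num_predict", 1024),
    }
```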
- Ollama-Compatible API: Your app can send requests as if it were talking to Ollama.
- Automatic Translation: Converts requests to LM Studio format.
- Streaming Support: Handles both normal and streaming responses.
- Non-Blocking Execution: Uses FastAPI and `httpx` for efficient async handling.
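The streaming support above boils down to translating LM Studio's server-sent-event chunks (`data: {...}` lines) into the newline-delimited JSON objects Ollama clients expect. A minimal stdlib sketch of that translation (the helper name `sse_to_ollama_lines` is hypothetical; the field names follow the two public wire formats):

```python
import json

def sse_to_ollama_lines(sse_lines):
    """Translate OpenAI-style SSE chunks into Ollama-style
    newline-delimited JSON response objects."""
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip keep-alives and blank lines
        payload = line[len("data: "):].strip()
        if payload == "[DONE]":
            # OpenAI streams end with a literal [DONE] sentinel;
            # Ollama signals completion with {"done": true}
            yield json.dumps({"done": True})
            break
        chunk = json.loads(payload)
        text = chunk["choices"][0].get("text", "")
        yield json.dumps({"response": text, "done": False})
```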
Demo video: `Goose-LMstudio-demo.mov`
1. Clone the Repository (If Applicable)

   ```bash
   git clone https://github.com/Embedded-Nature/ollama-proxy.git
   cd ollama-proxy
   ```

2. Set Up a Virtual Environment (Optional but Recommended)

   ```bash
   python -m venv venv
   source venv/bin/activate   # macOS/Linux
   venv\Scripts\activate      # Windows
   ```

3. Install Dependencies

   ```bash
   pip install fastapi uvicorn httpx
   ```
Start the FastAPI server with:

```bash
uvicorn ollama_proxy:app --host 0.0.0.0 --port 11434 --reload
```

This makes the proxy listen on port 11434, which is the default for Ollama.
Your application can now send requests to http://localhost:11434/api/generate, and they will be converted to work with LM Studio.
```bash
curl -X POST "http://localhost:11434/api/generate" \
  -H "Content-Type: application/json" \
  -d '{"model": "mistral", "prompt": "Tell me a story.", "stream": true}'
```

Example streamed output:

```
Once upon a time...
The brave knight ventured...
```
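On the client side, each streamed line is a JSON object with `response` and `done` fields (Ollama's format). A small helper for assembling the stream into the full text (`collect_stream` is an illustrative name, not part of the proxy):

```python
import json

def collect_stream(lines):
    """Accumulate Ollama-style streaming lines (one JSON object
    per line) into the full generated text."""
    parts = []
    for line in lines:
        obj = json.loads(line)
        if obj.get("done"):
            break  # stream finished
        parts.append(obj.get("response", ""))
    return "".join(parts)
```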
- Modify the LM Studio API URL in `ollama_proxy.py` if needed:

  ```python
  LM_STUDIO_API = "http://localhost:1234/v1/completions"
  ```

- Adjust the timeout in the script (120s default) if needed.
- Supports temperature and max tokens adjustments in the API call.
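Taken together, these knobs amount to a handful of values in the outgoing request. A sketch of how they might be gathered (constant and function names here are illustrative; the defaults are assumptions, not the script's actual values):

```python
# Illustrative configuration for the proxy
LM_STUDIO_API = "http://localhost:1234/v1/completions"
TIMEOUT_SECONDS = 120  # raise this if long generations time out

def build_payload(prompt, model="mistral", temperature=0.7, max_tokens=512):
    """Compose the LM Studio request body; adjust sampling
    parameters (temperature, max_tokens) per call."""
    return {
        "model": model,
        "prompt": prompt,
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
```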
- Make sure LM Studio is running and accepting requests on `http://localhost:1234/v1/completions`.
- Test LM Studio separately:

  ```bash
  curl -X POST "http://localhost:1234/v1/completions" \
    -H "Content-Type: application/json" \
    -d '{"prompt": "Hello"}'
  ```
- Ensure `uvicorn` is installed:

  ```bash
  pip install uvicorn
  ```

- Check if the port is already in use:

  ```bash
  netstat -an | grep 11434
  ```

  If another process is using the port, restart your machine or change the port in the `uvicorn` command.
- Increase the timeout in `ollama_proxy.py`:

  ```python
  async with httpx.AsyncClient(timeout=300) as client:
  ```

- Ensure LM Studio is not overloaded with too many requests.
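As a cross-platform alternative to the `netstat` check in the troubleshooting steps, the port can also be probed from Python (`port_in_use` is an illustrative helper, not part of the proxy):

```python
import socket

def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already listening on the port.
    connect_ex returns 0 when the connection succeeds, i.e. when
    a listener is present."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1.0)
        return s.connect_ex((host, port)) == 0
```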
- Deploy: Run the proxy as a system service or Docker container.
- Enhancements: Add API key authentication or logging.
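For the API-key enhancement, one minimal approach is a header check wired into the proxy as a FastAPI dependency or middleware. A stdlib sketch under assumed names (`check_api_key` and the `PROXY_API_KEY` environment variable are hypothetical):

```python
import hmac
import os

def check_api_key(header_value):
    """Validate an 'Authorization: Bearer <token>' header against
    the key configured in the PROXY_API_KEY environment variable.
    Rejects all requests when no key is configured, and uses a
    constant-time comparison to avoid timing leaks."""
    expected = os.environ.get("PROXY_API_KEY")
    if not expected or not header_value:
        return False
    token = header_value.removeprefix("Bearer ").strip()
    return hmac.compare_digest(token, expected)
```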
🚀 Now, your app can use LM Studio while thinking it's talking to Ollama!