OCI → OpenAI Proxy

Overview

FastAPI proxy that converts OpenAI-compatible requests (/v1/chat/completions, /v1/responses) into calls to OCI's Generative AI inference endpoint. It captures simple metrics, supports SSE streaming, and mirrors OpenAI’s response shape.

Requirements

  • Python 3.14 with uv installed (e.g. pip install uv or the standalone installer)
  • OCI CLI credentials configured under ~/.oci/config with key_file=~/.oci/oci_api_key.pem (a sample config is sketched after this list)
  • Environment file .env (see .env.example, or create the file with the values below):
    OCI_LOCAL_OCI_DIR=~/.oci
    OCI_CONFIG_FILE=~/.oci/config
    OCI_CONFIG_PROFILE=DEFAULT
    OCI_COMPARTMENT_ID=<your_compartment_ocid>
    OCI_ENDPOINT=https://inference.generativeai.us-chicago-1.oci.oraclecloud.com
    OCI_MAX_CONNECTIONS=1000
    OCI_MAX_KEEPALIVE=200
    OCI_REQUEST_TIMEOUT=60.0
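
For reference, a minimal ~/.oci/config matching the key_file above looks like this (user, fingerprint, and tenancy are placeholders; use the values from your tenancy, and set region to match OCI_ENDPOINT):

# placeholders below -- substitute your own OCIDs, fingerprint, and region
[DEFAULT]
user=ocid1.user.oc1..<your_user_ocid>
fingerprint=<your_key_fingerprint>
tenancy=ocid1.tenancy.oc1..<your_tenancy_ocid>
region=us-chicago-1
key_file=~/.oci/oci_api_key.pem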
    

Running with uv

uv run uvicorn proxy:app --reload

The service listens on http://127.0.0.1:8000. Update .env to switch tenants, endpoints, or tuning knobs.
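
The --reload flag is meant for development. For a production-style run, uvicorn's standard flags apply; for example, to bind all interfaces and use multiple worker processes:

uv run uvicorn proxy:app --host 0.0.0.0 --port 8000 --workers 4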

Running with Docker / Docker Compose

docker compose build
docker compose up

Compose loads .env and mounts ~/.oci into /root/.oci so the container can reuse the same config and key files. To run using plain Docker:
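
A compose file matching that description would look roughly like the sketch below (for orientation only; the repository's docker-compose.yml is authoritative):

# sketch only -- see the repo's docker-compose.yml for the real definition
services:
  proxy:
    build: .
    ports:
      - "8000:8000"
    env_file: .env
    volumes:
      - ~/.oci:/root/.oci:ro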

docker build -t oci-openai-proxy .
docker run --rm -p 8000:8000 --env-file .env \
  -v ~/.oci:/root/.oci:ro oci-openai-proxy

Quick tests

Run any of the snippets below in a separate terminal while the proxy is running:

python -u - <<'PY'
import json
import requests
import sys

url = "http://localhost:8000/v1/chat/completions"
payload = {
    "model": "xai.grok-4-fast-reasoning",
    "stream": True,
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Describe how to benchmark a FastAPI proxy."},
    ],
}

resp = requests.post(url, json=payload, stream=True)
for line in resp.iter_lines(decode_unicode=True):
    if not line or not line.startswith("data:"):
        continue
    data = line[5:].strip()
    if data == "[DONE]":  # OpenAI-style chat streams end with a [DONE] sentinel
        break
    event = json.loads(data)
    for choice in event.get("choices", []):
        text = choice.get("delta", {}).get("content")
        if text:
            sys.stdout.write(text)
            sys.stdout.flush()
PY
python -u - <<'PY'
import json, requests, sys

url = "http://localhost:8000/v1/responses"
payload = {
    "model": "xai.grok-4-fast-reasoning",
    "input": "Give me two tips for scaling FastAPI proxies.",
    "stream": True,
}

resp = requests.post(url, json=payload, stream=True)
for line in resp.iter_lines(decode_unicode=True):
    if not line or not line.startswith("data:"):
        continue
    data = line[5:].strip()
    if data == "[DONE]":  # defensive: stop cleanly if the stream sends a [DONE] sentinel
        break
    event = json.loads(data)
    if event.get("type") == "response.output_text.delta":
        sys.stdout.write(event.get("delta", ""))
        sys.stdout.flush()
PY
python - <<'PY'
import requests

url = "http://localhost:8000/v1/chat/completions"
payload = {
    "model": "xai.grok-4-fast-reasoning",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the benefits of async proxies."},
    ],
}

resp = requests.post(url, json=payload)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
PY
python - <<'PY'
import requests

url = "http://localhost:8000/v1/responses"
payload = {
    "model": "xai.grok-4-fast-reasoning",
    "input": "List three best practices for running FastAPI proxies at scale."
}

resp = requests.post(url, json=payload)
resp.raise_for_status()
print(resp.json()["output"][0]["content"][0]["text"])
PY
curl -s http://localhost:8000/health | jq
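
Because the proxy mirrors OpenAI's response shape, the official openai Python client can also be pointed at it. A minimal sketch, assuming the proxy does not enforce API-key authentication (any placeholder key works under that assumption):

python - <<'PY'
from openai import OpenAI

# Point the official client at the local proxy. The key is a placeholder,
# assuming the proxy ignores it; adjust if your deployment checks keys.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="xai.grok-4-fast-reasoning",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(resp.choices[0].message.content)
PY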
