FastAPI proxy that converts OpenAI-compatible requests (/v1/chat/completions, /v1/responses) into calls to OCI's Generative AI inference endpoint. It captures simple metrics, supports SSE streaming, and mirrors OpenAI's response shape.
Prerequisites:

- Python 3.14 with uv (`uv tool install uv`)
- OCI CLI credentials configured under `~/.oci/config` with `key_file=~/.oci/oci_api_key.pem`
- Environment file `.env` containing:

```
OCI_LOCAL_OCI_DIR=~/.oci
OCI_CONFIG_FILE=~/.oci/config
OCI_CONFIG_PROFILE=DEFAULT
OCI_COMPARTMENT_ID=<your_compartment_ocid>
OCI_ENDPOINT=https://inference.generativeai.us-chicago-1.oci.oraclecloud.com
OCI_MAX_CONNECTIONS=1000
OCI_MAX_KEEPALIVE=200
OCI_REQUEST_TIMEOUT=60.0
```
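Before starting the server, a short script can confirm the variables above resolve. This is an illustrative sketch, not part of the proxy; the `REQUIRED` list covers only a few of the variables from the `.env` example:

```
import os

# A few of the variables the .env example above defines.
REQUIRED = [
    "OCI_CONFIG_FILE",
    "OCI_CONFIG_PROFILE",
    "OCI_COMPARTMENT_ID",
    "OCI_ENDPOINT",
]

missing = [name for name in REQUIRED if not os.environ.get(name)]
if missing:
    raise SystemExit(f"Missing environment variables: {', '.join(missing)}")

# Paths in .env use ~, so expand before checking the file exists.
config_path = os.path.expanduser(os.environ["OCI_CONFIG_FILE"])
if not os.path.isfile(config_path):
    raise SystemExit(f"OCI config not found at {config_path}")
print("Environment looks complete.")
```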
Run the server locally:

```
uv run uvicorn proxy:app --reload
```

The service listens on http://127.0.0.1:8000. Update .env to switch tenants, endpoints, or tuning knobs.
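Because the proxy mirrors OpenAI's API shape, the official `openai` Python client can be pointed at it by overriding `base_url`. A minimal sketch, assuming the `openai` package is installed and the proxy does not validate the API key (the placeholder below is arbitrary):

```
from openai import OpenAI

# base_url routes requests to the local proxy instead of api.openai.com.
client = OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="unused")

completion = client.chat.completions.create(
    model="xai.grok-4-fast-reasoning",
    messages=[{"role": "user", "content": "Say hello from OCI."}],
)
print(completion.choices[0].message.content)
```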
With Docker Compose:

```
docker compose build
docker compose up
```

Compose loads .env and mounts ~/.oci into /root/.oci so the container can reuse the same config and key files. To run using plain Docker:
```
docker build -t oci-openai-proxy .
docker run --rm -p 8000:8000 --env-file .env \
  -v ~/.oci:/root/.oci:ro oci-openai-proxy
```

Execute the following snippets while the proxy is running:
Streaming chat completions:

```
python -u - <<'PY'
import json
import sys

import requests

url = "http://localhost:8000/v1/chat/completions"
payload = {
    "model": "xai.grok-4-fast-reasoning",
    "stream": True,
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Describe how to benchmark a FastAPI proxy."},
    ],
}

resp = requests.post(url, json=payload, stream=True)
for line in resp.iter_lines(decode_unicode=True):
    # SSE events arrive as "data: {...}" lines; skip blanks and other fields.
    if not line or not line.startswith("data:"):
        continue
    data = line[5:].strip()
    if data == "[DONE]":  # OpenAI-style streams close with a [DONE] sentinel.
        break
    event = json.loads(data)
    for choice in event.get("choices", []):
        text = choice.get("delta", {}).get("content")
        if text:
            sys.stdout.write(text)
            sys.stdout.flush()
PY
```

Streaming responses:

```
python -u - <<'PY'
import json
import sys

import requests

url = "http://localhost:8000/v1/responses"
payload = {
    "model": "xai.grok-4-fast-reasoning",
    "input": "Give me two tips for scaling FastAPI proxies.",
    "stream": True,
}

resp = requests.post(url, json=payload, stream=True)
for line in resp.iter_lines(decode_unicode=True):
    if not line or not line.startswith("data:"):
        continue
    event = json.loads(line[5:])
    # Responses streams emit typed events; only text deltas carry output.
    if event.get("type") == "response.output_text.delta":
        sys.stdout.write(event.get("delta", ""))
        sys.stdout.flush()
PY
```

Non-streaming chat completion:

```
python - <<'PY'
import requests

url = "http://localhost:8000/v1/chat/completions"
payload = {
    "model": "xai.grok-4-fast-reasoning",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the benefits of async proxies."},
    ],
}

resp = requests.post(url, json=payload)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
PY
```

Non-streaming responses:

```
python - <<'PY'
import requests

url = "http://localhost:8000/v1/responses"
payload = {
    "model": "xai.grok-4-fast-reasoning",
    "input": "List three best practices for running FastAPI proxies at scale.",
}

resp = requests.post(url, json=payload)
resp.raise_for_status()
# Responses payloads nest text under output[0].content[0].text.
print(resp.json()["output"][0]["content"][0]["text"])
PY
```

Finally, verify the service is healthy:

```
curl -s http://localhost:8000/health | jq
```
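For automation, the same check can be scripted. A minimal sketch, assuming /health returns JSON (which the jq pipe above implies):

```
import requests

# Poll the proxy's health endpoint and fail loudly on any non-2xx status.
resp = requests.get("http://localhost:8000/health", timeout=5)
resp.raise_for_status()
print(resp.json())
```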