Extracted from the ContribAI project: a working implementation for Google Vertex AI (Gemini models).
Vertex AI uses gcloud CLI for authentication instead of API keys:
# Login once
gcloud auth login
gcloud auth application-default login
# Set active project
gcloud config set project YOUR_PROJECT_ID
# Test token retrieval
gcloud auth print-access-token
No API key is needed: authentication is handled by gcloud auth print-access-token.
llm:
provider: "gemini" # or "vertex"
model: "gemini-3-flash-preview"
vertex_project: "YOUR_GCP_PROJECT_ID"
vertex_location: "global" # or specific region like "us-central1"
api_key: "" # Leave empty for Vertex AI
Environment variables:
export GOOGLE_CLOUD_PROJECT=YOUR_PROJECT_ID
API key endpoint (Gemini API):
https://generativelanguage.googleapis.com/v1beta/models/{MODEL}:generateContent?key={API_KEY}
Vertex AI endpoint:
https://{HOSTNAME}/{API_VERSION}/projects/{PROJECT}/locations/{LOCATION}/publishers/google/models/{MODEL}:generateContent
Where:
- HOSTNAME = aiplatform.googleapis.com (if location is "global")
- HOSTNAME = {LOCATION}-aiplatform.googleapis.com (for regional endpoints)
- API_VERSION = v1beta1 (for preview models) or v1 (for stable models)
Examples:
# Global endpoint
https://aiplatform.googleapis.com/v1beta1/projects/my-project/locations/global/publishers/google/models/gemini-3-flash-preview:generateContent
# Regional endpoint (us-central1)
https://us-central1-aiplatform.googleapis.com/v1/projects/my-project/locations/us-central1/publishers/google/models/gemini-2.0-flash:generateContent
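The hostname and version rules above can be folded into one small helper. This is a minimal sketch, not the ContribAI code: the function name `build_vertex_url` and its parameter names are hypothetical, and the caller is assumed to know which API version the model needs.

```python
def build_vertex_url(project: str, location: str, model: str,
                     api_version: str = "v1") -> str:
    """Assemble a generateContent URL from the pattern described above."""
    # The "global" location uses the bare hostname; regions get a prefix.
    if location == "global":
        hostname = "aiplatform.googleapis.com"
    else:
        hostname = f"{location}-aiplatform.googleapis.com"
    return (
        f"https://{hostname}/{api_version}"
        f"/projects/{project}/locations/{location}"
        f"/publishers/google/models/{model}:generateContent"
    )


# Prints one global (preview) URL and one regional (stable) URL.
print(build_vertex_url("my-project", "global", "gemini-3-flash-preview", "v1beta1"))
print(build_vertex_url("my-project", "us-central1", "gemini-2.0-flash"))
```

Keeping URL assembly in a pure function makes it easy to check the result before any request is sent.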
{
"contents": [{
"role": "user",
"parts": [{ "text": "Your prompt here" }]
}],
"generationConfig": {
"temperature": 0.3,
"maxOutputTokens": 65536
},
"systemInstruction": {
"parts": [{ "text": "Optional system prompt" }]
}
}
Windows:
let out = std::process::Command::new("cmd")
.args(["/c", "gcloud", "auth", "print-access-token"])
.output()?;
Linux/Mac:
let out = std::process::Command::new("gcloud")
.args(["auth", "print-access-token"])
.output()?;
Token handling:
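Because the token comes from a subprocess call and expires after a while (the troubleshooting notes put the lifetime at about an hour), a caller may prefer to reuse it instead of running gcloud on every request. The following is a minimal sketch of that idea, not the ContribAI implementation: `TokenCache`, `fetch_token_from_gcloud`, and the 3000-second default lifetime are assumptions chosen for illustration. The fetch function uses the Linux/Mac command form. On Windows the `cmd /c` wrapper shown above may be needed.

```python
import subprocess
import time
from typing import Callable, Optional


def fetch_token_from_gcloud() -> str:
    """Run the gcloud CLI and return the access token it prints."""
    result = subprocess.run(
        ["gcloud", "auth", "print-access-token"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


class TokenCache:
    """Reuse a token until an assumed lifetime has elapsed."""

    def __init__(self,
                 fetch: Callable[[], str] = fetch_token_from_gcloud,
                 lifetime_seconds: float = 3000.0,  # assumption: refresh before ~1 hour
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._lifetime = lifetime_seconds
        self._clock = clock
        self._token: Optional[str] = None
        self._fetched_at = 0.0

    def get(self) -> str:
        """Return the cached token, fetching a new one when it is stale."""
        now = self._clock()
        if self._token is None or now - self._fetched_at >= self._lifetime:
            self._token = self._fetch()
            self._fetched_at = now
        return self._token
```

A caller would create one `TokenCache()` and call `get()` before each request. The `fetch` and `clock` parameters exist so the cache can be exercised without gcloud installed.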
# Get token
TOKEN=$(gcloud auth print-access-token)
# Use in request
curl -X POST "https://aiplatform.googleapis.com/v1beta1/projects/YOUR_PROJECT/locations/global/publishers/google/models/gemini-3-flash-preview:generateContent" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"contents": [{
"role": "user",
"parts": [{"text": "Hello"}]
}],
"generationConfig": {
"temperature": 0.3,
"maxOutputTokens": 1024
}
}'
# Variables
PROJECT="YOUR_GCP_PROJECT_ID"
LOCATION="global"
MODEL="gemini-3-flash-preview"
API_VERSION="v1beta1" # Use "v1" for stable models
# Get token
TOKEN=$(gcloud auth print-access-token)
# Build URL
if [ "$LOCATION" = "global" ]; then
HOSTNAME="aiplatform.googleapis.com"
else
HOSTNAME="${LOCATION}-aiplatform.googleapis.com"
fi
URL="https://${HOSTNAME}/${API_VERSION}/projects/${PROJECT}/locations/${LOCATION}/publishers/google/models/${MODEL}:generateContent"
# Make request
curl -X POST "$URL" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"contents": [{
"role": "user",
"parts": [{"text": "Explain quantum computing in simple terms"}]
}],
"generationConfig": {
"temperature": 0.3,
"maxOutputTokens": 2048
}
}'
import subprocess
import requests


def get_vertex_token():
    """Fetch access token from gcloud CLI."""
    result = subprocess.run(
        ["gcloud", "auth", "print-access-token"],
        capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def vertex_ai_complete(prompt, system=None, model="gemini-3-flash-preview",
                       project="YOUR_PROJECT", location="global",
                       temperature=0.3, max_tokens=65536):
    """Call Vertex AI Gemini model."""
    # Get token
    token = get_vertex_token()
    # Build endpoint: preview models use v1beta1, stable models use v1
    api_version = "v1beta1" if "preview" in model else "v1"
    if location == "global":
        hostname = "aiplatform.googleapis.com"
    else:
        hostname = f"{location}-aiplatform.googleapis.com"
    url = f"https://{hostname}/{api_version}/projects/{project}/locations/{location}/publishers/google/models/{model}:generateContent"
    # Build request body
    body = {
        "contents": [{
            "role": "user",
            "parts": [{"text": prompt}]
        }],
        "generationConfig": {
            "temperature": temperature,
            "maxOutputTokens": max_tokens
        }
    }
    if system:
        body["systemInstruction"] = {
            "parts": [{"text": system}]
        }
    # Make request (the timeout keeps a stalled connection from hanging forever)
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json"
    }
    response = requests.post(url, json=body, headers=headers, timeout=60)
    response.raise_for_status()
    # Extract text from the first part of the first candidate
    data = response.json()
    return data["candidates"][0]["content"]["parts"][0]["text"]


# Usage
result = vertex_ai_complete(
    prompt="What is Rust?",
    system="You are a helpful coding assistant.",
    project="my-gcp-project"
)
print(result)
Solution: Install Google Cloud SDK:
# Windows
# Download from https://cloud.google.com/sdk/docs/install
# Linux
curl https://sdk.cloud.google.com | bash
Solution: Login and set project:
gcloud auth login
gcloud auth application-default login
gcloud config set project YOUR_PROJECT
Causes:
- Token expired (tokens valid for ~1 hour)
- Wrong project ID
- Vertex AI API not enabled in GCP console
Solution:
# Re-authenticate
gcloud auth login
# Enable Vertex AI API
gcloud services enable aiplatform.googleapis.com
Solution: Grant Vertex AI User role:
gcloud projects add-iam-policy-binding YOUR_PROJECT \
--member="user:YOUR_EMAIL" \
--role="roles/aiplatform.user"
Available models:
- gemini-3-flash-preview (v1beta1)
- gemini-3-pro-preview (v1beta1)
- gemini-2.0-flash (v1)
- gemini-1.5-pro (v1)
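The model list above pairs each model with an API version, which suggests a simple lookup. The sketch below is illustrative only: `KNOWN_API_VERSIONS` and `api_version_for` are hypothetical names, the table holds just the four models listed here, and the fallback for unlisted models is the same "preview" substring heuristic used elsewhere in this guide, which may not hold for every model name.

```python
# Versions copied from the model list above.
KNOWN_API_VERSIONS = {
    "gemini-3-flash-preview": "v1beta1",
    "gemini-3-pro-preview": "v1beta1",
    "gemini-2.0-flash": "v1",
    "gemini-1.5-pro": "v1",
}


def api_version_for(model: str) -> str:
    """Pick the API version for a model name."""
    if model in KNOWN_API_VERSIONS:
        return KNOWN_API_VERSIONS[model]
    # Assumption: unlisted preview models use v1beta1, everything else v1.
    return "v1beta1" if "preview" in model else "v1"
```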
Vertex AI supports content caching for repeated context:
# Create cached content
curl -X POST "https://aiplatform.googleapis.com/v1beta1/projects/$PROJECT/locations/$LOCATION/cachedContents" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d @- <<EOF
{
"model": "projects/$PROJECT/locations/$LOCATION/publishers/google/models/$MODEL",
"contents": [{
"role": "user",
"parts": [{"text": "Your large context text here"}]
}],
"ttl": "3600s"
}
EOF
# Response includes "name": "cachedContents/xyz123"
# Use this in subsequent requests:
# "cachedContent": "cachedContents/xyz123"
| Aspect | API Key | Vertex AI |
|---|---|---|
| Auth | ?key=API_KEY | Authorization: Bearer TOKEN |
| Base URL | generativelanguage.googleapis.com | {location}-aiplatform.googleapis.com |
| Model Format | models/gemini-3-flash-preview | projects/.../publishers/google/models/... |
| Token Source | Static key | gcloud auth print-access-token |
| Rate Limits | Lower | Higher (enterprise) |
| Caching | Limited | Full context caching |
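The Auth row is the difference most likely to matter in client code: the API key travels in the query string, while the Vertex AI token travels in a header. A minimal sketch of the two shapes follows. The function names `api_key_request` and `vertex_request` are hypothetical, and the Vertex AI endpoint URL is assumed to be built separately.

```python
from typing import Dict, Tuple


def api_key_request(model: str, api_key: str) -> Tuple[str, Dict[str, str]]:
    """API-key mode: the credential is a query parameter on the URL."""
    url = (
        "https://generativelanguage.googleapis.com/v1beta/models/"
        f"{model}:generateContent?key={api_key}"
    )
    return url, {"Content-Type": "application/json"}


def vertex_request(endpoint_url: str, token: str) -> Tuple[str, Dict[str, str]]:
    """Vertex AI mode: the URL carries no credential; the header does."""
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    return endpoint_url, headers
```

Both functions return a URL and a header mapping, so the same request code can accept either result.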
- gcloud CLI installed
- gcloud auth login completed
- gcloud config set project YOUR_PROJECT done
- Vertex AI API enabled in GCP Console
- User has roles/aiplatform.user permission
- vertex_project set in config (not empty)
- api_key left empty (or ignored when vertex_project is set)
- Model name correct (check for "preview" suffix)
- API version correct (v1beta1 for preview, v1 for stable)
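The config-file items in this checklist can be checked before any request is made. The sketch below is a hypothetical helper, not part of ContribAI: it assumes the `llm` section has already been loaded into a plain dictionary with the keys shown in the config example, and it only covers what is visible in the file. The gcloud and IAM items still need the CLI.

```python
from typing import Dict, List, Optional


def check_vertex_config(llm: Dict[str, Optional[str]]) -> List[str]:
    """Return human-readable problems found in an llm config mapping."""
    def value(key: str) -> str:
        # Treat missing keys and null values as empty strings.
        return (llm.get(key) or "").strip()

    problems: List[str] = []
    project = value("vertex_project")
    if not project or project.startswith("YOUR_"):
        problems.append("vertex_project is empty or still a placeholder")
    if not value("vertex_location"):
        problems.append("vertex_location is empty")
    if not value("model"):
        problems.append("model is empty")
    if project and value("api_key"):
        problems.append("api_key is set although vertex_project is configured")
    return problems


# Prints the problems found in a config that still has placeholder values.
print(check_vertex_config({
    "provider": "vertex",
    "model": "gemini-3-flash-preview",
    "vertex_project": "YOUR_GCP_PROJECT_ID",
    "vertex_location": "global",
    "api_key": "",
}))
```

Returning a list of messages instead of raising on the first problem lets a caller report everything that needs fixing in one pass.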
Extracted from: crates/contribai-rs/src/llm/provider.rs
Working implementation: Lines 70-300, 1285-1305
Config: crates/contribai-rs/src/core/config.rs Lines 240-269