Merged — changes from 3 commits.
20 changes: 10 additions & 10 deletions site-src/_includes/epp-latest.md
@@ -2,9 +2,9 @@

```bash
export GATEWAY_PROVIDER=gke
helm install ${MODEL_SERVER}-qwen3-32b \
helm install ${INFERENCE_POOL_NAME} \
--dependency-update \
--set inferencePool.modelServers.matchLabels.app=${MODEL_SERVER}-qwen3-32b \
--set inferencePool.modelServers.matchLabels.app=${INFERENCE_POOL_NAME} \
--set provider.name=$GATEWAY_PROVIDER \
--set inferencePool.modelServerType=${MODEL_SERVER} \
--set experimentalHttpRoute.enabled=true \
@@ -16,23 +16,23 @@

```bash
export GATEWAY_PROVIDER=istio
helm install ${MODEL_SERVER}-qwen3-32b \
helm install ${INFERENCE_POOL_NAME} \
--dependency-update \
--set inferencePool.modelServers.matchLabels.app=${MODEL_SERVER}-qwen3-32b \
--set inferencePool.modelServers.matchLabels.app=${INFERENCE_POOL_NAME} \
--set provider.name=$GATEWAY_PROVIDER \
--set inferencePool.modelServerType=${MODEL_SERVER} \
--set experimentalHttpRoute.enabled=true \
--version $IGW_CHART_VERSION \
oci://us-central1-docker.pkg.dev/k8s-staging-images/gateway-api-inference-extension/charts/inferencepool
```

=== "Kgateway"
=== "Agentgateway"

```bash
export GATEWAY_PROVIDER=none
helm install ${MODEL_SERVER}-qwen3-32b \
helm install ${INFERENCE_POOL_NAME} \
--dependency-update \
--set inferencePool.modelServers.matchLabels.app=${MODEL_SERVER}-qwen3-32b \
--set inferencePool.modelServers.matchLabels.app=${INFERENCE_POOL_NAME} \
--set provider.name=$GATEWAY_PROVIDER \
--set inferencePool.modelServerType=${MODEL_SERVER} \
--set experimentalHttpRoute.enabled=true \
@@ -44,12 +44,12 @@

```bash
export GATEWAY_PROVIDER=none
helm install ${MODEL_SERVER}-qwen3-32b \
helm install ${INFERENCE_POOL_NAME} \
--dependency-update \
--set inferencePool.modelServers.matchLabels.app=${MODEL_SERVER}-qwen3-32b \
--set inferencePool.modelServers.matchLabels.app=${INFERENCE_POOL_NAME} \
--set provider.name=$GATEWAY_PROVIDER \
--set inferencePool.modelServerType=${MODEL_SERVER} \
--set experimentalHttpRoute.enabled=true \
--version $IGW_CHART_VERSION \
oci://us-central1-docker.pkg.dev/k8s-staging-images/gateway-api-inference-extension/charts/inferencepool
```
20 changes: 10 additions & 10 deletions site-src/_includes/epp.md
@@ -2,9 +2,9 @@

```bash
export GATEWAY_PROVIDER=gke
helm install vllm-qwen3-32b \
helm install ${INFERENCE_POOL_NAME} \
--dependency-update \
--set inferencePool.modelServers.matchLabels.app=vllm-qwen3-32b \
--set inferencePool.modelServers.matchLabels.app=${INFERENCE_POOL_NAME} \
--set provider.name=$GATEWAY_PROVIDER \
--set experimentalHttpRoute.enabled=true \
--version $IGW_CHART_VERSION \
@@ -15,22 +15,22 @@

```bash
export GATEWAY_PROVIDER=istio
helm install vllm-qwen3-32b \
helm install ${INFERENCE_POOL_NAME} \
--dependency-update \
--set inferencePool.modelServers.matchLabels.app=vllm-qwen3-32b \
--set inferencePool.modelServers.matchLabels.app=${INFERENCE_POOL_NAME} \
--set provider.name=$GATEWAY_PROVIDER \
--set experimentalHttpRoute.enabled=true \
--version $IGW_CHART_VERSION \
oci://registry.k8s.io/gateway-api-inference-extension/charts/inferencepool
```

=== "Kgateway"
=== "Agentgateway"

```bash
export GATEWAY_PROVIDER=none
helm install vllm-qwen3-32b \
helm install ${INFERENCE_POOL_NAME} \
--dependency-update \
--set inferencePool.modelServers.matchLabels.app=vllm-qwen3-32b \
--set inferencePool.modelServers.matchLabels.app=${INFERENCE_POOL_NAME} \
--set provider.name=$GATEWAY_PROVIDER \
--set experimentalHttpRoute.enabled=true \
--version $IGW_CHART_VERSION \
@@ -41,11 +41,11 @@

```bash
export GATEWAY_PROVIDER=none
helm install vllm-qwen3-32b \
helm install ${INFERENCE_POOL_NAME} \
--dependency-update \
--set inferencePool.modelServers.matchLabels.app=vllm-qwen3-32b \
--set inferencePool.modelServers.matchLabels.app=${INFERENCE_POOL_NAME} \
--set provider.name=$GATEWAY_PROVIDER \
--set experimentalHttpRoute.enabled=true \
--version $IGW_CHART_VERSION \
oci://registry.k8s.io/gateway-api-inference-extension/charts/inferencepool
```
2 changes: 1 addition & 1 deletion site-src/_includes/test.md
@@ -7,7 +7,7 @@
PORT=80

curl -i ${IP}:${PORT}/v1/completions -H 'Content-Type: application/json' -d '{
"model": "Qwen/Qwen3-32B",
"model": "'"${MODEL_NAME}"'",
"prompt": "Write as if you were a critic: San Francisco",
"max_tokens": 100,
"temperature": 0
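If the route is wired up correctly, the completion text comes back in `choices[0].text` of an OpenAI-style response. A small sketch for pulling that field out, assuming the standard completions schema and a local `python3` (the `RESPONSE` value below is a made-up sample standing in for the `curl` output):

```shell
# Stand-in for the body returned by the curl command above (hypothetical sample).
RESPONSE='{"choices":[{"text":"San Francisco is a city of contrasts."}]}'

# Extract the generated text from the OpenAI-style /v1/completions response.
echo "$RESPONSE" | python3 -c 'import json,sys; print(json.load(sys.stdin)["choices"][0]["text"])'
```

In practice you would pipe the `curl` output straight into the `python3` one-liner instead of capturing it in a variable first.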
4 changes: 2 additions & 2 deletions site-src/_includes/verify-status-latest.md
@@ -4,7 +4,7 @@


```bash
kubectl get httproute ${MODEL_SERVER}-qwen3-32b -o yaml
kubectl get httproute ${INFERENCE_POOL_NAME} -o yaml
```

The `HTTPRoute` status should include `Accepted=True` and `ResolvedRefs=True`.
@@ -15,7 +15,7 @@


```bash
kubectl get inferencepool ${MODEL_SERVER}-qwen3-32b -o yaml
kubectl get inferencepool ${INFERENCE_POOL_NAME} -o yaml
```

The `InferencePool` status should include `Accepted=True` and `ResolvedRefs=True`.
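Reading the full YAML works, but the conditions can also be extracted directly. A hedged sketch, assuming the route and pool each report status under a single parent Gateway (index `0`); adjust the index if multiple parents are attached:

```shell
# Print just the Accepted condition status for the route's first parent.
kubectl get httproute ${INFERENCE_POOL_NAME} \
  -o jsonpath='{.status.parents[0].conditions[?(@.type=="Accepted")].status}'

# Same check for the InferencePool.
kubectl get inferencepool ${INFERENCE_POOL_NAME} \
  -o jsonpath='{.status.parents[0].conditions[?(@.type=="Accepted")].status}'
```

Each command should print `True` once the gateway controller has reconciled the resource.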
4 changes: 2 additions & 2 deletions site-src/_includes/verify-status.md
@@ -4,7 +4,7 @@


```bash
kubectl get httproute vllm-qwen3-32b -o yaml
kubectl get httproute ${INFERENCE_POOL_NAME} -o yaml
```

The `HTTPRoute` status should include `Accepted=True` and `ResolvedRefs=True`.
@@ -15,7 +15,7 @@


```bash
kubectl get inferencepool vllm-qwen3-32b -o yaml
kubectl get inferencepool ${INFERENCE_POOL_NAME} -o yaml
```

The `InferencePool` status should include `Accepted=True` and `ResolvedRefs=True`.
52 changes: 29 additions & 23 deletions site-src/guides/getting-started-latest.md
@@ -20,24 +20,32 @@

```bash
MODEL_SERVER=vllm # sglang is also supported.
INFERENCE_POOL_NAME=${MODEL_SERVER}-qwen3-32b
MODEL_NAME=Qwen/Qwen3-32B
```
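Since `MODEL_SERVER` drives both the pool name and the manifest paths used below, a typo fails late and confusingly. A quick guard like the following — a hypothetical sketch, not part of the guide — catches it before any `helm` or `kubectl` command runs:

```shell
# Default to vllm, then reject anything other than the two supported servers.
MODEL_SERVER=${MODEL_SERVER:-vllm}

case "$MODEL_SERVER" in
  vllm|sglang) ;;  # supported model servers
  *)
    echo "unsupported MODEL_SERVER: $MODEL_SERVER (use vllm or sglang)" >&2
    exit 1
    ;;
esac

# Derive the pool name exactly as the guide does.
INFERENCE_POOL_NAME=${MODEL_SERVER}-qwen3-32b
echo "$INFERENCE_POOL_NAME"
```

With the default `vllm`, this prints `vllm-qwen3-32b`, matching the name used throughout the Helm and kubectl commands.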

--8<-- "site-src/_includes/model-server-gpu.md"

```bash
export INFERENCE_POOL_NAME=${MODEL_SERVER}-qwen3-32b
export MODEL_NAME=Qwen/Qwen3-32B
kubectl create secret generic hf-token --from-literal=token=$HF_TOKEN # Your Hugging Face token, needed to pull gated models
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/${MODEL_SERVER}/gpu-deployment.yaml
```

> **Contributor** (on `export INFERENCE_POOL_NAME=...`): why do we repeat the vars from L23-24 on every tab?
>
> **Contributor Author:** Removed the shared INFERENCE_POOL_NAME and MODEL_NAME definitions from the preamble. Keeping those values owned by each tab makes the selected flow self-contained, and it matches the structure in index.md where GPU, CPU, and simulator each define their own values.
>
> **Contributor Author:** @nirrozenbaum thanks for the review! The current latest guide duplicates those vars. IMHO quickstart tabs should be self-contained for copy/paste. I'll remove the common definitions rather than make the tabs depend on hidden state.

--8<-- "site-src/_includes/model-server-cpu.md"

```bash
export INFERENCE_POOL_NAME=vllm-qwen3-32b
export MODEL_NAME=Qwen/Qwen3-32B
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/vllm/cpu-deployment.yaml
```

--8<-- "site-src/_includes/model-server-sim.md"

```bash
export INFERENCE_POOL_NAME=vllm-llama3-8b-instruct
export MODEL_NAME=meta-llama/Llama-3.1-8B-Instruct
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/vllm/sim-deployment.yaml
```
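Whichever tab you pick, the deployment takes a while to pull images and load model weights. A hedged wait sketch, assuming the manifests label their pods `app=${INFERENCE_POOL_NAME}` — the same label the Helm `matchLabels` setting selects on earlier in this guide:

```shell
# Block until the model-server pods report Ready (weights loaded, probe passing).
kubectl wait --for=condition=Ready pod \
  -l app=${INFERENCE_POOL_NAME} --timeout=600s
```

If the label differs in your manifest, `kubectl get pods --show-labels` will reveal the right selector.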

@@ -81,23 +89,23 @@ kubectl apply -k https://github.com/kubernetes-sigs/gateway-api-inference-extens
>
> Istio v1.28.0 includes full support for InferencePool v1. This guide assumes you are using Istio v1.28.0 or later to ensure compatibility with the InferencePool API.

=== "Kgateway"
=== "Agentgateway"

1. Requirements

- Gateway API [CRDs](https://gateway-api.sigs.k8s.io/guides/#installing-gateway-api) installed.

1. Set the Kgateway version and install the Kgateway CRDs:
1. Set the Agentgateway version and install the Agentgateway CRDs:

```bash
KGTW_VERSION=v2.1.0
helm upgrade -i --create-namespace --namespace kgateway-system --version $KGTW_VERSION kgateway-crds oci://cr.kgateway.dev/kgateway-dev/charts/kgateway-crds
AGW_VERSION=v1.0.0-alpha.4
helm upgrade -i --create-namespace --namespace agentgateway-system --version $AGW_VERSION agentgateway-crds oci://cr.agentgateway.dev/charts/agentgateway-crds
```

1. Install Kgateway:
1. Install Agentgateway:

```bash
helm upgrade -i --namespace kgateway-system --version $KGTW_VERSION kgateway oci://cr.kgateway.dev/kgateway-dev/charts/kgateway --set inferenceExtension.enabled=true
helm upgrade -i --namespace agentgateway-system --version $AGW_VERSION agentgateway oci://cr.agentgateway.dev/charts/agentgateway --set inferenceExtension.enabled=true
```

=== "NGINX Gateway Fabric"
Expand Down Expand Up @@ -157,17 +165,15 @@ kubectl apply -k https://github.com/kubernetes-sigs/gateway-api-inference-extens
inference-gateway inference-gateway <MY_ADDRESS> True 22s
```

=== "Kgateway"
=== "Agentgateway"

[Kgateway](https://kgateway.dev/) is a Gateway API and Inference Gateway
[conformant](https://github.com/kubernetes-sigs/gateway-api-inference-extension/tree/main/conformance/reports/v1.0.0/gateway/kgateway)
implementation. Kgateway supports Inference Gateway with the [agentgateway](https://agentgateway.dev/) data plane. Follow these steps
to run Kgateway as an Inference Gateway:
[Agentgateway](https://agentgateway.dev/) is a Gateway API and Inference Gateway implementation. Follow these steps
to run Agentgateway as an Inference Gateway:

1. Deploy the Inference Gateway:

```bash
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/kgateway/gateway.yaml
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/agentgateway/gateway.yaml
```

1. Confirm that the Gateway was assigned an IP address and reports a `Programmed=True` status:
@@ -236,7 +242,7 @@ If you wish to exercise that function, then retain the setup you have deployed s
1. Uninstall the InferencePool, InferenceObjective and model server resources:

```bash
helm uninstall ${MODEL_SERVER}-vllm-qwen3-32b --ignore-not-found
helm uninstall ${INFERENCE_POOL_NAME} --ignore-not-found
kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/inferenceobjective.yaml --ignore-not-found
kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/vllm/cpu-deployment.yaml --ignore-not-found
kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/${MODEL_SERVER}/gpu-deployment.yaml --ignore-not-found
@@ -278,30 +284,30 @@ If you wish to exercise that function, then retain the setup you have deployed s
kubectl delete ns istio-system
```

=== "Kgateway"
=== "Agentgateway"

```bash
kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/kgateway/gateway.yaml --ignore-not-found
kubectl delete gateway inference-gateway --ignore-not-found
```

The following steps assume you would like to cleanup ALL Kgateway resources that were created in this quickstart guide.
The following steps assume you would like to cleanup ALL Agentgateway resources that were created in this quickstart guide.

1. Uninstall Kgateway:
1. Uninstall Agentgateway:

```bash
helm uninstall kgateway -n kgateway-system
helm uninstall agentgateway -n agentgateway-system
```

1. Uninstall the Kgateway CRDs:
1. Uninstall the Agentgateway CRDs:

```bash
helm uninstall kgateway-crds -n kgateway-system
helm uninstall agentgateway-crds -n agentgateway-system
```

1. Remove the Kgateway namespace:
1. Remove the Agentgateway namespace:

```bash
kubectl delete ns kgateway-system
kubectl delete ns agentgateway-system
```

=== "NGINX Gateway Fabric"
@@ -325,4 +331,4 @@

```bash
kubectl delete ns nginx-gateway
```