
Commit 0d5107b: Add unit tests for request body

Parent: 1ba13f3


58 files changed (+2162, -501)
Lines changed: 8 additions & 0 deletions

```diff
@@ -0,0 +1,8 @@
+---
+name: Blank Issue
+about: Create a new issue from scratch
+title: ''
+labels: needs-triage
+assignees: ''
+
+---
```

.github/ISSUE_TEMPLATE/bug_request.md

Lines changed: 3 additions & 1 deletion

```diff
@@ -1,7 +1,9 @@
 ---
 name: Bug Report
 about: Report a bug you encountered
-labels: kind/bug
+title: ''
+labels: kind/bug, needs-triage
+assignees: ''
 
 ---
 
```

.github/ISSUE_TEMPLATE/config.yml

Lines changed: 1 addition & 0 deletions

```diff
@@ -0,0 +1 @@
+blank_issues_enabled: false
```

.github/ISSUE_TEMPLATE/feature_request.md

Lines changed: 1 addition & 2 deletions

```diff
@@ -2,7 +2,7 @@
 name: Feature request
 about: Suggest an idea for this project
 title: ''
-labels: ''
+labels: needs-triage
 assignees: ''
 
 ---
@@ -12,4 +12,3 @@ assignees: ''
 **What would you like to be added**:
 
 **Why is this needed**:
-
```

.github/ISSUE_TEMPLATE/new-release.md

Lines changed: 1 addition & 0 deletions

```diff
@@ -4,6 +4,7 @@ about: Propose a new release
 title: Release v0.x.0
 labels: ''
 assignees: ''
+
 ---
 
 - [Introduction](#introduction)
```

Makefile

Lines changed: 5 additions & 1 deletion

```diff
@@ -123,8 +123,12 @@ vet: ## Run go vet against code.
 test: manifests generate fmt vet envtest image-build ## Run tests.
 	KUBEBUILDER_ASSETS="$(shell $(ENVTEST) use $(ENVTEST_K8S_VERSION) --bin-dir $(LOCALBIN) -p path)" go test $$(go list ./... | grep -v /e2e) -race -coverprofile cover.out
 
+.PHONY: test-unit
+test-unit: ## Run unit tests.
+	KUBEBUILDER_ASSETS="$(shell $(ENVTEST) use $(ENVTEST_K8S_VERSION) --bin-dir $(LOCALBIN) -p path)" go test ./pkg/... -race -coverprofile cover.out
+
 .PHONY: test-integration
-test-integration: ## Run tests.
+test-integration: ## Run integration tests.
 	KUBEBUILDER_ASSETS="$(shell $(ENVTEST) use $(ENVTEST_K8S_VERSION) --bin-dir $(LOCALBIN) -p path)" go test ./test/integration/epp/... -race -coverprofile cover.out
 
 .PHONY: test-e2e
```
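The new `test-unit` target runs everything under `./pkg/...`, which per the commit message now includes unit tests for request-body handling. A minimal sketch of the kind of table-driven test this target would pick up might look like the following; `extractModel` and its package are hypothetical stand-ins, not code from this commit:

```go
package handlers_test

import (
	"encoding/json"
	"testing"
)

// extractModel is a hypothetical stand-in for request-body parsing: it pulls
// the "model" field out of an OpenAI-style chat-completion request body.
func extractModel(body []byte) (string, error) {
	var req struct {
		Model string `json:"model"`
	}
	if err := json.Unmarshal(body, &req); err != nil {
		return "", err
	}
	return req.Model, nil
}

func TestExtractModel(t *testing.T) {
	tests := []struct {
		name    string
		body    string
		want    string
		wantErr bool
	}{
		{name: "valid body", body: `{"model": "llama3-8b", "messages": []}`, want: "llama3-8b"},
		{name: "missing model field", body: `{"messages": []}`, want: ""},
		{name: "malformed JSON", body: `{"model": `, wantErr: true},
	}
	for _, tc := range tests {
		t.Run(tc.name, func(t *testing.T) {
			got, err := extractModel([]byte(tc.body))
			if (err != nil) != tc.wantErr {
				t.Fatalf("unexpected error state: %v", err)
			}
			if got != tc.want {
				t.Errorf("got %q, want %q", got, tc.want)
			}
		})
	}
}
```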

README.md

Lines changed: 54 additions & 1 deletion

```diff
@@ -1,4 +1,57 @@
-# Gateway API Inference Extension
+[![Go Report Card](https://goreportcard.com/badge/sigs.k8s.io/gateway-api-inference-extension)](https://goreportcard.com/report/sigs.k8s.io/gateway-api-inference-extension)
+[![Go Reference](https://pkg.go.dev/badge/sigs.k8s.io/gateway-api-inference-extension.svg)](https://pkg.go.dev/sigs.k8s.io/gateway-api-inference-extension)
+[![License](https://img.shields.io/github/license/kubernetes-sigs/gateway-api-inference-extension)](/LICENSE)
+
+# Gateway API Inference Extension (GIE)
+
+This project offers tools for AI Inference, enabling developers to build [Inference Gateways].
+
+[Inference Gateways]:#concepts-and-definitions
+
+## Concepts and Definitions
+
+The following are some key industry terms that are important to understand for
+this project:
+
+- **Model**: A generative AI model that has learned patterns from data and is
+  used for inference. Models vary in size and architecture, from smaller
+  domain-specific models to massive multi-billion-parameter neural networks that
+  are optimized for diverse language tasks.
+- **Inference**: The process of running a generative AI model, such as a large
+  language model or diffusion model, to generate text, embeddings, or other
+  outputs from input data.
+- **Model server**: A service (in our case, containerized) responsible for
+  receiving inference requests and returning predictions from a model.
+- **Accelerator**: Specialized hardware, such as Graphics Processing Units
+  (GPUs), that can be attached to Kubernetes nodes to speed up computations,
+  particularly for training and inference tasks.
+
+And the following are terms more specific to this project:
+
+- **Scheduler**: Makes decisions about which endpoint is optimal (best cost /
+  best performance) for an inference request based on `Metrics and Capabilities`
+  from [Model Serving](/docs/proposals/003-model-server-protocol/README.md).
+- **Metrics and Capabilities**: Data provided by model serving platforms about
+  performance, availability and capabilities to optimize routing. Includes
+  things like [Prefix Cache] status or [LoRA Adapters] availability.
+- **Endpoint Selector**: A `Scheduler` combined with `Metrics and Capabilities`
+  systems, together often referred to as an [Endpoint Selection Extension]
+  (also sometimes called an "endpoint picker", or "EPP").
+- **Inference Gateway**: A proxy/load-balancer coupled with an
+  `Endpoint Selector`. It provides optimized routing and load balancing for
+  serving Kubernetes self-hosted generative Artificial Intelligence (AI)
+  workloads. It simplifies the deployment, management, and observability of AI
+  inference workloads.
+
+For deeper insights and more advanced concepts, refer to our [proposals](/docs/proposals).
+
+[Inference]:https://www.digitalocean.com/community/tutorials/llm-inference-optimization
+[Gateway API]:https://github.com/kubernetes-sigs/gateway-api
+[Prefix Cache]:https://docs.vllm.ai/en/stable/design/v1/prefix_caching.html
+[LoRA Adapters]:https://docs.vllm.ai/en/stable/features/lora.html
+[Endpoint Selection Extension]:https://gateway-api-inference-extension.sigs.k8s.io/#endpoint-selection-extension
+
+## Technical Overview
 
 This extension upgrades an [ext-proc](https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_filters/ext_proc_filter)-capable proxy or gateway - such as Envoy Gateway, kGateway, or the GKE Gateway - to become an **inference gateway** - supporting inference platform teams self-hosting large language models on Kubernetes. This integration makes it easy to expose and control access to your local [OpenAI-compatible chat completion endpoints](https://platform.openai.com/docs/api-reference/chat) to other workloads on or off cluster, or to integrate your self-hosted models alongside model-as-a-service providers in a higher level **AI Gateway** like LiteLLM, Solo AI Gateway, or Apigee.
 
```

api/v1alpha2/inferencemodel_types.go

Lines changed: 1 addition & 1 deletion

```diff
@@ -126,7 +126,7 @@ type PoolObjectReference struct {
 }
 
 // Criticality defines how important it is to serve the model compared to other models.
-// Criticality is intentionally a bounded enum to contain the possibilities that need to be supported by the load balancing algorithm. Any reference to the Criticality field must be optional(use a pointer), and set no default.
+// Criticality is intentionally a bounded enum to contain the possibilities that need to be supported by the load balancing algorithm. Any reference to the Criticality field must be optional (use a pointer), and set no default.
 // This allows us to union this with a oneOf field in the future should we wish to adjust/extend this behavior.
 // +kubebuilder:validation:Enum=Critical;Standard;Sheddable
 type Criticality string
```
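The comment's "optional (use a pointer), set no default" guidance translates into a referencing field like the sketch below. The spec shape here is illustrative rather than copied from the commit; the point is that a pointer with `omitempty` and no default marker keeps "unset" distinct from every enum value:

```go
package v1alpha2

// Criticality is a bounded enum of load-balancing priorities.
type Criticality string

const (
	Critical  Criticality = "Critical"
	Standard  Criticality = "Standard"
	Sheddable Criticality = "Sheddable"
)

// Illustrative referencing spec: the pointer plus `omitempty` (and no
// kubebuilder default) means an absent field stays nil, so the enum can
// later be unioned with a oneOf-style field without a breaking change.
type InferenceModelSpec struct {
	// +optional
	Criticality *Criticality `json:"criticality,omitempty"`
}
```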

cmd/epp/main.go

Lines changed: 9 additions & 5 deletions

```diff
@@ -30,6 +30,7 @@ import (
 	"go.uber.org/zap/zapcore"
 	"google.golang.org/grpc"
 	healthPb "google.golang.org/grpc/health/grpc_health_v1"
+	"k8s.io/apimachinery/pkg/types"
 	"k8s.io/client-go/rest"
 	"k8s.io/component-base/metrics/legacyregistry"
 	ctrl "sigs.k8s.io/controller-runtime"
@@ -140,14 +141,16 @@ func run() error {
 		return err
 	}
 
-	mgr, err := runserver.NewDefaultManager(*poolNamespace, *poolName, cfg)
+	poolNamespacedName := types.NamespacedName{
+		Name:      *poolName,
+		Namespace: *poolNamespace,
+	}
+	mgr, err := runserver.NewDefaultManager(poolNamespacedName, cfg)
 	if err != nil {
 		setupLog.Error(err, "Failed to create controller manager")
 		return err
 	}
 
-	ctx := ctrl.SetupSignalHandler()
-
 	// Set up mapper for metric scraping.
 	mapping, err := backendmetrics.NewMetricMapping(
 		*totalQueuedRequestsMetric,
@@ -162,14 +165,15 @@ func run() error {
 
 	pmf := backendmetrics.NewPodMetricsFactory(&backendmetrics.PodMetricsClientImpl{MetricMapping: mapping}, *refreshMetricsInterval)
 	// Setup runner.
+	ctx := ctrl.SetupSignalHandler()
+
 	datastore := datastore.NewDatastore(ctx, pmf)
 
 	serverRunner := &runserver.ExtProcServerRunner{
 		GrpcPort:                                 *grpcPort,
 		DestinationEndpointHintMetadataNamespace: *destinationEndpointHintMetadataNamespace,
 		DestinationEndpointHintKey:               *destinationEndpointHintKey,
-		PoolName:                                 *poolName,
-		PoolNamespace:                            *poolNamespace,
+		PoolNamespacedName:                       poolNamespacedName,
 		Datastore:                                datastore,
 		SecureServing:                            *secureServing,
 		CertPath:                                 *certPath,
```
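For context, `types.NamespacedName` is apimachinery's standard namespace/name pair, and controller-runtime's `client.ObjectKey` is an alias for it; bundling the two flag values into one struct removes the risk of swapping two bare string arguments. A minimal sketch of the pattern, with placeholder values rather than the commit's flags:

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/types"
)

func main() {
	// One self-describing value instead of two positional strings,
	// so call sites cannot pass namespace and name in the wrong order.
	poolKey := types.NamespacedName{
		Namespace: "default",
		Name:      "vllm-llama3-8b-instruct",
	}

	// String() renders the conventional "namespace/name" form used in logs.
	fmt.Println(poolKey.String()) // default/vllm-llama3-8b-instruct
}
```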

config/charts/inferencepool/README.md

Lines changed: 13 additions & 1 deletion

````diff
@@ -2,7 +2,6 @@
 
 A chart to deploy an InferencePool and a corresponding EndpointPicker (epp) deployment.
 
-
 ## Install
 
 To install an InferencePool named `vllm-llama3-8b-instruct` that selects from endpoints with label `app: vllm-llama3-8b-instruct` and listening on port `8000`, you can run the following command:
@@ -23,6 +22,18 @@ $ helm install vllm-llama3-8b-instruct \
 
 Note that the provider name is needed to deploy provider-specific resources. If no provider is specified, then only the InferencePool object and the EPP are deployed.
 
+### Install for Triton TensorRT-LLM
+
+Use `--set inferencePool.modelServerType=triton-tensorrt-llm` to install for Triton TensorRT-LLM, e.g.,
+
+```txt
+$ helm install triton-llama3-8b-instruct \
+  --set inferencePool.modelServers.matchLabels.app=triton-llama3-8b-instruct \
+  --set inferencePool.modelServerType=triton-tensorrt-llm \
+  --set provider.name=[none|gke] \
+  oci://us-central1-docker.pkg.dev/k8s-staging-images/gateway-api-inference-extension/charts/inferencepool --version v0
+```
+
 ## Uninstall
 
 Run the following command to uninstall the chart:
@@ -38,6 +49,7 @@ The following table list the configurable parameters of the chart.
 | **Parameter Name**                        | **Description**                                                                                                         |
 |-------------------------------------------|-------------------------------------------------------------------------------------------------------------------------|
 | `inferencePool.targetPortNumber`          | Target port number for the vllm backends, will be used to scrape metrics by the inference extension. Defaults to 8000.  |
+| `inferencePool.modelServerType`           | Type of the model servers in the pool, valid options are [vllm, triton-tensorrt-llm], default is vllm.                  |
 | `inferencePool.modelServers.matchLabels`  | Label selector to match vllm backends managed by the inference pool.                                                    |
 | `inferenceExtension.replicas`             | Number of replicas for the endpoint picker extension service. Defaults to `1`.                                          |
 | `inferenceExtension.image.name`           | Name of the container image used for the endpoint picker.                                                               |
````
