This repository was archived by the owner on Aug 7, 2025. It is now read-only.

Commit 60c1b15

Merge pull request #1240 from jagadeeshi2i/autoscaler
feat: Add kubernetes HPA for torchserve
2 parents 464276b + c7a92df commit 60c1b15

5 files changed

+392
-2
lines changed

kubernetes/EKS/config.properties

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,6 @@
11
inference_address=http://0.0.0.0:8080
22
management_address=http://0.0.0.0:8081
3+
metrics_address=http://0.0.0.0:8082
34
NUM_WORKERS=1
45
number_of_gpu=1
56
number_of_netty_threads=32

kubernetes/README.md

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -279,10 +279,12 @@ Follow the link for log aggregation with EFK Stack.\
279279
* Helm is picking up other .yaml files. Make sure you’ve added other files correctly to .helmignore. It should only run with values.yaml.
280280
* `kubectl describe pod` shows error message "0/1 nodes are available: 1 Insufficient cpu."
281281
* Ensure that the `n_cpu` value in `values.yaml` is set to a number that can be supported by the nodes in the cluster.
282-
282+
283+
## Autoscaling
284+
[Autoscaling with torchserve metrics](autoscale.md)
285+
283286
## Roadmap
284287

285-
* [] Autoscaling
286288
* [] Log / Metrics Aggregation using [AWS Container Insights](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/ContainerInsights.html)
287289
* [] EFK Stack Integration
288290
* [] Readiness / Liveness Probes

kubernetes/adapter.yaml

Lines changed: 200 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,200 @@
1+
# Default values for k8s-prometheus-adapter..
2+
affinity: {}
3+
4+
image:
5+
repository: k8s.gcr.io/prometheus-adapter/prometheus-adapter
6+
tag: v0.9.0
7+
pullPolicy: IfNotPresent
8+
9+
logLevel: 4
10+
11+
metricsRelistInterval: 1m
12+
13+
listenPort: 6443
14+
15+
nodeSelector: {}
16+
17+
priorityClassName: ""
18+
19+
# Url to access prometheus
20+
prometheus:
21+
# Value is templated
22+
url: http://prometheus-server.default.svc.cluster.local
23+
port: 80
24+
path: ""
25+
26+
replicas: 1
27+
28+
# k8s 1.21 needs fsGroup to be set for non root deployments
29+
# ref: https://github.com/kubernetes/kubernetes/issues/70679
30+
podSecurityContext:
31+
fsGroup: 10001
32+
33+
rbac:
34+
# Specifies whether RBAC resources should be created
35+
create: true
36+
37+
psp:
38+
# Specifies whether PSP resources should be created
39+
create: false
40+
41+
serviceAccount:
42+
# Specifies whether a service account should be created
43+
create: true
44+
# The name of the service account to use.
45+
# If not set and create is true, a name is generated using the fullname template
46+
name:
47+
# ServiceAccount annotations.
48+
# Use case: AWS EKS IAM roles for service accounts
49+
# ref: https://docs.aws.amazon.com/eks/latest/userguide/specify-service-account-role.html
50+
annotations: {}
51+
52+
# Custom DNS configuration to be added to prometheus-adapter pods
53+
dnsConfig: {}
54+
# nameservers:
55+
# - 1.2.3.4
56+
# searches:
57+
# - ns1.svc.cluster-domain.example
58+
# - my.dns.search.suffix
59+
# options:
60+
# - name: ndots
61+
# value: "2"
62+
# - name: edns0
63+
resources: {}
64+
# requests:
65+
# cpu: 100m
66+
# memory: 128Mi
67+
# limits:
68+
# cpu: 100m
69+
# memory: 128Mi
70+
71+
rules:
72+
default: true
73+
custom: []
74+
# - seriesQuery: '{__name__=~"^some_metric_count$"}'
75+
# resources:
76+
# template: <<.Resource>>
77+
# name:
78+
# matches: ""
79+
# as: "my_custom_metric"
80+
# metricsQuery: sum(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)
81+
# Mounts a configMap with pre-generated rules for use. Overrides the
82+
# default, custom, external and resource entries
83+
existing:
84+
external:
85+
- seriesQuery: '{__name__=~"^ts_queue_latency_microseconds"}'
86+
resources:
87+
overrides:
88+
namespace:
89+
resource: namespace
90+
service:
91+
resource: service
92+
pod:
93+
resource: pod
94+
name:
95+
matches: "^(.*)_microseconds"
96+
as: "ts_queue_latency_microseconds"
97+
metricsQuery: ts_queue_latency_microseconds
98+
resource: {}
99+
# cpu:
100+
# containerQuery: sum(rate(container_cpu_usage_seconds_total{<<.LabelMatchers>>, container!=""}[3m])) by (<<.GroupBy>>)
101+
# nodeQuery: sum(rate(container_cpu_usage_seconds_total{<<.LabelMatchers>>, id='/'}[3m])) by (<<.GroupBy>>)
102+
# resources:
103+
# overrides:
104+
# node:
105+
# resource: node
106+
# namespace:
107+
# resource: namespace
108+
# pod:
109+
# resource: pod
110+
# containerLabel: container
111+
# memory:
112+
# containerQuery: sum(container_memory_working_set_bytes{<<.LabelMatchers>>, container!=""}) by (<<.GroupBy>>)
113+
# nodeQuery: sum(container_memory_working_set_bytes{<<.LabelMatchers>>,id='/'}) by (<<.GroupBy>>)
114+
# resources:
115+
# overrides:
116+
# node:
117+
# resource: node
118+
# namespace:
119+
# resource: namespace
120+
# pod:
121+
# resource: pod
122+
# containerLabel: container
123+
# window: 3m
124+
125+
service:
126+
annotations: {}
127+
port: 443
128+
type: ClusterIP
129+
# clusterIP: 1.2.3.4
130+
131+
tls:
132+
enable: false
133+
ca: |-
134+
# Public CA file that signed the APIService
135+
key: |-
136+
# Private key of the APIService
137+
certificate: |-
138+
# Public key of the APIService
139+
140+
# Any extra arguments
141+
extraArguments: []
142+
# - --tls-private-key-file=/etc/tls/tls.key
143+
# - --tls-cert-file=/etc/tls/tls.crt
144+
145+
# Any extra volumes
146+
extraVolumes: []
147+
# - name: example-name
148+
# hostPath:
149+
# path: /path/on/host
150+
# type: DirectoryOrCreate
151+
# - name: ssl-certs
152+
# hostPath:
153+
# path: /etc/ssl/certs/ca-bundle.crt
154+
# type: File
155+
156+
# Any extra volume mounts
157+
extraVolumeMounts: []
158+
# - name: example-name
159+
# mountPath: /path/in/container
160+
# - name: ssl-certs
161+
# mountPath: /etc/ssl/certs/ca-certificates.crt
162+
# readOnly: true
163+
164+
tolerations: []
165+
166+
# Labels added to the pod
167+
podLabels: {}
168+
169+
# Annotations added to the pod
170+
podAnnotations: {}
171+
172+
hostNetwork:
173+
# Specifies if prometheus-adapter should be started in hostNetwork mode.
174+
#
175+
# You would require this enabled if you use alternate overlay networking for pods and
176+
# API server unable to communicate with metrics-server. As an example, this is required
177+
# if you use Weave network on EKS. See also dnsPolicy
178+
enabled: false
179+
180+
# When hostNetwork is enabled, you probably want to set this to ClusterFirstWithHostNet
181+
# dnsPolicy: ClusterFirstWithHostNet
182+
183+
# Deployment strategy type
184+
strategy:
185+
type: RollingUpdate
186+
rollingUpdate:
187+
maxUnavailable: 25%
188+
maxSurge: 25%
189+
190+
podDisruptionBudget:
191+
# Specifies if PodDisruptionBudget should be enabled
192+
# When enabled, minAvailable or maxUnavailable should also be defined.
193+
enabled: false
194+
minAvailable:
195+
maxUnavailable: 1
196+
197+
certManager:
198+
enabled: false
199+
caCertDuration: 43800h
200+
certDuration: 8760h

kubernetes/autoscale.md

Lines changed: 170 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,170 @@
1+
# Autoscaler
2+
3+
Set up a Kubernetes HPA (Horizontal Pod Autoscaler) for TorchServe, driven by TorchServe metrics. This setup uses Prometheus as the metrics collector and Prometheus Adapter as the metrics server that exposes TorchServe metrics to the HPA.
4+
5+
## Steps
6+
7+
### 1. Install TorchServe with metrics enabled in Prometheus format
8+
9+
[Install TorchServe using Helm Charts](README.md##-Deploy-TorchServe-using-Helm-Charts)
10+
### 2. Install Prometheus
11+
12+
```bash
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/prometheus
```
17+
18+
The install command prints the Prometheus server URL:
19+
20+
```bash
NAME: prometheus
LAST DEPLOYED: Wed Sep 8 19:10:49 2021
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
The Prometheus server can be accessed via port 80 on the following DNS name from within your cluster:
prometheus-server.default.svc.cluster.local
...
...
```
33+
34+
### 3. Install Prometheus Adapter
35+
36+
- Update the Prometheus URL and port in `adapter.yaml`. Use the DNS name printed in the Prometheus installation output.
37+
38+
```yaml
# Url to access prometheus
prometheus:
  # Value is templated
  url: http://prometheus-server.default.svc.cluster.local
  port: 80
  path: ""
```
46+
47+
- Update the external metrics rules in `adapter.yaml`. The rule below enables external metrics in Prometheus Adapter and serves the `ts_queue_latency_microseconds` metric.
48+
49+
```yaml
external:
- seriesQuery: '{__name__=~"^ts_queue_latency_microseconds"}'
  resources:
    overrides:
      namespace:
        resource: namespace
      service:
        resource: service
      pod:
        resource: pod
  name:
    matches: "^(.*)_microseconds"
    as: "ts_queue_latency_microseconds"
  metricsQuery: ts_queue_latency_microseconds
```
65+
66+
Refer: [Prometheus Adapter External Metrics](https://github.com/kubernetes-sigs/prometheus-adapter/blob/master/docs/externalmetrics.md)
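
The rule above passes a literal `metricsQuery`, so the adapter runs it as written and any namespace or label selector from the HPA is not applied to the query. As a hedged sketch only (not part of this commit), the same rule can use the adapter's `<<.Series>>` and `<<.LabelMatchers>>` template fields, which also appear in the commented `custom` example in `adapter.yaml`. The `avg` aggregation is an illustrative choice, and the variant assumes the scraped TorchServe series carry a `namespace` label; if they do not, the templated query will match nothing.

```yaml
external:
- seriesQuery: '{__name__=~"^ts_queue_latency_microseconds"}'
  resources:
    overrides:
      # Assumption: the series expose a "namespace" label
      namespace:
        resource: namespace
  name:
    matches: "^(.*)_microseconds"
    as: "ts_queue_latency_microseconds"
  # Assumption: aggregate matching series into one value; sum or max may suit other workloads
  metricsQuery: avg(<<.Series>>{<<.LabelMatchers>>})
```

Whether to aggregate, and with which function, depends on how many TorchServe pods report the metric and on what the target value is meant to represent.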
67+
68+
- Install Prometheus adapter
69+
70+
```bash
helm install -f adapter.yaml prometheus-adapter prometheus-community/prometheus-adapter
```
73+
74+
The output of the above command is:
75+
76+
```
NAME: adapter
LAST DEPLOYED: Wed Sep 8 19:49:28 2021
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
adapter-prometheus-adapter has been deployed.
In a few minutes you should be able to list metrics using the following command(s):

  kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1

  kubectl get --raw /apis/external.metrics.k8s.io/v1beta1
```
91+
92+
#### Check the external metrics list
93+
94+
```bash
kubectl get --raw /apis/external.metrics.k8s.io/v1beta1 | jq
```
97+
98+
```json
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "external.metrics.k8s.io/v1beta1",
  "resources": [
    {
      "name": "ts_queue_latency_microseconds",
      "singularName": "",
      "namespaced": true,
      "kind": "ExternalMetricValueList",
      "verbs": [
        "get"
      ]
    }
  ]
}
```
116+
117+
### 4. Deploy Horizontal Pod Autoscaler for Torchserve
118+
119+
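- Save the manifest below as `hpa.yaml` and change `targetValue` to suit your latency requirement. The value uses Kubernetes quantity notation, where the `m` suffix means milli-units, so `7000000m` equals 7000 (shown as `7k` in the HPA status), that is, 7000 microseconds of queue latency.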
120+
121+
```yaml
kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2beta1
metadata:
  name: torchserve
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: torchserve
  # autoscale between 1 and 5 replicas
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: External
    external:
      metricName: ts_queue_latency_microseconds
      targetValue: "7000000m"
```
140+
141+
```bash
kubectl apply -f hpa.yaml
```
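
The manifest in this step targets `autoscaling/v2beta1`, which Kubernetes stopped serving in v1.25. On newer clusters, an equivalent under the stable `autoscaling/v2` API would look roughly like the sketch below. It is not part of this commit: the field layout follows the `autoscaling/v2` schema, and the target is written as `"7000"`, which is the same quantity as `"7000000m"`.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: torchserve
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: torchserve
  # autoscale between 1 and 5 replicas
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: External
    external:
      metric:
        name: ts_queue_latency_microseconds
      target:
        # Value compares the metric total against the target, like targetValue in v2beta1
        type: Value
        value: "7000"
```

It can be applied with the same `kubectl apply -f` command, using whatever file name the manifest is saved under.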
144+
145+
### 5. Check status of HPA
146+
147+
```bash
148+
kubectl describe hpa torchserve
149+
```
150+
151+
```bash
Name:                                             torchserve
Namespace:                                        default
Labels:                                           <none>
Annotations:                                      <none>
CreationTimestamp:                                Wed, 08 Sep 2021 20:09:48 +0530
Reference:                                        Deployment/torchserve
Metrics:                                          ( current / target )
  "ts_queue_latency_microseconds" (target value):  5257630m / 7k
Min replicas:                                     1
Max replicas:                                     5
Deployment pods:                                  3 current / 3 desired
Conditions:
  Type            Status  Reason              Message
  ----            ------  ------              -------
  AbleToScale     True    ReadyForNewScale    recommended size matches current size
  ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from external metric ts_queue_latency_microseconds(nil)
  ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
Events:           <none>
```
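
To read the status above, it can help to redo the calculation by hand. The worked example below uses the replica formula from the Kubernetes HPA documentation and plugs in the values from the sample output: 3 current pods, a current metric of `5257630m` (5257.63) and a target of `7k` (7000). It assumes the controller saw exactly these numbers when it last evaluated the metric.

$$
\text{desiredReplicas}
= \left\lceil \text{currentReplicas} \times \frac{\text{currentMetricValue}}{\text{desiredMetricValue}} \right\rceil
= \left\lceil 3 \times \frac{5257.63}{7000} \right\rceil
= \lceil 2.25 \rceil
= 3
$$

The result equals the current pod count, which is consistent with the `3 current / 3 desired` line and the `recommended size matches current size` condition.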
