Commit 2674b08

add docs about net-topology plugin
Signed-off-by: lowang-bh <[email protected]>
1 parent 8d62d1c commit 2674b08

docs/design/net-topology-aware.md

Lines changed: 82 additions & 0 deletions
# Net Topology Aware Plugin

- [Net Topology Aware Plugin](#net-topology-aware-plugin)
  - [Backgrounds](#backgrounds)
  - [Motivation](#motivation)
  - [Proposal one](#proposal-one)
    - [Goals](#goals)
    - [Non-Goals](#non-goals)
    - [Design Action](#design-action)
      - [Pod scheduling process](#pod-scheduling-process)
      - [Usage](#usage)
    - [Drawbacks](#drawbacks)

## Backgrounds

Usually, a Kubernetes cluster has many nodes, and those nodes may be located in different IDCs, on different chassis, and even behind different switches.
Data transfer across different IDCs, chassis, and switches has different performance characteristics, so some latency-sensitive workloads need to run in the same IDC, or even within the same topology device, such as a chassis or switch.

## Motivation

We aim to make the scheduler network-topology aware so as to achieve the following:

- Best effort to schedule all tasks of the same job to the same topology domain, such as the same IDC, chassis, or switch.

## Proposal one

This proposal requires the cluster administrator to manage network topology labels on Kubernetes nodes: nodes within the same topology domain are given the same label value.

### Goals

- Support single-key topology configuration: try to schedule all of a job's tasks to nodes that have the same value for that key.
- Support multiple-key topology policies: keys earlier in the list carry a higher score.

### Non-Goals

- Finding a globally optimal placement across nodes with all possible values of a key is out of scope.

### Design Action

#### Pod scheduling process

1. When the first task of a job is allocated to a node, record that node's topology information in the plugin.
2. When scheduling the job's other tasks, a node with the same topology as the already-allocated tasks gets a higher score; any other node gets a score of zero.
3. If a node matches multiple keys in the configured list, the earliest matching key determines the score, since keys at the front of the list have higher priority.
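
The sketch below shows the state such a plugin might keep; the type and field names are illustrative assumptions, chosen to line up with the scoring function that follows. Only `topologyKeys` and `weight` are implied by the configuration options shown under Usage.

```go
// netTopologyPlugin holds the plugin's configuration and per-session state.
// Field names here are hypothetical, not taken from an existing implementation.
type netTopologyPlugin struct {
	topologyKeys []string             // ordered topology keys, highest priority first
	weight       int                  // score multiplier from net-topology.weight
	jobFirstNode map[api.JobID]string // node of each job's first allocated task (step 1)
}
```

Steps 2 and 3 are implemented in the node-order function, where `tNode` is the node recorded for the job's first allocated task:
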
```go
nodeOrderFn := func(task *api.TaskInfo, node *api.NodeInfo) (float64, error) {
	...
	score := 0
	weight := np.weight
	// Labels of the node recorded for the job's first allocated task.
	tlabels := tNode.Node.Labels
	// Labels of the candidate node being scored.
	labels := node.Node.Labels
	length := len(np.topologyKeys)
	for i, key := range np.topologyKeys {
		// Require the label to be present, so two nodes that both lack the
		// key do not spuriously match on the empty value.
		if v, ok := tlabels[key]; ok && v == labels[key] {
			// Keys at the front of the list score higher than later keys.
			score += length - i
			break
		}
	}
	return float64(score * weight), nil
}
```
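
With the example configuration shown under Usage (`net-topology.keys: rack,switch,idc`, `net-topology.weight: 10`), a candidate node matching the recorded node's `rack` label scores (3 - 0) * 10 = 30, a `switch` match scores 20, an `idc` match scores 10, and no match scores 0.

Step 1 needs a hook that fires when a task is allocated. Below is a minimal sketch using Volcano's session event handler, assuming the hypothetical `jobFirstNode` map from above:

```go
// Registered in the plugin's OnSessionOpen; records the node of each job's
// first allocated task so later tasks can be scored against its topology.
ssn.AddEventHandler(&framework.EventHandler{
	AllocateFunc: func(event *framework.Event) {
		if _, found := np.jobFirstNode[event.Task.Job]; !found {
			np.jobFirstNode[event.Task.Job] = event.Task.NodeName
		}
	},
})
```
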
#### Usage

1. Label nodes with key-value pairs, for example `switch=NvLink-A100`, `rack=rack1`/`rack=rack2`, and `idc=bj`/`idc=sh`, to partition nodes into different topology zones (see the node sketch below).
2. Add the net-topology plugin to the scheduler configuration:

```yaml
- plugins:
  - name: net-topology
    arguments:
      net-topology.type: static
      net-topology.keys: rack,switch,idc
      net-topology.weight: 10
```
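
For step 1, a labeled node might look like the following sketch; the key names match the configuration above, and the values are the examples from step 1:

```yaml
apiVersion: v1
kind: Node
metadata:
  name: node-1 # hypothetical node name
  labels:
    idc: bj
    rack: rack1
    switch: NvLink-A100
```
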
### Drawbacks

This is not a global solution that guarantees all of a job's tasks land on nodes in the same topology zone. For example, if the nodes labeled with value1 lack sufficient resources while the nodes labeled with value2 have enough, and the first task is nevertheless bound to a node with value1, then all remaining tasks will still prefer the value1 node list.
