# Net Topology Aware Plugin

## Background

A Kubernetes cluster usually has many nodes, and those nodes may sit in different IDCs, in different chassis, and behind different switches.
Data transfer across IDCs, chassis, and switches has different performance characteristics, so some latency-sensitive workloads need to run within the same IDC, or even under the same topology device, such as a single chassis or switch.

## Motivation

We aim to make the scheduler network-topology aware so as to achieve the following:

- Best effort to schedule all tasks of the same job onto nodes under the same topology device.

## Goals

- Support a single-key topology configuration: try to schedule all tasks of a job onto nodes that share the same value for that key.
- Support multi-key topology policies: keys listed earlier in the configuration carry a higher score.

## Non-Goals

- Finding a globally optimal placement across the node groups formed by all values of the configured keys.

## Design Action

### Pod scheduling process

1. When the first task of a job is allocated to a node, record that node's information in the plugin.
2. When scheduling the job's other tasks, a node with the same key value as the recorded target node gets a higher score; otherwise it gets a score of zero.
3. If a node matches the target node on several of the configured keys, the key that appears earlier in the configured list determines the score, as in the scoring sketch below.

```go
// nodeOrderFn scores a candidate node for a task: a node that shares the
// recorded target node's topology label value gets a higher score.
nodeOrderFn := func(task *api.TaskInfo, node *api.NodeInfo) (float64, error) {
    // ... look up tNode, the node recorded for the job's first allocated task
    score := 0
    weight := np.weight
    tlabels := tNode.Node.Labels
    labels := node.Node.Labels
    length := len(np.topologyKeys)
    for i, key := range np.topologyKeys {
        if tlabels[key] == labels[key] {
            // Keys earlier in the configured list carry a higher score.
            score += length - i
            break
        }
    }
    return float64(score * weight), nil
}
```
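
Step 1 above, recording the node chosen for a job's first allocated task, is not shown in the snippet. One way to implement it, sketched below, is to register allocate/deallocate event handlers when the session opens; the struct name `netTopologyAwarePlugin` and the `jobFirstNode` map are illustrative assumptions, not final code.

```go
// Sketch only: wiring for step 1 in OnSessionOpen, assuming Volcano's session
// event-handler API. jobFirstNode maps a job to the node name of its first
// allocated task; the nodeOrderFn above would read from this record.
func (np *netTopologyAwarePlugin) OnSessionOpen(ssn *framework.Session) {
    jobFirstNode := map[api.JobID]string{}

    ssn.AddEventHandler(&framework.EventHandler{
        AllocateFunc: func(event *framework.Event) {
            task := event.Task
            if _, found := jobFirstNode[task.Job]; !found {
                // First task of this job to be placed: remember its node.
                jobFirstNode[task.Job] = task.NodeName
            }
        },
        DeallocateFunc: func(event *framework.Event) {
            // Simplification: forget the record when a task bound to the
            // recorded node is deallocated again.
            if jobFirstNode[event.Task.Job] == event.Task.NodeName {
                delete(jobFirstNode, event.Task.Job)
            }
        },
    })

    // The scoring function above would be registered here as well, e.g.
    // ssn.AddNodeOrderFn(np.Name(), nodeOrderFn).
}
```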

### Usage

```yaml
- plugins:
  - name: net-topology
    arguments:
      net-topology.keys: switch,idc
      net-topology.weight: 10
```
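
`net-topology.keys` is an ordered, comma-separated list of node label keys (earlier keys score higher), and `net-topology.weight` scales the plugin's score. Below is a minimal, self-contained sketch of how these arguments could be parsed; it assumes the values arrive as plain strings, and the helper name `parseArguments` and the default weight of 1 are assumptions for illustration only.

```go
package main

import (
    "fmt"
    "strconv"
    "strings"
)

// parseArguments turns the raw plugin arguments into an ordered key list and
// a weight. Keys listed earlier have higher priority; the weight defaults to 1.
// Receiving the arguments as a plain string map is an assumption of this sketch.
func parseArguments(args map[string]string) (keys []string, weight int) {
    weight = 1
    if v, ok := args["net-topology.weight"]; ok {
        if w, err := strconv.Atoi(v); err == nil && w > 0 {
            weight = w
        }
    }
    for _, k := range strings.Split(args["net-topology.keys"], ",") {
        if k = strings.TrimSpace(k); k != "" {
            keys = append(keys, k)
        }
    }
    return keys, weight
}

func main() {
    keys, weight := parseArguments(map[string]string{
        "net-topology.keys":   "switch,idc",
        "net-topology.weight": "10",
    })
    fmt.Println(keys, weight) // prints: [switch idc] 10
}
```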

## Drawbacks

This is not a globally optimal solution that always places all of a job's tasks on nodes within the same topology. For example, if the nodes carrying key-value1 do not have enough resources while the nodes carrying key-value2 do, and the job's first task happens to be bound to a node with key-value1, the remaining tasks will still all prefer the key-value1 node list.