Add Node Affinity for TaskRuns that share PVC workspace
TaskRuns within a PipelineRun may share files using a workspace volume.
The typical case is files from a git-clone operation. Tasks in a CI pipeline often
perform operations on the filesystem, e.g. generating or analyzing files,
so the workspace abstraction is very useful.
The Kubernetes way of sharing file volumes is through [PersistentVolumeClaims](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#persistentvolumeclaims).
PersistentVolumeClaims use PersistentVolumes with different [access modes](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes).
The most commonly available PV access mode is ReadWriteOnce; volumes with this
access mode can only be mounted on one Node at a time.
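For reference, a claim along the lines of the sketch below (name, size and storage class are illustrative) binds a volume that can only be mounted on one Node at a time:

```yaml
# Illustrative only: a typical ReadWriteOnce claim of the kind a PipelineRun
# workspace is bound to. Name and size are example values.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-workspace-pvc
spec:
  accessModes:
    - ReadWriteOnce   # mountable by pods on a single Node at a time
  resources:
    requests:
      storage: 1Gi
```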
When using parallel Tasks in a Pipeline, the pods for the TaskRuns are
scheduled to any Node, most likely not to the same Node in a cluster.
Since volumes with the commonly available ReadWriteOnce access mode cannot
be used by multiple Nodes at a time, these "parallel" pods are forced to
execute sequentially, since the volume is only available on one Node at a time.
This may cause your TaskRuns to time out.
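As an illustration, a Pipeline along these lines (task and workspace names are hypothetical) hits exactly this problem when the bound volume is ReadWriteOnce, because `lint` and `unit-test` have no ordering between them:

```yaml
# Hypothetical Pipeline: fetch-source runs first, then lint and unit-test
# run in parallel, all sharing the same "source" workspace.
apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: build-and-test
spec:
  workspaces:
    - name: source
  tasks:
    - name: fetch-source
      taskRef:
        name: git-clone
      workspaces:
        - name: output
          workspace: source
    - name: lint
      runAfter: ["fetch-source"]
      taskRef:
        name: lint
      workspaces:
        - name: source
          workspace: source
    - name: unit-test
      runAfter: ["fetch-source"]
      taskRef:
        name: unit-test
      workspaces:
        - name: source
          workspace: source
```

With a ReadWriteOnce volume bound to `source`, `lint` and `unit-test` can only run concurrently if their pods land on the same Node - which is exactly what the scheduler does not guarantee.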
Clusters are often _regional_, e.g. they are deployed across 3 Availability
Zones, but PersistentVolumes are often _zonal_, e.g. they are only available
to the Nodes within a single zone. Some cloud providers offer regional PVs,
but a regional PV is sometimes only replicated to one additional zone, e.g. not
to all 3 zones within a region. This works fine for most typical stateful applications,
but Tekton uses storage in a different way - it is designed so that multiple pods
access the same volume, in sequence or in parallel.
This makes it difficult to design a Pipeline that starts with parallel tasks, each using
its own PVC, followed by a common task that mounts the volumes from the earlier
tasks: if those tasks were scheduled to different zones,
the common task cannot mount the PVCs that are now located in different zones, and
the PipelineRun is deadlocked.
There are a few technical solutions that allow parallel execution of Tasks
even when they share a PVC workspace:
- Using PVC access mode ReadWriteMany. However, this access mode is not widely available,
and is typically backed by an NFS server or another not so "cloud native" solution.
- Using storage that is tied to a specific Node, e.g. a local volume, and configuring
the pods to be scheduled to that Node. This is not commonly available either, and it has
drawbacks, e.g. the pod may need to consume and mount a whole disk of several hundred GB.
Consequently, it would be good to find a way for TaskRun pods that share a
workspace to be scheduled to the same Node - making it easy to use parallel
tasks with a workspace, while executing concurrently, on widely available Kubernetes
cluster and storage configurations.
A few alternative solutions have been considered, as documented in #2586.
However, they all have major drawbacks, e.g. major API and contract changes.
This commit introduces an "Affinity Assistant" - a minimal placeholder pod -
so that it is possible to use [Kubernetes inter-pod affinity](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#inter-pod-affinity-and-anti-affinity) for TaskRun pods that need to be scheduled to the same Node.
This solution has several benefits: it does not introduce any API changes,
it does not break or change any existing Tekton concepts, and it is
implemented with very few changes. Additionally, it can be disabled with a feature flag.
**How it works:** When a PipelineRun is initiated, an "Affinity Assistant" is
created for each PVC workspace volume. TaskRun pods that share a workspace
volume are configured with podAffinity to the "Affinity Assistant" pod that
was created for the volume. The "Affinity Assistant" lives until the
PipelineRun is completed or deleted. "Affinity Assistant" pods are
configured with podAntiAffinity to repel other "Affinity Assistants" -
in a best-effort fashion.
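Conceptually, the podAffinity injected into a TaskRun pod looks roughly like the stanza below (placed under the pod's `spec`). The label key and value are illustrative assumptions, not necessarily the labels the controller actually sets:

```yaml
# Sketch of a pod-spec affinity stanza that co-locates a TaskRun pod with
# the Affinity Assistant for its workspace. Label key/value are assumptions.
affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app.kubernetes.io/component: affinity-assistant   # hypothetical label
        topologyKey: kubernetes.io/hostname                    # i.e. "same Node"
```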
The Affinity Assistant is a _singleton_ workload, since it acts as a
placeholder pod and TaskRun pods with affinity to it must be scheduled to the
same Node. It is implemented with [QoS class Guaranteed](https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/#create-a-pod-that-gets-assigned-a-qos-class-of-guaranteed) but with minimal resource requests,
since it does not do any work other than being a placeholder.
Singleton workloads can be implemented in multiple ways, and they differ
in behavior when the Node becomes unreachable:
- as a Pod - the Pod is not managed, so it will not be recreated.
- as a Deployment - the Pod will be recreated, putting availability before
the singleton property.
- as a StatefulSet - the Pod will be recreated, putting the singleton
property before availability.
Therefore the Affinity Assistant is implemented as a StatefulSet.
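A minimal sketch of such a placeholder StatefulSet is shown below. The name, labels, image and resource values are illustrative assumptions, not the exact manifest the controller creates; the relevant properties are `replicas: 1` (the singleton) and equal resource requests and limits (QoS class Guaranteed):

```yaml
# Illustrative placeholder StatefulSet (name/labels/image/values are assumptions).
# replicas: 1 keeps it a singleton; equal requests and limits yield QoS Guaranteed.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: affinity-assistant-example
spec:
  replicas: 1
  serviceName: affinity-assistant-example
  selector:
    matchLabels:
      app.kubernetes.io/component: affinity-assistant
  template:
    metadata:
      labels:
        app.kubernetes.io/component: affinity-assistant
    spec:
      containers:
        - name: affinity-assistant
          image: nginx   # any small, always-running image works as a placeholder
          resources:
            requests:
              cpu: 50m
              memory: 64Mi
            limits:
              cpu: 50m
              memory: 64Mi
```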
Essentially, this commit provides an effortless way to get functional
task parallelism with any Kubernetes cluster that has any PVC-based
storage.
Solves #2586
/kind feature
**docs/install.md** (13 additions, 0 deletions)

The change extends the list under "To customize the behavior of the Pipelines Controller, modify the ConfigMap `feature-flags` as follows:" (hunk `@@ -268,6 +268,19 @@`) with a new entry:

- `disable-affinity-assistant` - set this flag to disable the [Affinity Assistant](./workspaces.md#affinity-assistant-and-specifying-workspace-order-in-a-pipeline)
  that is used to provide Node Affinity for `TaskRun` pods that share a workspace volume.
  The Affinity Assistant pods may be incompatible with NodeSelector and other affinity rules
  configured for `TaskRun` pods.

  **Note:** The Affinity Assistant uses [Inter-pod affinity and anti-affinity](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#inter-pod-affinity-and-anti-affinity),
  which requires a substantial amount of processing and can slow down scheduling in large
  clusters significantly. We do not recommend using it in clusters larger than several hundred nodes.

  **Note:** Pod anti-affinity requires nodes to be consistently labelled; in other words, every
  node in the cluster must have an appropriate label matching `topologyKey`. If some or all nodes
  are missing the specified `topologyKey` label, it can lead to unintended behavior.

The existing entry that follows is unchanged:

- `disable-home-env-overwrite` - set this flag to `true` to prevent Tekton
  from overriding the `$HOME` environment variable for the containers executing your `Steps`.
  The default is `false`. For more information, see the [associated issue](https://github.com/tektoncd/pipeline/issues/2013).
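For reference, disabling the Affinity Assistant would look roughly like this in the `feature-flags` ConfigMap (a sketch, assuming the default `tekton-pipelines` installation namespace):

```yaml
# Sketch: setting disable-affinity-assistant in the feature-flags ConfigMap.
# The tekton-pipelines namespace is the default install location (assumption).
apiVersion: v1
kind: ConfigMap
metadata:
  name: feature-flags
  namespace: tekton-pipelines
data:
  disable-affinity-assistant: "true"
```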
**docs/workspaces.md** (22 additions, 20 deletions)

In the table of contents (hunk `@@ -15,7 +15,7 @@`), the entry
"[Specifying `Workspace` order in a `Pipeline`](#specifying-workspace-order-in-a-pipeline)"
is replaced by
"[Affinity Assistant and specifying `Workspace` order in a `Pipeline`](#affinity-assistant-and-specifying-workspace-order-in-a-pipeline)",
while the surrounding entries (Mapping `Workspaces` in `Tasks` to `TaskRuns`, Examples of `TaskRun` definition using `Workspaces`, Using `Workspaces` in `Pipelines`, Specifying `Workspaces` in `PipelineRuns`, Example `PipelineRun` definition using `Workspaces`, Specifying `VolumeSources` in `Workspaces`) are unchanged.

In the "Note the following" list for configuring `Workspaces` in a `Task` (hunk `@@ -89,7 +89,8 @@`), the bullet

- A `Task` definition can include as many `Workspaces` as it needs.

becomes

- A `Task` definition can include as many `Workspaces` as it needs. It is recommended that `Tasks` use **at most** one _writable_ `Workspace`.

The bullets stating that a `readOnly` `Workspace` is mounted read-only (writing to it results in errors and failed `TaskRuns`) and that `mountPath` can be absolute or relative are unchanged.

In the section on `Workspace` order (hunk `@@ -204,26 +205,27 @@`), the heading
"Specifying `Workspace` order in a `Pipeline`" is renamed to
"Affinity Assistant and specifying `Workspace` order in a `Pipeline`".
The hunk's context covers the `subPath` appending rule (a `PipelineRun` declaring a `Workspace` with `subPath` `/foo`, bound by the `Pipeline` to a `Task` with `subPath` `/bar`, ends up mounting the `Volume`'s `/foo/bar` directory) and the explanation that sharing a `Workspace` between `Tasks` requires defining the order in which those `Tasks` access it, since different classes of storage have different limits for concurrent reads and writes. The section gains the same two notes as docs/install.md:

**Note:** The Affinity Assistant uses [Inter-pod affinity and anti-affinity](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#inter-pod-affinity-and-anti-affinity),
which requires a substantial amount of processing and can slow down scheduling in large
clusters significantly. We do not recommend using it in clusters larger than several hundred nodes.

**Note:** Pod anti-affinity requires nodes to be consistently labelled; in other words, every
node in the cluster must have an appropriate label matching `topologyKey`. If some or all nodes
are missing the specified `topologyKey` label, it can lead to unintended behavior.