Description
Expected Behavior
The pipeline should be able to retry normally.
Actual Behavior
When retrying, the pipeline is unable to create the Pod for the retry.
failed to create task run pod "test-sidecar-workspace-run": Pod "test-sidecar-workspace-run-pod-retry1" is invalid: spec.containers[0].volumeMounts[0].name: Not found: "ws-jchh2". Maybe missing or invalid Task default/test-sidecar-workspace
Steps to Reproduce the Problem
- Create the following Task and TaskRun.
cat <<'EOF' | kubectl replace --force -f -
apiVersion: tekton.dev/v1
kind: Task
metadata:
name: test-sidecar-workspace
spec:
workspaces:
- description: cache
name: cache
steps:
- name: command
image: alpine
volumeMounts:
- mountPath: /cache
name: $(workspaces.cache.volume) # <-- It's this configuration that caused the above problem.
script: |
#!/bin/sh
set -ex
echo "hello world"
exit 1
---
apiVersion: tekton.dev/v1
kind: TaskRun
metadata:
name: test-sidecar-workspace-run
spec:
taskRef:
name: test-sidecar-workspace
retries: 1
workspaces:
- name: cache
emptyDir: {}
EOF
- Wait for the TaskRun to be processed. After the retry fails, the TaskRun looks like this:
apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
generation: 1
labels:
app.kubernetes.io/managed-by: tekton-pipelines
tekton.dev/task: test-sidecar-workspace
name: test-sidecar-workspace-run
spec:
retries: 1
serviceAccountName: default
taskRef:
kind: Task
name: test-sidecar-workspace
timeout: 1h0m0s
workspaces:
- emptyDir: {}
name: cache
status:
completionTime: "2024-04-16T09:58:14Z"
conditions:
- lastTransitionTime: "2024-04-16T09:58:14Z"
message:
'failed to create task run pod "test-sidecar-workspace-run": Pod "test-sidecar-workspace-run-pod-retry1"
is invalid: spec.containers[0].volumeMounts[0].name: Not found: "ws-jchh2".
Maybe missing or invalid Task default/test-sidecar-workspace'
reason: PodCreationFailed
status: "False"
type: Succeeded
podName: ""
provenance:
featureFlags:
AwaitSidecarReadiness: true
Coschedule: workspaces
DisableAffinityAssistant: false
DisableCredsInit: false
EnableAPIFields: beta
EnableArtifacts: false
EnableCELInWhenExpression: false
EnableKeepPodOnCancel: false
EnableParamEnum: false
EnableProvenanceInStatus: true
EnableStepActions: false
EnableTektonOCIBundles: false
EnforceNonfalsifiability: none
MaxResultSize: 4096
RequireGitSSHSecretKnownHosts: false
ResultExtractionMethod: termination-message
RunningInEnvWithInjectedSidecars: true
ScopeWhenExpressionsToTask: false
SendCloudEventsForRuns: false
SetSecurityContext: false
VerificationNoMatchPolicy: ignore
retriesStatus:
- completionTime: "2024-04-16T09:58:14Z"
conditions:
- lastTransitionTime: "2024-04-16T09:58:14Z"
message: '"step-command" exited with code 1'
reason: Failed
status: "False"
type: Succeeded
podName: test-sidecar-workspace-run-pod
provenance:
featureFlags:
AwaitSidecarReadiness: true
Coschedule: workspaces
DisableAffinityAssistant: false
DisableCredsInit: false
EnableAPIFields: beta
EnableArtifacts: false
EnableCELInWhenExpression: false
EnableKeepPodOnCancel: false
EnableParamEnum: false
EnableProvenanceInStatus: true
EnableStepActions: false
EnableTektonOCIBundles: false
EnforceNonfalsifiability: none
MaxResultSize: 4096
RequireGitSSHSecretKnownHosts: false
ResultExtractionMethod: termination-message
RunningInEnvWithInjectedSidecars: true
ScopeWhenExpressionsToTask: false
SendCloudEventsForRuns: false
SetSecurityContext: false
VerificationNoMatchPolicy: ignore
startTime: "2024-04-16T09:58:06Z"
steps:
- container: step-command
imageID: docker.io/library/alpine@sha256:c5b1261d6d3e43071626931fc004f70149baeba2c8ec672bd4f27761f8e1ad6b
name: command
terminated:
containerID: containerd://1beacfc9852b3a0c80eb30494bf80691d67533136e35f599d6bb62fe416bdfd1
exitCode: 1
finishedAt: "2024-04-16T09:58:13Z"
reason: Error
startedAt: "2024-04-16T09:58:13Z"
taskSpec:
steps:
- image: alpine
name: command
resources: {}
script: |
#!/bin/sh
set -ex
echo "hello world"
exit 1
volumeMounts:
- mountPath: /cache
name: ws-jchh2
workspaces:
- description: cache
name: cache
startTime: "2024-04-16T09:58:14Z"
steps:
- container: step-command
imageID: docker.io/library/alpine@sha256:c5b1261d6d3e43071626931fc004f70149baeba2c8ec672bd4f27761f8e1ad6b
name: command
terminated:
containerID: containerd://1beacfc9852b3a0c80eb30494bf80691d67533136e35f599d6bb62fe416bdfd1
exitCode: 1
finishedAt: "2024-04-16T09:58:13Z"
reason: Error
startedAt: "2024-04-16T09:58:13Z"
taskSpec:
steps:
- image: alpine
name: command
resources: {}
script: |
#!/bin/sh
set -ex
echo "hello world"
exit 1
volumeMounts:
- mountPath: /cache
name: ws-jchh2
workspaces:
- description: cache
name: cache
Additional Info
- Kubernetes version:
  Output of kubectl version:
  Client Version: v1.29.3
  Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
  Server Version: v1.28.8
- Tekton Pipeline version:
  Output of tkn version or kubectl get pods -n tekton-pipelines -l app=tekton-pipelines-controller -o=jsonpath='{.items[0].metadata.labels.version}':
  Client version: 0.36.0
  Pipeline version: v0.58.0
Analysis
0. Summary
- During the first reconciliation, the variable workspaces.<name>.volume in the taskSpec was replaced with a randomly generated volume name.
- The substituted taskSpec was then stored in taskRun.status.taskSpec.
- On retry, the already-substituted taskSpec from the status was reused directly.
- However, CreateVolumes generated the workspace's volume name again.
- The newly generated volume name did not match the first one, which caused the error above (see the sketch below).
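To make the mismatch concrete, here is a minimal, self-contained sketch (not Tekton code; it only assumes the k8s.io/apimachinery module, which Tekton's name generator also uses) showing that each reconciliation produces a fresh "ws-" suffix, so the name baked into status.taskSpec can never match the one generated again on retry:

package main

import (
	"fmt"

	utilrand "k8s.io/apimachinery/pkg/util/rand"
)

// Same suffix length Tekton's name generator uses for "ws-" volume names.
const randomLength = 5

func main() {
	// First reconciliation: this name is substituted into the taskSpec and
	// stored in status.taskSpec (e.g. "ws-jchh2").
	first := fmt.Sprintf("ws-%s", utilrand.String(randomLength))

	// Retry: CreateVolumes runs again and produces a different random name,
	// while the stored taskSpec still references the old one.
	second := fmt.Sprintf("ws-%s", utilrand.String(randomLength))

	fmt.Println(first, second, first == second) // almost always prints "false"
}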
1. reconcile -> CreateVolumes && applyParamsContextsResultsAndWorkspaces && createPod
pipeline/pkg/reconciler/taskrun/taskrun.go
Lines 617 to 635 in f0a1d64
// Get the randomized volume names assigned to workspace bindings
workspaceVolumes := workspace.CreateVolumes(tr.Spec.Workspaces)
ts, err := applyParamsContextsResultsAndWorkspaces(ctx, tr, rtr, workspaceVolumes)
if err != nil {
	logger.Errorf("Error updating task spec parameters, contexts, results and workspaces: %s", err)
	return err
}
tr.Status.TaskSpec = ts
if len(tr.Status.TaskSpec.Steps) > 0 {
	logger.Debugf("set taskspec for %s/%s - script: %s", tr.Namespace, tr.Name, tr.Status.TaskSpec.Steps[0].Script)
}
if pod == nil {
	pod, err = c.createPod(ctx, ts, tr, rtr, workspaceVolumes)
	if err != nil {
		newErr := c.handlePodCreationError(tr, err)
		logger.Errorf("Failed to create task run pod for taskrun %q: %v", tr.Name, newErr)
2.1 CreateVolumes -> RestrictLengthWithRandomSuffix
pipeline/pkg/workspace/apply.go
Lines 45 to 82 in f0a1d64
// CreateVolumes will return a dictionary where the keys are the names of the workspaces bound in
// wb and the value is a newly-created Volume to use. If the same Volume is bound twice, the
// resulting volumes will both have the same name to prevent the same Volume from being attached
// to a pod twice. The names of the returned volumes will be a short random string starting "ws-".
func CreateVolumes(wb []v1.WorkspaceBinding) map[string]corev1.Volume {
	pvcs := map[string]corev1.Volume{}
	v := make(nameVolumeMap)
	for _, w := range wb {
		name := names.SimpleNameGenerator.RestrictLengthWithRandomSuffix(volumeNameBase)
		switch {
		case w.PersistentVolumeClaim != nil:
			// If it's a PVC, we need to check if we've encountered it before so we avoid mounting it twice
			if vv, ok := pvcs[w.PersistentVolumeClaim.ClaimName]; ok {
				v[w.Name] = vv
			} else {
				pvc := *w.PersistentVolumeClaim
				v.setVolumeSource(w.Name, name, corev1.VolumeSource{PersistentVolumeClaim: &pvc})
				pvcs[pvc.ClaimName] = v[w.Name]
			}
		case w.EmptyDir != nil:
			ed := *w.EmptyDir
			v.setVolumeSource(w.Name, name, corev1.VolumeSource{EmptyDir: &ed})
		case w.ConfigMap != nil:
			cm := *w.ConfigMap
			v.setVolumeSource(w.Name, name, corev1.VolumeSource{ConfigMap: &cm})
		case w.Secret != nil:
			s := *w.Secret
			v.setVolumeSource(w.Name, name, corev1.VolumeSource{Secret: &s})
		case w.Projected != nil:
			s := *w.Projected
			v.setVolumeSource(w.Name, name, corev1.VolumeSource{Projected: &s})
		case w.CSI != nil:
			csi := *w.CSI
			v.setVolumeSource(w.Name, name, corev1.VolumeSource{CSI: &csi})
		}
	}
	return v
}
2.2 RestrictLengthWithRandomSuffix -> random string
pipeline/pkg/names/generate.go
Lines 54 to 61 in f0a1d64
// RestrictLengthWithRandomSuffix takes a base name and returns a potentially shortened version of that name with
// a random suffix, with the whole string no longer than 63 characters.
func (simpleNameGenerator) RestrictLengthWithRandomSuffix(base string) string {
	if len(base) > maxGeneratedNameLength {
		base = base[:maxGeneratedNameLength]
	}
	return fmt.Sprintf("%s-%s", base, utilrand.String(randomLength))
}
3.1 applyParamsContextsResultsAndWorkspaces -> ApplyWorkspaces
pipeline/pkg/reconciler/taskrun/taskrun.go
Lines 905 to 946 in f0a1d64
// applyParamsContextsResultsAndWorkspaces applies paramater, context, results and workspace substitutions to the TaskSpec.
func applyParamsContextsResultsAndWorkspaces(ctx context.Context, tr *v1.TaskRun, rtr *resources.ResolvedTask, workspaceVolumes map[string]corev1.Volume) (*v1.TaskSpec, error) {
	ts := rtr.TaskSpec.DeepCopy()
	var defaults []v1.ParamSpec
	if len(ts.Params) > 0 {
		defaults = append(defaults, ts.Params...)
	}
	// Apply parameter substitution from the taskrun.
	ts = resources.ApplyParameters(ts, tr, defaults...)
	// Apply context substitution from the taskrun
	ts = resources.ApplyContexts(ts, rtr.TaskName, tr)
	// Apply task result substitution
	ts = resources.ApplyResults(ts)
	// Apply step exitCode path substitution
	ts = resources.ApplyStepExitCodePath(ts)
	// Apply workspace resource substitution
	// propagate workspaces from taskrun to task.
	twn := []string{}
	for _, tw := range ts.Workspaces {
		twn = append(twn, tw.Name)
	}
	for _, trw := range tr.Spec.Workspaces {
		skip := false
		for _, tw := range twn {
			if tw == trw.Name {
				skip = true
				break
			}
		}
		if !skip {
			ts.Workspaces = append(ts.Workspaces, v1.WorkspaceDeclaration{Name: trw.Name})
		}
	}
	ts = resources.ApplyWorkspaces(ctx, ts, ts.Workspaces, tr.Spec.Workspaces, workspaceVolumes)
	return ts, nil
}
3.2 ApplyWorkspaces -> ApplyReplacements
pipeline/pkg/reconciler/taskrun/resources/apply.go
Lines 277 to 310 in f0a1d64
// ApplyWorkspaces applies the substitution from paths that the workspaces in declarations mounted to, the
// volumes that bindings are realized with in the task spec and the PersistentVolumeClaim names for the
// workspaces.
func ApplyWorkspaces(ctx context.Context, spec *v1.TaskSpec, declarations []v1.WorkspaceDeclaration, bindings []v1.WorkspaceBinding, vols map[string]corev1.Volume) *v1.TaskSpec {
	stringReplacements := map[string]string{}
	bindNames := sets.NewString()
	for _, binding := range bindings {
		bindNames.Insert(binding.Name)
	}
	for _, declaration := range declarations {
		prefix := fmt.Sprintf("workspaces.%s.", declaration.Name)
		if declaration.Optional && !bindNames.Has(declaration.Name) {
			stringReplacements[prefix+"bound"] = "false"
			stringReplacements[prefix+"path"] = ""
		} else {
			stringReplacements[prefix+"bound"] = "true"
			spec = applyWorkspaceMountPath(prefix+"path", spec, declaration)
		}
	}
	for name, vol := range vols {
		stringReplacements[fmt.Sprintf("workspaces.%s.volume", name)] = vol.Name
	}
	for _, binding := range bindings {
		if binding.PersistentVolumeClaim != nil {
			stringReplacements[fmt.Sprintf("workspaces.%s.claim", binding.Name)] = binding.PersistentVolumeClaim.ClaimName
		} else {
			stringReplacements[fmt.Sprintf("workspaces.%s.claim", binding.Name)] = ""
		}
	}
	return ApplyReplacements(spec, stringReplacements, map[string][]string{}, map[string]map[string]string{})
}
3.3 workspaces.%s.volume
pipeline/pkg/reconciler/taskrun/resources/apply.go
Lines 299 to 301 in f0a1d64
for name, vol := range vols {
	stringReplacements[fmt.Sprintf("workspaces.%s.volume", name)] = vol.Name
}
4. Store the replaced taskSpec in the TaskRun status
pipeline/pkg/reconciler/taskrun/taskrun.go
Line 625 in f0a1d64
tr.Status.TaskSpec = ts
5. reconcile -> prepare -> GetTaskFuncFromTaskRun
pipeline/pkg/reconciler/taskrun/resources/taskref.go
Lines 55 to 79 in f0a1d64
// GetTaskFuncFromTaskRun is a factory function that will use the given TaskRef as context to return a valid GetTask function.
// It also requires a kubeclient, tektonclient, namespace, and service account in case it needs to find that task in
// cluster or authorize against an external repositroy. It will figure out whether it needs to look in the cluster or in
// a remote image to fetch the reference. It will also return the "kind" of the task being referenced.
// OCI bundle and remote resolution tasks will be verified by trusted resources if the feature is enabled
func GetTaskFuncFromTaskRun(ctx context.Context, k8s kubernetes.Interface, tekton clientset.Interface, requester remoteresource.Requester, taskrun *v1.TaskRun, verificationPolicies []*v1alpha1.VerificationPolicy) GetTask {
	// if the spec is already in the status, do not try to fetch it again, just use it as source of truth.
	// Same for the RefSource field in the Status.Provenance.
	if taskrun.Status.TaskSpec != nil {
		return func(_ context.Context, name string) (*v1.Task, *v1.RefSource, *trustedresources.VerificationResult, error) {
			var refSource *v1.RefSource
			if taskrun.Status.Provenance != nil {
				refSource = taskrun.Status.Provenance.RefSource
			}
			return &v1.Task{
				ObjectMeta: metav1.ObjectMeta{
					Name:      name,
					Namespace: taskrun.Namespace,
				},
				Spec: *taskrun.Status.TaskSpec,
			}, refSource, nil, nil
		}
	}
	return GetTaskFunc(ctx, k8s, tekton, requester, taskrun, taskrun.Spec.TaskRef, taskrun.Name, taskrun.Namespace, taskrun.Spec.ServiceAccountName, verificationPolicies)
}
Note the comment: "if the spec is already in the status, do not try to fetch it again, just use it as source of truth."
Solution
1. Store the original (unsubstituted) taskSpec in taskRun.status.taskSpec, rather than the values after variable substitution.
Advantages: very general; retries would not be affected even if similar randomly generated variables are introduced in the future.
Disadvantages: the information in status.taskSpec is no longer intuitive, since the actual variable values used during execution can no longer be read directly from it.
2. Fetch the taskSpec definition from its source every time it is needed, rather than reusing the one stored in the status.
Disadvantages: it would have to be fetched on every reconciliation, which is not very practical.
3. Change the way volume names are generated.
Instead of being completely random, derive the name by hashing the original workspace name, so the generated name is the same on every reconciliation while remaining unique within the current workspaces.
In the current taskSpec, only the value of workspaces.<name>.volume can differ between reconciliations; the values of all other variables are fixed and do not change.
I personally prefer this solution; a rough sketch follows.
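As a rough illustration of solution 3 (not Tekton's actual API; deterministicVolumeName is a hypothetical helper), the volume name could be derived from the workspace name with a stable hash, so repeated reconciliations of the same TaskRun always produce the same "ws-" name:

package main

import (
	"fmt"
	"hash/fnv"
)

// deterministicVolumeName is a hypothetical replacement for the random-suffix
// generator: it hashes the workspace name so the result is stable across
// reconciliations. Collisions within one TaskRun's workspaces would still need
// a uniqueness check (or a longer hash) before adopting this for real.
func deterministicVolumeName(workspaceName string) string {
	h := fnv.New32a()
	h.Write([]byte(workspaceName))
	return fmt.Sprintf("ws-%x", h.Sum32())
}

func main() {
	fmt.Println(deterministicVolumeName("cache")) // same output on every call
	fmt.Println(deterministicVolumeName("cache"))
}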