Description
Expected Behavior
The pipeline should be able to retry normally.
Actual Behavior
When retrying, the pipeline is unable to create the Pod for the retry.
failed to create task run pod "test-sidecar-workspace-run": Pod "test-sidecar-workspace-run-pod-retry1" is invalid: spec.containers[0].volumeMounts[0].name: Not found: "ws-jchh2". Maybe missing or invalid Task default/test-sidecar-workspace
Steps to Reproduce the Problem
- Create the following Task and TaskRun.
cat <<'EOF' | kubectl replace --force -f -
apiVersion: tekton.dev/v1
kind: Task
metadata:
name: test-sidecar-workspace
spec:
workspaces:
- description: cache
name: cache
steps:
- name: command
image: alpine
volumeMounts:
- mountPath: /cache
name: $(workspaces.cache.volume) # <-- It's this configuration that caused the above problem.
script: |
#!/bin/sh
set -ex
echo "hello world"
exit 1
---
apiVersion: tekton.dev/v1
kind: TaskRun
metadata:
name: test-sidecar-workspace-run
spec:
taskRef:
name: test-sidecar-workspace
retries: 1
workspaces:
- name: cache
emptyDir: {}
EOF
- Wait for the TaskRun to be processed. After the retry fails, the TaskRun looks like this:
apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
generation: 1
labels:
app.kubernetes.io/managed-by: tekton-pipelines
tekton.dev/task: test-sidecar-workspace
name: test-sidecar-workspace-run
spec:
retries: 1
serviceAccountName: default
taskRef:
kind: Task
name: test-sidecar-workspace
timeout: 1h0m0s
workspaces:
- emptyDir: {}
name: cache
status:
completionTime: "2024-04-16T09:58:14Z"
conditions:
- lastTransitionTime: "2024-04-16T09:58:14Z"
message:
'failed to create task run pod "test-sidecar-workspace-run": Pod "test-sidecar-workspace-run-pod-retry1"
is invalid: spec.containers[0].volumeMounts[0].name: Not found: "ws-jchh2".
Maybe missing or invalid Task default/test-sidecar-workspace'
reason: PodCreationFailed
status: "False"
type: Succeeded
podName: ""
provenance:
featureFlags:
AwaitSidecarReadiness: true
Coschedule: workspaces
DisableAffinityAssistant: false
DisableCredsInit: false
EnableAPIFields: beta
EnableArtifacts: false
EnableCELInWhenExpression: false
EnableKeepPodOnCancel: false
EnableParamEnum: false
EnableProvenanceInStatus: true
EnableStepActions: false
EnableTektonOCIBundles: false
EnforceNonfalsifiability: none
MaxResultSize: 4096
RequireGitSSHSecretKnownHosts: false
ResultExtractionMethod: termination-message
RunningInEnvWithInjectedSidecars: true
ScopeWhenExpressionsToTask: false
SendCloudEventsForRuns: false
SetSecurityContext: false
VerificationNoMatchPolicy: ignore
retriesStatus:
- completionTime: "2024-04-16T09:58:14Z"
conditions:
- lastTransitionTime: "2024-04-16T09:58:14Z"
message: '"step-command" exited with code 1'
reason: Failed
status: "False"
type: Succeeded
podName: test-sidecar-workspace-run-pod
provenance:
featureFlags:
AwaitSidecarReadiness: true
Coschedule: workspaces
DisableAffinityAssistant: false
DisableCredsInit: false
EnableAPIFields: beta
EnableArtifacts: false
EnableCELInWhenExpression: false
EnableKeepPodOnCancel: false
EnableParamEnum: false
EnableProvenanceInStatus: true
EnableStepActions: false
EnableTektonOCIBundles: false
EnforceNonfalsifiability: none
MaxResultSize: 4096
RequireGitSSHSecretKnownHosts: false
ResultExtractionMethod: termination-message
RunningInEnvWithInjectedSidecars: true
ScopeWhenExpressionsToTask: false
SendCloudEventsForRuns: false
SetSecurityContext: false
VerificationNoMatchPolicy: ignore
startTime: "2024-04-16T09:58:06Z"
steps:
- container: step-command
imageID: docker.io/library/alpine@sha256:c5b1261d6d3e43071626931fc004f70149baeba2c8ec672bd4f27761f8e1ad6b
name: command
terminated:
containerID: containerd://1beacfc9852b3a0c80eb30494bf80691d67533136e35f599d6bb62fe416bdfd1
exitCode: 1
finishedAt: "2024-04-16T09:58:13Z"
reason: Error
startedAt: "2024-04-16T09:58:13Z"
taskSpec:
steps:
- image: alpine
name: command
resources: {}
script: |
#!/bin/sh
set -ex
echo "hello world"
exit 1
volumeMounts:
- mountPath: /cache
name: ws-jchh2
workspaces:
- description: cache
name: cache
startTime: "2024-04-16T09:58:14Z"
steps:
- container: step-command
imageID: docker.io/library/alpine@sha256:c5b1261d6d3e43071626931fc004f70149baeba2c8ec672bd4f27761f8e1ad6b
name: command
terminated:
containerID: containerd://1beacfc9852b3a0c80eb30494bf80691d67533136e35f599d6bb62fe416bdfd1
exitCode: 1
finishedAt: "2024-04-16T09:58:13Z"
reason: Error
startedAt: "2024-04-16T09:58:13Z"
taskSpec:
steps:
- image: alpine
name: command
resources: {}
script: |
#!/bin/sh
set -ex
echo "hello world"
exit 1
volumeMounts:
- mountPath: /cache
name: ws-jchh2
workspaces:
- description: cache
name: cache
Additional Info
- Kubernetes version:
  Output of kubectl version:
  Client Version: v1.29.3
  Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
  Server Version: v1.28.8
- Tekton Pipeline version:
  Output of tkn version or kubectl get pods -n tekton-pipelines -l app=tekton-pipelines-controller -o=jsonpath='{.items[0].metadata.labels.version}':
  Client version: 0.36.0
  Pipeline version: v0.58.0
Analysis
0. Summary
- During the first reconciliation, the variable workspaces.<name>.volume in the taskSpec was replaced with a randomly generated volume name.
- The substituted taskSpec was then stored in taskRun.status.taskSpec.
- On retry, the already-substituted taskSpec from the status was reused directly.
- However, CreateVolumes generated the workspace's volume name again.
- The newly generated volume name did not match the first one, which caused the error above (see the sketch below).
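To make the mismatch concrete, here is a minimal, self-contained sketch (not Tekton code; it only assumes the k8s.io/apimachinery module, which Tekton's name generator also uses) showing that each reconciliation produces a fresh "ws-" suffix, so the name baked into status.taskSpec can never match the one generated again on retry:

package main

import (
	"fmt"

	utilrand "k8s.io/apimachinery/pkg/util/rand"
)

// Same suffix length Tekton's name generator uses for "ws-" volume names.
const randomLength = 5

func main() {
	// First reconciliation: this name is substituted into the taskSpec and
	// stored in status.taskSpec (e.g. "ws-jchh2").
	first := fmt.Sprintf("ws-%s", utilrand.String(randomLength))

	// Retry: CreateVolumes runs again and produces a different random name,
	// while the stored taskSpec still references the old one.
	second := fmt.Sprintf("ws-%s", utilrand.String(randomLength))

	fmt.Println(first, second, first == second) // almost always prints "false"
}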
1. reconcile -> CreateVolumes && applyParamsContextsResultsAndWorkspaces && createPod
pipeline/pkg/reconciler/taskrun/taskrun.go
Lines 617 to 635 in f0a1d64
// Get the randomized volume names assigned to workspace bindings
workspaceVolumes := workspace.CreateVolumes(tr.Spec.Workspaces)
ts, err := applyParamsContextsResultsAndWorkspaces(ctx, tr, rtr, workspaceVolumes)
if err != nil {
	logger.Errorf("Error updating task spec parameters, contexts, results and workspaces: %s", err)
	return err
}
tr.Status.TaskSpec = ts
if len(tr.Status.TaskSpec.Steps) > 0 {
	logger.Debugf("set taskspec for %s/%s - script: %s", tr.Namespace, tr.Name, tr.Status.TaskSpec.Steps[0].Script)
}
if pod == nil {
	pod, err = c.createPod(ctx, ts, tr, rtr, workspaceVolumes)
	if err != nil {
		newErr := c.handlePodCreationError(tr, err)
		logger.Errorf("Failed to create task run pod for taskrun %q: %v", tr.Name, newErr)
2.1 CreateVolumes -> RestrictLengthWithRandomSuffix
pipeline/pkg/workspace/apply.go
Lines 45 to 82 in f0a1d64
// CreateVolumes will return a dictionary where the keys are the names of the workspaces bound in
// wb and the value is a newly-created Volume to use. If the same Volume is bound twice, the
// resulting volumes will both have the same name to prevent the same Volume from being attached
// to a pod twice. The names of the returned volumes will be a short random string starting "ws-".
func CreateVolumes(wb []v1.WorkspaceBinding) map[string]corev1.Volume {
	pvcs := map[string]corev1.Volume{}
	v := make(nameVolumeMap)
	for _, w := range wb {
		name := names.SimpleNameGenerator.RestrictLengthWithRandomSuffix(volumeNameBase)
		switch {
		case w.PersistentVolumeClaim != nil:
			// If it's a PVC, we need to check if we've encountered it before so we avoid mounting it twice
			if vv, ok := pvcs[w.PersistentVolumeClaim.ClaimName]; ok {
				v[w.Name] = vv
			} else {
				pvc := *w.PersistentVolumeClaim
				v.setVolumeSource(w.Name, name, corev1.VolumeSource{PersistentVolumeClaim: &pvc})
				pvcs[pvc.ClaimName] = v[w.Name]
			}
		case w.EmptyDir != nil:
			ed := *w.EmptyDir
			v.setVolumeSource(w.Name, name, corev1.VolumeSource{EmptyDir: &ed})
		case w.ConfigMap != nil:
			cm := *w.ConfigMap
			v.setVolumeSource(w.Name, name, corev1.VolumeSource{ConfigMap: &cm})
		case w.Secret != nil:
			s := *w.Secret
			v.setVolumeSource(w.Name, name, corev1.VolumeSource{Secret: &s})
		case w.Projected != nil:
			s := *w.Projected
			v.setVolumeSource(w.Name, name, corev1.VolumeSource{Projected: &s})
		case w.CSI != nil:
			csi := *w.CSI
			v.setVolumeSource(w.Name, name, corev1.VolumeSource{CSI: &csi})
		}
	}
	return v
}
2.2 RestrictLengthWithRandomSuffix -> random string
pipeline/pkg/names/generate.go
Lines 54 to 61 in f0a1d64
// RestrictLengthWithRandomSuffix takes a base name and returns a potentially shortened version of that name with
// a random suffix, with the whole string no longer than 63 characters.
func (simpleNameGenerator) RestrictLengthWithRandomSuffix(base string) string {
	if len(base) > maxGeneratedNameLength {
		base = base[:maxGeneratedNameLength]
	}
	return fmt.Sprintf("%s-%s", base, utilrand.String(randomLength))
}
3.1 applyParamsContextsResultsAndWorkspaces -> ApplyWorkspaces
pipeline/pkg/reconciler/taskrun/taskrun.go
Lines 905 to 946 in f0a1d64
// applyParamsContextsResultsAndWorkspaces applies paramater, context, results and workspace substitutions to the TaskSpec.
func applyParamsContextsResultsAndWorkspaces(ctx context.Context, tr *v1.TaskRun, rtr *resources.ResolvedTask, workspaceVolumes map[string]corev1.Volume) (*v1.TaskSpec, error) {
	ts := rtr.TaskSpec.DeepCopy()
	var defaults []v1.ParamSpec
	if len(ts.Params) > 0 {
		defaults = append(defaults, ts.Params...)
	}
	// Apply parameter substitution from the taskrun.
	ts = resources.ApplyParameters(ts, tr, defaults...)
	// Apply context substitution from the taskrun
	ts = resources.ApplyContexts(ts, rtr.TaskName, tr)
	// Apply task result substitution
	ts = resources.ApplyResults(ts)
	// Apply step exitCode path substitution
	ts = resources.ApplyStepExitCodePath(ts)
	// Apply workspace resource substitution
	// propagate workspaces from taskrun to task.
	twn := []string{}
	for _, tw := range ts.Workspaces {
		twn = append(twn, tw.Name)
	}
	for _, trw := range tr.Spec.Workspaces {
		skip := false
		for _, tw := range twn {
			if tw == trw.Name {
				skip = true
				break
			}
		}
		if !skip {
			ts.Workspaces = append(ts.Workspaces, v1.WorkspaceDeclaration{Name: trw.Name})
		}
	}
	ts = resources.ApplyWorkspaces(ctx, ts, ts.Workspaces, tr.Spec.Workspaces, workspaceVolumes)
	return ts, nil
}
3.2 ApplyWorkspaces -> ApplyReplacements
pipeline/pkg/reconciler/taskrun/resources/apply.go
Lines 277 to 310 in f0a1d64
// ApplyWorkspaces applies the substitution from paths that the workspaces in declarations mounted to, the
// volumes that bindings are realized with in the task spec and the PersistentVolumeClaim names for the
// workspaces.
func ApplyWorkspaces(ctx context.Context, spec *v1.TaskSpec, declarations []v1.WorkspaceDeclaration, bindings []v1.WorkspaceBinding, vols map[string]corev1.Volume) *v1.TaskSpec {
	stringReplacements := map[string]string{}
	bindNames := sets.NewString()
	for _, binding := range bindings {
		bindNames.Insert(binding.Name)
	}
	for _, declaration := range declarations {
		prefix := fmt.Sprintf("workspaces.%s.", declaration.Name)
		if declaration.Optional && !bindNames.Has(declaration.Name) {
			stringReplacements[prefix+"bound"] = "false"
			stringReplacements[prefix+"path"] = ""
		} else {
			stringReplacements[prefix+"bound"] = "true"
			spec = applyWorkspaceMountPath(prefix+"path", spec, declaration)
		}
	}
	for name, vol := range vols {
		stringReplacements[fmt.Sprintf("workspaces.%s.volume", name)] = vol.Name
	}
	for _, binding := range bindings {
		if binding.PersistentVolumeClaim != nil {
			stringReplacements[fmt.Sprintf("workspaces.%s.claim", binding.Name)] = binding.PersistentVolumeClaim.ClaimName
		} else {
			stringReplacements[fmt.Sprintf("workspaces.%s.claim", binding.Name)] = ""
		}
	}
	return ApplyReplacements(spec, stringReplacements, map[string][]string{}, map[string]map[string]string{})
}
3.3 workspaces.%s.volume
pipeline/pkg/reconciler/taskrun/resources/apply.go
Lines 299 to 301 in f0a1d64
for name, vol := range vols {
	stringReplacements[fmt.Sprintf("workspaces.%s.volume", name)] = vol.Name
}
4. Store the replaced taskSpec in the TaskRun status
pipeline/pkg/reconciler/taskrun/taskrun.go
Line 625 in f0a1d64
tr.Status.TaskSpec = ts
5. reconcile -> prepare -> GetTaskFuncFromTaskRun
pipeline/pkg/reconciler/taskrun/resources/taskref.go
Lines 55 to 79 in f0a1d64
// GetTaskFuncFromTaskRun is a factory function that will use the given TaskRef as context to return a valid GetTask function.
// It also requires a kubeclient, tektonclient, namespace, and service account in case it needs to find that task in
// cluster or authorize against an external repositroy. It will figure out whether it needs to look in the cluster or in
// a remote image to fetch the reference. It will also return the "kind" of the task being referenced.
// OCI bundle and remote resolution tasks will be verified by trusted resources if the feature is enabled
func GetTaskFuncFromTaskRun(ctx context.Context, k8s kubernetes.Interface, tekton clientset.Interface, requester remoteresource.Requester, taskrun *v1.TaskRun, verificationPolicies []*v1alpha1.VerificationPolicy) GetTask {
	// if the spec is already in the status, do not try to fetch it again, just use it as source of truth.
	// Same for the RefSource field in the Status.Provenance.
	if taskrun.Status.TaskSpec != nil {
		return func(_ context.Context, name string) (*v1.Task, *v1.RefSource, *trustedresources.VerificationResult, error) {
			var refSource *v1.RefSource
			if taskrun.Status.Provenance != nil {
				refSource = taskrun.Status.Provenance.RefSource
			}
			return &v1.Task{
				ObjectMeta: metav1.ObjectMeta{
					Name:      name,
					Namespace: taskrun.Namespace,
				},
				Spec: *taskrun.Status.TaskSpec,
			}, refSource, nil, nil
		}
	}
	return GetTaskFunc(ctx, k8s, tekton, requester, taskrun, taskrun.Spec.TaskRef, taskrun.Name, taskrun.Namespace, taskrun.Spec.ServiceAccountName, verificationPolicies)
}
Note the comment: "if the spec is already in the status, do not try to fetch it again, just use it as source of truth."
Solution
1. Store the original (unsubstituted) taskSpec in taskRun.status.taskSpec, rather than the values after variable substitution.
Advantages: very general; retries would not be affected even if similar randomly generated variables are introduced in the future.
Disadvantages: the information in status.taskSpec is no longer intuitive, since the actual variable values used during execution can no longer be read directly from it.
2. Fetch the taskSpec definition from its source every time it is needed, rather than reusing the one stored in the status.
Disadvantages: it would have to be fetched on every reconciliation, which is not very practical.
3. Change the way volume names are generated.
Instead of being completely random, derive the name by hashing the original workspace name, so the generated name is the same on every reconciliation while remaining unique within the current workspaces.
In the current taskSpec, only the value of workspaces.<name>.volume can differ between reconciliations; the values of all other variables are fixed and do not change.
I personally prefer this solution; a rough sketch follows.
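As a rough illustration of solution 3 (not Tekton's actual API; deterministicVolumeName is a hypothetical helper), the volume name could be derived from the workspace name with a stable hash, so repeated reconciliations of the same TaskRun always produce the same "ws-" name:

package main

import (
	"fmt"
	"hash/fnv"
)

// deterministicVolumeName is a hypothetical replacement for the random-suffix
// generator: it hashes the workspace name so the result is stable across
// reconciliations. Collisions within one TaskRun's workspaces would still need
// a uniqueness check (or a longer hash) before adopting this for real.
func deterministicVolumeName(workspaceName string) string {
	h := fnv.New32a()
	h.Write([]byte(workspaceName))
	return fmt.Sprintf("ws-%x", h.Sum32())
}

func main() {
	fmt.Println(deterministicVolumeName("cache")) // same output on every call
	fmt.Println(deterministicVolumeName("cache"))
}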