StatefulSet recreation during PVC resize loses kubectl.kubernetes.io/restartedAt annotation #2169

@conan912

Description

Report

StatefulSet recreation during PVC resize loses kubectl.kubernetes.io/restartedAt annotation, causing revision hash mismatch and triggering unnecessary SmartUpdate operations.

More about the problem

Hello. When running PSMDB under the Operator, we often need to restart Pods manually
(for example, during EKS node upgrades or PSMDB Operator version upgrades).

According to https://docs.percona.com/percona-operator-for-mongodb/pause.html, the safe way to restart is the Pause feature.
However, after reviewing the code, it appears that pausing reduces the replica size to 0, which would significantly impact the service.

Therefore, we set updateStrategy: SmartUpdate and use the kubectl rollout restart command.

The problem is that this command adds an annotation of the form kubectl.kubernetes.io/restartedAt: 'yyyy-mm-ddThh:mm:ssZ' to the StatefulSet's Pod template. This changes the StatefulSet's template hash, so the Current revision hash and the Update revision hash differ. (The restarted Pods carry the Update revision hash in their controller-revision-hash label.)
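To illustrate why a single annotation is enough to change the revision, here is a minimal sketch. It assumes a heavily reduced, hypothetical `podTemplate` type and an FNV hash over its JSON encoding. This is not the hashing Kubernetes itself performs for controller revisions, and the image name is a placeholder. It only shows that any change to the serialized template yields a different hash.

```go
package main

import (
	"encoding/json"
	"fmt"
	"hash/fnv"
)

// podTemplate is a hypothetical, reduced stand-in for a StatefulSet Pod template.
type podTemplate struct {
	Annotations map[string]string `json:"annotations,omitempty"`
	Image       string            `json:"image"`
}

// templateHash hashes the JSON form of the template.
// Illustrative only: this is NOT the algorithm Kubernetes uses.
func templateHash(t podTemplate) string {
	// encoding/json sorts map keys, so the encoding is deterministic.
	data, err := json.Marshal(t)
	if err != nil {
		panic(err)
	}
	h := fnv.New32a()
	h.Write(data)
	return fmt.Sprintf("%08x", h.Sum32())
}

func main() {
	tpl := podTemplate{Image: "example/mongodb:placeholder"}
	before := templateHash(tpl)

	// Same effect as `kubectl rollout restart`: only an annotation is added.
	tpl.Annotations = map[string]string{
		"kubectl.kubernetes.io/restartedAt": "2025-01-01T00:00:00Z",
	}
	after := templateHash(tpl)

	fmt.Println("hash before:", before)
	fmt.Println("hash after: ", after)
	fmt.Println("changed:    ", before != after)
}
```

Nothing else in the template has to change for the revision to differ, which is why a StatefulSet rebuilt without the annotation ends up on a different revision than the running Pods.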

As you may already know, once the StatefulSet's template hash has changed in this way, certain later operations can trigger unexpected restarts.

For example, suppose we restarted the Pods for infrastructure work, which changed the template hash,
and later perform a PVC resize through the CR with the enableVolumeExpansion: true option applied.
In the final step of the resize, the StatefulSet is deleted with the Orphan propagation policy.

When the StatefulSet is recreated during reconciliation,
it gets the original hash (computed without the restart annotation) instead of the hash produced by the earlier rollout restart.

As a result, the StatefulSet's Update revision Hash differs from the Pod's controller-revision-hash, triggering SmartUpdate.

>> PVC Resize operations should not cause Pod restarts, but restarts are occurring.

Steps to reproduce

  1. Set updateStrategy: SmartUpdate in PerconaServerMongoDB CR
  2. Execute kubectl rollout restart statefulset/<statefulset-name> to restart Pods
    • At this point, kubectl.kubernetes.io/restartedAt annotation is added to StatefulSet Template
    • StatefulSet's Update revision hash changes
  3. Set enableVolumeExpansion: true in CR
  4. Trigger PVC resize by increasing PVC size in CR
  5. After PVC resize completes, StatefulSet is Orphan Deleted and recreated
  6. The recreated StatefulSet doesn't have the restartedAt annotation, so it has the original hash
  7. Pod's controller-revision-hash and StatefulSet's Update revision hash differ
  8. SmartUpdate is triggered, causing unnecessary Pod restarts
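
The steps above can be modeled in a few lines. This is a sketch under stated assumptions: `revisionOf` and `needsSmartUpdate` are hypothetical helpers, and the revision is reduced to a string derived only from the restartedAt annotation. The operator's real comparison is not reproduced here.

```go
package main

import "fmt"

const restartedAt = "kubectl.kubernetes.io/restartedAt"

// revisionOf derives a hypothetical revision name from the template annotations.
// A real revision hash covers the whole Pod template, not only this key.
func revisionOf(templateAnnotations map[string]string) string {
	if v, ok := templateAnnotations[restartedAt]; ok {
		return "rev-restarted-" + v
	}
	return "rev-base"
}

// needsSmartUpdate models the check: a Pod whose controller-revision-hash
// differs from the StatefulSet's update revision is considered outdated.
func needsSmartUpdate(podRevision, stsUpdateRevision string) bool {
	return podRevision != stsUpdateRevision
}

func main() {
	// Step 2: rollout restart adds the annotation; Pods restart onto that revision.
	restarted := map[string]string{restartedAt: "2025-01-01T00:00:00Z"}
	podRevision := revisionOf(restarted)

	// Steps 5-6: the StatefulSet is rebuilt from the CR, so the annotation is gone.
	recreated := map[string]string{}
	stsRevision := revisionOf(recreated)

	// Steps 7-8: the mismatch makes the Pods look outdated.
	fmt.Println(needsSmartUpdate(podRevision, stsRevision)) // prints "true"
}
```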

Versions

  1. Kubernetes : 1.31
  2. Operator : 1.20.1
  3. Database : Percona Server for MongoDB 7.0.24-13

Anything else?

I think a good solution would be to preserve the annotation that matches the existing revision hash when the StatefulSet is Orphan Deleted.

Below is the code I modified. With these changes applied, I confirmed that SmartUpdate is no longer triggered when reproducing the scenario described above.

If you think another approach would be better, please let me know. I would be happy to contribute this fix to the project.

Modified Code

1. pkg/apis/psmdb/v1/psmdb_types.go


const (
	AnnotationResyncPBM                = "percona.com/resync-pbm"
	AnnotationResyncInProgress         = "percona.com/resync-in-progress"
	AnnotationPVCResizeInProgress      = "percona.com/pvc-resize-in-progress"
	AnnotationPreservedRestartedAtBase = "percona.com/preserved-restarted-at"
)

// AnnotationPreservedRestartedAt returns the annotation key for preserving restartedAt annotation for a StatefulSet
func AnnotationPreservedRestartedAt(stsName string) string {
	return AnnotationPreservedRestartedAtBase + "." + stsName
}

2. pkg/controller/perconaservermongodb/volumes.go

Store `restartedAt` annotation in CR before deleting StatefulSet:

if updatedPVCs == len(pvcsToUpdate) {
	log.Info("Deleting statefulset")

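	// Preserve the rollout-restart annotation on the CR so it survives the StatefulSet recreation.
	// If this fails, return before deleting the StatefulSet; otherwise the value would be lost.
	if restartedAtValue, exists := sts.Spec.Template.Annotations["kubectl.kubernetes.io/restartedAt"]; exists {
		if err := k8s.AnnotateObject(ctx, r.client, cr, map[string]string{psmdbv1.AnnotationPreservedRestartedAt(sts.Name): restartedAtValue}); err != nil {
			return errors.Wrapf(err, "preserve restartedAt annotation for statefulset/%s", sts.Name)
		}
	}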

	if err := r.client.Delete(ctx, sts, client.PropagationPolicy("Orphan")); err != nil {
		if k8serrors.IsNotFound(err) {
			return nil
		}
		return errors.Wrapf(err, "delete statefulset/%s", sts.Name)
	}

	log.Info("PVC resize completed")
	return nil
}

3. pkg/controller/perconaservermongodb/statefulset.go

Restore the annotation stored in CR when recreating StatefulSet:

sslAnn, err := r.sslAnnotation(ctx, cr)
if err != nil {
	return nil, errors.Wrap(err, "failed to get ssl annotations")
}
for k, v := range sslAnn {
	sfsSpec.Template.Annotations[k] = v
}

if preservedValue, exists := cr.Annotations[api.AnnotationPreservedRestartedAt(sfs.Name)]; exists {
	if sfsSpec.Template.Annotations == nil {
		sfsSpec.Template.Annotations = make(map[string]string)
	}
	sfsSpec.Template.Annotations["kubectl.kubernetes.io/restartedAt"] = preservedValue
}

return sfs, nil

How it works

  1. Before deleting StatefulSet after PVC resize completes, if restartedAt annotation exists, store it in CR
  2. StatefulSet is Orphan Deleted
  3. When StatefulSet is recreated, check and restore the annotation stored in CR
  4. The same revision hash is maintained, preventing SmartUpdate from being triggered
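
The flow above can be exercised in isolation with plain maps standing in for the CR annotations and the StatefulSet template annotations. This is a simplified sketch, not the operator code: `preserve`, `restore`, and the StatefulSet name are hypothetical, and no Kubernetes client is involved.

```go
package main

import "fmt"

const (
	restartedAtKey = "kubectl.kubernetes.io/restartedAt"
	preservedBase  = "percona.com/preserved-restarted-at"
)

// preservedKey builds the per-StatefulSet key used on the CR.
func preservedKey(stsName string) string {
	return preservedBase + "." + stsName
}

// preserve copies restartedAt from the template annotations to the CR
// annotations (step 1). It does nothing if the annotation is absent.
func preserve(crAnn, tplAnn map[string]string, stsName string) {
	if v, ok := tplAnn[restartedAtKey]; ok {
		crAnn[preservedKey(stsName)] = v
	}
}

// restore writes the preserved value back into the template annotations of
// the recreated StatefulSet (step 3), allocating the map if needed.
func restore(crAnn, tplAnn map[string]string, stsName string) map[string]string {
	if tplAnn == nil {
		tplAnn = make(map[string]string)
	}
	if v, ok := crAnn[preservedKey(stsName)]; ok {
		tplAnn[restartedAtKey] = v
	}
	return tplAnn
}

func main() {
	const stsName = "my-cluster-rs0" // hypothetical StatefulSet name

	crAnn := map[string]string{}
	oldTpl := map[string]string{restartedAtKey: "2025-01-01T00:00:00Z"}

	preserve(crAnn, oldTpl, stsName)       // before the orphan delete
	newTpl := restore(crAnn, nil, stsName) // when the StatefulSet is rebuilt

	// The rebuilt template carries the same value, so its hash input is unchanged.
	fmt.Println(newTpl[restartedAtKey] == oldTpl[restartedAtKey]) // prints "true"
}
```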

Thank you.
