Description
Report
StatefulSet recreation during a PVC resize loses the `kubectl.kubernetes.io/restartedAt` annotation, causing a revision hash mismatch and triggering unnecessary SmartUpdate operations.
More about the problem
Hello. When running PSMDB with the Operator, we often need to restart Pods manually (for example, during EKS node upgrades or PSMDB Operator version upgrades).
According to https://docs.percona.com/percona-operator-for-mongodb/pause.html, the safe way to restart is to use the Pause feature.
However, after reviewing the code, it appears that pausing reduces the replica set size to 0, which would significantly impact the service.
Therefore, we set `updateStrategy: SmartUpdate` and restart Pods with `kubectl rollout restart`.
The problem is that this command adds an annotation of the form `kubectl.kubernetes.io/restartedAt: 'yyyy-mm-ddThh:mm:ssZ'` to the StatefulSet's Pod template. This changes the template hash, so the current revision hash and the update revision hash differ. (The restarted Pods also carry the update revision hash in their `controller-revision-hash` label.)
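To illustrate why a single annotation matters, here is a minimal Go sketch. It is a simplification, not the algorithm Kubernetes uses: `templateHash` is a hypothetical helper that hashes only the Pod template annotations (sorted, FNV-32a), whereas Kubernetes hashes the full revision data. The effect is the same, though: adding `kubectl.kubernetes.io/restartedAt` yields a different hash. The key `example.com/other` is a placeholder for any existing template annotation.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// templateHash is a hypothetical, simplified stand-in for a StatefulSet
// revision hash: FNV-32a over the Pod template annotations in sorted key
// order. Kubernetes hashes the full revision data instead.
func templateHash(annotations map[string]string) string {
	keys := make([]string, 0, len(annotations))
	for k := range annotations {
		keys = append(keys, k)
	}
	sort.Strings(keys) // map iteration order is random; sort for determinism

	h := fnv.New32a()
	for _, k := range keys {
		// NUL separators keep ("ab","c") and ("a","bc") from colliding.
		h.Write([]byte(k))
		h.Write([]byte{0})
		h.Write([]byte(annotations[k]))
		h.Write([]byte{0})
	}
	return fmt.Sprintf("%08x", h.Sum32())
}

func main() {
	// "example.com/other" is a placeholder for any existing template annotation.
	before := map[string]string{"example.com/other": "x"}
	after := map[string]string{
		"example.com/other":                 "x",
		"kubectl.kubernetes.io/restartedAt": "2025-01-01T00:00:00Z",
	}
	fmt.Println("before rollout restart:", templateHash(before))
	fmt.Println("after rollout restart: ", templateHash(after))
	fmt.Println("hashes differ:", templateHash(before) != templateHash(after))
}
```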
As you may already know, once the StatefulSet's template hash has changed, some later operations can trigger Pod restarts.
For example, suppose we ran a rollout restart for infrastructure work, which changed the template hash,
and then, after setting `enableVolumeExpansion: true`, we increase the PVC size in the CR.
In its final step, the PVC resize orphan-deletes the StatefulSet.
When the StatefulSet is recreated during reconciliation,
it does not get the hash produced by the earlier rollout restart, but the original hash (computed without the `restartedAt` annotation).
As a result, the StatefulSet's update revision hash differs from the Pods' `controller-revision-hash`, which triggers SmartUpdate.
>> PVC resize operations should not cause Pod restarts, but restarts are occurring.
Steps to reproduce
- Set `updateStrategy: SmartUpdate` in the PerconaServerMongoDB CR.
- Execute `kubectl rollout restart statefulset/<statefulset-name>` to restart Pods.
  - At this point, the `kubectl.kubernetes.io/restartedAt` annotation is added to the StatefulSet template.
  - The StatefulSet's update revision hash changes.
- Set `enableVolumeExpansion: true` in the CR.
- Trigger a PVC resize by increasing the PVC size in the CR.
- After the PVC resize completes, the StatefulSet is orphan-deleted and recreated.
- The recreated StatefulSet doesn't have the `restartedAt` annotation, so it has the original hash.
- The Pods' `controller-revision-hash` and the StatefulSet's update revision hash differ.
- SmartUpdate is triggered, causing unnecessary Pod restarts.
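The reproduction steps can be modeled in a few lines of Go. This is a hypothetical model, not operator code: `revision` is a simplified stand-in for the real revision hash (FNV-32a over the sorted template annotations), `podsNeedingRestart` approximates the hash mismatch described above as triggering SmartUpdate, and the Pod names are made up.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

const restartedAtKey = "kubectl.kubernetes.io/restartedAt"

// revision is a hypothetical stand-in for a StatefulSet revision hash:
// FNV-32a over the template annotations in sorted key order.
func revision(ann map[string]string) string {
	keys := make([]string, 0, len(ann))
	for k := range ann {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	h := fnv.New32a()
	for _, k := range keys {
		h.Write([]byte(k + "\x00" + ann[k] + "\x00"))
	}
	return fmt.Sprintf("%08x", h.Sum32())
}

// podsNeedingRestart returns, sorted, the Pods whose controller-revision-hash
// differs from the StatefulSet's update revision (the mismatch that, per this
// report, makes SmartUpdate restart them).
func podsNeedingRestart(podRevisions map[string]string, updateRevision string) []string {
	var stale []string
	for pod, rev := range podRevisions {
		if rev != updateRevision {
			stale = append(stale, pod)
		}
	}
	sort.Strings(stale)
	return stale
}

func main() {
	// Rollout restart adds restartedAt; Pods come back on the new revision.
	restarted := revision(map[string]string{restartedAtKey: "2025-01-01T00:00:00Z"})
	pods := map[string]string{"rs0-0": restarted, "rs0-1": restarted, "rs0-2": restarted}
	fmt.Println("after rollout restart:", podsNeedingRestart(pods, restarted))

	// PVC resize: the StatefulSet is rebuilt from the CR, which has no restartedAt.
	recreated := revision(map[string]string{})
	fmt.Println("after PVC resize:     ", podsNeedingRestart(pods, recreated))
}
```

The first call reports no stale Pods; the second reports all three, which corresponds to the unnecessary SmartUpdate.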
Versions
- Kubernetes : 1.31
- Operator : 1.20.1
- Database : Percona Server for MongoDB 7.0.24-13
Anything else?
I think a good solution would be to preserve the `restartedAt` annotation, which is part of the existing revision hash, when the StatefulSet is orphan-deleted.
The following is an example of the code I modified. After applying it, I confirmed that SmartUpdate is no longer triggered in the scenario described above.
If you think another approach would be better, please let me know. I would be happy to contribute this fix to the project.
Modified Code
1. `pkg/apis/psmdb/v1/psmdb_types.go`

```go
const (
	AnnotationResyncPBM                = "percona.com/resync-pbm"
	AnnotationResyncInProgress         = "percona.com/resync-in-progress"
	AnnotationPVCResizeInProgress      = "percona.com/pvc-resize-in-progress"
	AnnotationPreservedRestartedAtBase = "percona.com/preserved-restarted-at"
)

// AnnotationPreservedRestartedAt returns the CR annotation key used to preserve
// the kubectl.kubernetes.io/restartedAt value of the given StatefulSet.
// Note: the name part of an annotation key (after "/") may be at most 63
// characters, so this key is valid only for StatefulSet names of up to 40 characters.
func AnnotationPreservedRestartedAt(stsName string) string {
	return AnnotationPreservedRestartedAtBase + "." + stsName
}
```
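2. `pkg/controller/perconaservermongodb/volumes.go`
Store the `restartedAt` annotation in the CR before deleting the StatefulSet:

```go
if updatedPVCs == len(pvcsToUpdate) {
	log.Info("Deleting statefulset")

	// Preserve the restartedAt annotation added by `kubectl rollout restart`
	// so the recreated StatefulSet keeps the same revision hash.
	if restartedAtValue, exists := sts.Spec.Template.Annotations["kubectl.kubernetes.io/restartedAt"]; exists {
		ann := map[string]string{psmdbv1.AnnotationPreservedRestartedAt(sts.Name): restartedAtValue}
		if err := k8s.AnnotateObject(ctx, r.client, cr, ann); err != nil {
			// Log instead of silently discarding the error.
			log.Error(err, "failed to preserve restartedAt annotation")
		}
	}

	// Orphan deletion removes the StatefulSet but leaves its Pods running.
	if err := r.client.Delete(ctx, sts, client.PropagationPolicy("Orphan")); err != nil {
		if k8serrors.IsNotFound(err) {
			return nil
		}
		return errors.Wrapf(err, "delete statefulset/%s", sts.Name)
	}

	log.Info("PVC resize completed")
	return nil
}
```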
3. `pkg/controller/perconaservermongodb/statefulset.go`
Restore the annotation stored in the CR when recreating the StatefulSet:

```go
sslAnn, err := r.sslAnnotation(ctx, cr)
if err != nil {
	return nil, errors.Wrap(err, "failed to get ssl annotations")
}
for k, v := range sslAnn {
	sfsSpec.Template.Annotations[k] = v
}

// Re-apply the restartedAt value saved before the orphan delete so the
// recreated StatefulSet keeps the revision hash produced by the rollout restart.
if preservedValue, exists := cr.Annotations[api.AnnotationPreservedRestartedAt(sfs.Name)]; exists {
	if sfsSpec.Template.Annotations == nil {
		sfsSpec.Template.Annotations = make(map[string]string)
	}
	sfsSpec.Template.Annotations["kubectl.kubernetes.io/restartedAt"] = preservedValue
}

return sfs, nil
```
How it works
- Before deleting the StatefulSet after the PVC resize completes, if the `restartedAt` annotation exists, store it in the CR.
- Orphan-delete the StatefulSet.
- When the StatefulSet is recreated, check for the annotation stored in the CR and restore it.
- The same revision hash is maintained, so SmartUpdate is not triggered.
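These steps can be checked end to end with plain maps. This is a sketch under simplifying assumptions: `preserve` and `restore` are hypothetical pure-function versions of the proposed changes in `volumes.go` and `statefulset.go`, the CR and Pod template annotations are modeled as `map[string]string`, and `revision` is a simplified stand-in for the real revision hash (FNV-32a over sorted annotations).

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

const restartedAtKey = "kubectl.kubernetes.io/restartedAt"

// preservedKey mirrors the proposed AnnotationPreservedRestartedAt helper.
func preservedKey(stsName string) string {
	return "percona.com/preserved-restarted-at." + stsName
}

// revision is a hypothetical stand-in for the StatefulSet revision hash
// (FNV-32a over the template annotations in sorted key order).
func revision(ann map[string]string) string {
	keys := make([]string, 0, len(ann))
	for k := range ann {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	h := fnv.New32a()
	for _, k := range keys {
		h.Write([]byte(k + "\x00" + ann[k] + "\x00"))
	}
	return fmt.Sprintf("%08x", h.Sum32())
}

// preserve copies restartedAt from the live template into the CR annotations
// before the orphan delete. crAnn must be non-nil.
func preserve(crAnn, tmplAnn map[string]string, stsName string) {
	if v, ok := tmplAnn[restartedAtKey]; ok {
		crAnn[preservedKey(stsName)] = v
	}
}

// restore writes a preserved value into a freshly built template, allocating
// the map if needed, and returns the (possibly new) map.
func restore(crAnn, tmplAnn map[string]string, stsName string) map[string]string {
	if v, ok := crAnn[preservedKey(stsName)]; ok {
		if tmplAnn == nil {
			tmplAnn = make(map[string]string)
		}
		tmplAnn[restartedAtKey] = v
	}
	return tmplAnn
}

func main() {
	const sts = "my-cluster-rs0"
	cr := map[string]string{}
	live := map[string]string{restartedAtKey: "2025-01-01T00:00:00Z"}

	preserve(cr, live, sts)          // before the orphan delete
	rebuilt := restore(cr, nil, sts) // when the StatefulSet is recreated

	fmt.Println("same revision hash:", revision(live) == revision(rebuilt))
}
```

Because the preserved key includes the StatefulSet name, each replica set's StatefulSet restores only its own value.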
Thank you.