K8SPSMDB-1571: Make reconciliation interval configurable #2221

Open
adutra wants to merge 4 commits into percona:main from adutra:configurable-reconcile-interval

Conversation

@adutra adutra commented Jan 31, 2026

CHANGE DESCRIPTION

Problem:

The Percona MongoDB Operator has a hardcoded 5-second reconciliation interval that generates excessive Kubernetes API requests (~12/second, ~42,000/hour per cluster). This should be configurable via environment variable to allow tuning for different deployment environments.
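
As a rough illustration of how the figures above scale with the interval, the sketch below assumes that each reconcile pass issues a fixed number of API requests (about 60, that is, roughly 12 per second over a 5-second interval) and that passes are triggered only by the timer. Both are simplifying assumptions, and `requestsPerHour` is a hypothetical helper, not part of the operator.

```go
package main

import (
	"fmt"
	"time"
)

// defaultInterval mirrors the hardcoded 5-second interval described in the PR.
const defaultInterval = 5 * time.Second

// requestsPerHour estimates hourly API requests for a given reconcile
// interval, assuming a fixed number of requests per reconcile pass.
func requestsPerHour(perReconcile int, interval time.Duration) float64 {
	passesPerHour := float64(time.Hour) / float64(interval)
	return float64(perReconcile) * passesPerHour
}

func main() {
	// Assumption: ~60 requests per pass (about 12 requests/second * 5s).
	const perReconcile = 60
	intervals := []time.Duration{defaultInterval, 30 * time.Second, time.Minute}
	for _, iv := range intervals {
		fmt.Printf("%v -> ~%.0f requests/hour\n", iv, requestsPerHour(perReconcile, iv))
	}
}
```

Under these assumptions the request volume falls in proportion to the interval; real traffic would also depend on watch-triggered reconciles and cluster size.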

Cause:

The operator's reconciliation interval is hardcoded in psmdb_controller.go.

Solution:

Make interval configurable via a new environment variable RECONCILE_INTERVAL.

CHECKLIST

Jira

  • Is the Jira ticket created and referenced properly?
  • Does the Jira ticket have the proper statuses for documentation (Needs Doc) and QA (Needs QA)?
  • Does the Jira ticket link to the proper milestone (Fix Version field)?

Tests

  • Is an E2E test/test case added for the new feature/change?
  • Are unit tests added where appropriate?
  • Are OpenShift compare files changed for E2E tests (compare/*-oc.yml)?

Config/Logging/Testability

  • Are all needed new/changed options added to default YAML files?
  • Are all needed new/changed options added to the Helm Chart?
  • Did we add proper logging messages for operator actions?
  • Did we ensure compatibility with the previous version or cluster upgrade process?
  • Does the change support oldest and newest supported MongoDB version?
  • Does the change support oldest and newest supported Kubernetes version?

@adutra adutra requested a review from hors as a code owner January 31, 2026 16:23
Copilot AI review requested due to automatic review settings January 31, 2026 16:23
@pull-request-size pull-request-size bot added the size/M 30-99 lines label Jan 31, 2026
CLAassistant commented Jan 31, 2026

CLA assistant check
All committers have signed the CLA.

Contributor

Copilot AI left a comment

Pull request overview

This PR makes the Percona MongoDB Operator’s reconciliation interval configurable via a new RECONCILE_INTERVAL environment variable instead of being hardcoded to 5 seconds.

Changes:

  • Introduced getReconcileInterval() to derive the reconcile interval from the RECONCILE_INTERVAL environment variable, defaulting to 5s.
  • Updated newReconciler to use getReconcileInterval() and added unit tests for the new helper.
  • Extended deploy/operator.yaml and deploy/cw-operator.yaml to set the RECONCILE_INTERVAL env var with a default value of 5s.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| pkg/controller/perconaservermongodb/psmdb_controller.go | Replaces the hardcoded reconcileIn duration with getReconcileInterval() and defines the helper that reads RECONCILE_INTERVAL from the environment. |
| pkg/controller/perconaservermongodb/psmdb_controller_test.go | Adds unit tests covering unset, valid, and invalid RECONCILE_INTERVAL env var values. |
| deploy/operator.yaml | Wires the new RECONCILE_INTERVAL environment variable into the standard operator deployment manifest. |
| deploy/cw-operator.yaml | Wires the new RECONCILE_INTERVAL environment variable into the cluster-wide operator deployment manifest. |

Comment on lines 145 to 150
	if interval := os.Getenv("RECONCILE_INTERVAL"); interval != "" {
		if d, err := time.ParseDuration(interval); err == nil {
			return d
		}
	}
	return defaultInterval

Copilot AI Jan 31, 2026

getReconcileInterval accepts any duration that time.ParseDuration parses, including zero or negative values, and silently falls back to the default when parsing fails. Since the value is fed directly into reconcile.Result{RequeueAfter: r.reconcileIn}, a non-positive or malformed RECONCILE_INTERVAL can lead to confusing or unintended reconcile behavior with no log signal for operators. Consider validating that the parsed duration is strictly positive (and optionally above some minimum) before using it, and logging a warning when the env var is invalid or rejected so misconfiguration is easier to detect.

Suggested change
-	if interval := os.Getenv("RECONCILE_INTERVAL"); interval != "" {
-		if d, err := time.ParseDuration(interval); err == nil {
-			return d
-		}
-	}
-	return defaultInterval
+	intervalStr := os.Getenv("RECONCILE_INTERVAL")
+	if intervalStr == "" {
+		return defaultInterval
+	}
+	d, err := time.ParseDuration(intervalStr)
+	if err != nil {
+		logf.Log.WithName("psmdb-controller").Info(
+			"Invalid RECONCILE_INTERVAL value, using default",
+			"value", intervalStr,
+			"error", err,
+			"default", defaultInterval,
+		)
+		return defaultInterval
+	}
+	if d <= 0 {
+		logf.Log.WithName("psmdb-controller").Info(
+			"Non-positive RECONCILE_INTERVAL value, using default",
+			"value", intervalStr,
+			"default", defaultInterval,
+		)
+		return defaultInterval
+	}
+	return d

@pull-request-size pull-request-size bot added size/L 100-499 lines and removed size/M 30-99 lines labels Jan 31, 2026
@egegunes egegunes changed the title from "feat: Make reconciliation interval configurable" to "K8SPSMDB-1571: Make reconciliation interval configurable" Feb 3, 2026
@egegunes egegunes added this to the v1.23.0 milestone Feb 3, 2026
egegunes previously approved these changes Feb 3, 2026
Contributor

@egegunes egegunes left a comment

LGTM, @adutra thank you for your contribution.

We already finished the development for v1.22.0, this will go into v1.23.0

Author

adutra commented Feb 3, 2026

> LGTM, @adutra thank you for your contribution.
>
> We already finished the development for v1.22.0, this will go into v1.23.0

Thanks @egegunes ! Since you approved already, do you need me to fix the lint failures reported by CI?

Contributor

egegunes commented Feb 3, 2026

> Thanks @egegunes ! Since you approved already, do you need me to fix the lint failures reported by CI?

Yes, please fix golangci-lint and manifests

return defaultInterval
}

return d
Contributor

I'm not sure it makes sense to allow a value of 1 second, for example.

Contributor

Good point, I agree. Maybe it's better to not allow values below 5 seconds and to fall back to 5 seconds if the value is smaller.
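
One way the floor suggested in this thread could look is sketched below. It is a minimal illustration, not the code that landed in the PR: `parseReconcileInterval` and `minInterval` are hypothetical names, and the choice to return the floor for too-small values and the default for unparsable ones is an assumption.

```go
package main

import (
	"fmt"
	"os"
	"time"
)

const (
	// defaultInterval matches the 5s default described in the PR.
	defaultInterval = 5 * time.Second
	// minInterval is a hypothetical floor based on the review suggestion.
	minInterval = 5 * time.Second
)

// parseReconcileInterval converts the raw env value into a duration.
// Empty or unparsable input yields the default; values below the floor
// (including zero and negative durations) are raised to the floor.
func parseReconcileInterval(raw string) time.Duration {
	if raw == "" {
		return defaultInterval
	}
	d, err := time.ParseDuration(raw)
	if err != nil {
		return defaultInterval
	}
	if d < minInterval {
		return minInterval
	}
	return d
}

func main() {
	fmt.Println("from env:", parseReconcileInterval(os.Getenv("RECONCILE_INTERVAL")))
	for _, raw := range []string{"", "1s", "30s", "-10s", "abc"} {
		fmt.Printf("%q -> %v\n", raw, parseReconcileInterval(raw))
	}
}
```

Keeping the parsing separate from the `os.Getenv` call makes the helper testable without touching the process environment, which sidesteps the save-and-restore question raised in the test review.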


for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
// Save original env value
Contributor

I believe that this comment is unneeded, wdyt?

}
}()

// Set test env value
Contributor

same for this one


got := getReconcileInterval()
if got != tt.want {
t.Errorf("getReconcileInterval() = %v, want %v", got, tt.want)
Contributor

Why don't we use assertions? The codebase has examples, e.g. assert.Error.

t.Run(tt.name, func(t *testing.T) {
// Save original env value
originalValue, wasSet := os.LookupEnv("RECONCILE_INTERVAL")
defer func() {
Contributor

I think all this logic is not needed; we can simply unset the env var at the end of each test.

We can do something like this:

			defer func() {
				err := os.Unsetenv("RECONCILE_INTERVAL")
				require.NoError(t, err)
			}()
			if tt.setEnv {
				err := os.Setenv("RECONCILE_INTERVAL", tt.envValue)
				require.NoError(t, err)
			}

This also handles the errors from setting and unsetting, wdyt?

Copilot AI review requested due to automatic review settings February 4, 2026 14:49
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.


Comment on lines +67 to +73
			defer func() {
				err := os.Unsetenv("RECONCILE_INTERVAL")
				require.NoError(t, err)
			}()
			if tt.setEnv {
				err := os.Setenv("RECONCILE_INTERVAL", tt.envValue)
				require.NoError(t, err)
Copilot AI Feb 4, 2026

This test permanently unsets the RECONCILE_INTERVAL environment variable and does not restore any previously configured value, which can cause hidden dependencies or flakiness if other tests or tooling rely on that variable. Capture the original value at the start of the test case and restore it in the deferred cleanup instead of always calling Unsetenv.

Author

adutra commented Feb 5, 2026

Hi @egegunes @gkech, I've made changes according to your feedback, but I'm not sure about the new CI failures. Could you help me fix those?

Contributor

@egegunes egegunes left a comment

@adutra those failures are not related to these changes, they will be addressed in another PR.

@JNKPercona
Collaborator

Test Name Result Time
arbiter passed 00:11:19
balancer passed 00:18:51
cross-site-sharded passed 00:18:44
custom-replset-name passed 00:10:18
custom-tls passed 00:14:40
custom-users-roles passed 00:10:23
custom-users-roles-sharded passed 00:11:35
data-at-rest-encryption passed 00:12:28
data-sharded passed 00:23:14
demand-backup passed 00:15:45
demand-backup-eks-credentials-irsa passed 00:00:07
demand-backup-fs passed 00:24:30
demand-backup-if-unhealthy passed 00:11:45
demand-backup-incremental-aws passed 00:12:19
demand-backup-incremental-azure passed 00:11:45
demand-backup-incremental-gcp-native passed 00:11:59
demand-backup-incremental-gcp-s3 passed 00:10:37
demand-backup-incremental-minio passed 00:25:17
demand-backup-incremental-sharded-aws passed 00:18:51
demand-backup-incremental-sharded-azure passed 00:18:03
demand-backup-incremental-sharded-gcp-native passed 00:18:14
demand-backup-incremental-sharded-gcp-s3 passed 00:18:04
demand-backup-incremental-sharded-minio passed 00:27:11
demand-backup-physical-parallel passed 00:08:46
demand-backup-physical-aws passed 00:12:43
demand-backup-physical-azure passed 00:12:54
demand-backup-physical-gcp-s3 passed 00:11:54
demand-backup-physical-gcp-native passed 00:11:56
demand-backup-physical-minio passed 00:22:29
demand-backup-physical-minio-native passed 00:25:50
demand-backup-physical-minio-native-tls passed 00:19:55
demand-backup-physical-sharded-parallel passed 00:12:00
demand-backup-physical-sharded-aws passed 00:20:43
demand-backup-physical-sharded-azure passed 00:19:52
demand-backup-physical-sharded-gcp-native passed 00:18:29
demand-backup-physical-sharded-minio passed 00:17:42
demand-backup-physical-sharded-minio-native passed 00:18:15
demand-backup-sharded passed 00:25:43
disabled-auth passed 00:16:36
expose-sharded passed 00:34:01
finalizer passed 00:10:20
ignore-labels-annotations passed 00:07:47
init-deploy passed 00:13:19
ldap passed 00:08:58
ldap-tls passed 00:13:47
limits passed 00:06:27
liveness passed 00:09:02
mongod-major-upgrade passed 00:13:15
mongod-major-upgrade-sharded passed 00:21:14
monitoring-2-0 passed 00:25:21
monitoring-pmm3 passed 00:29:08
multi-cluster-service passed 00:13:51
multi-storage passed 00:18:36
non-voting-and-hidden passed 00:17:29
one-pod passed 00:08:23
operator-self-healing-chaos passed 00:12:58
pitr passed 00:31:59
pitr-physical passed 01:01:17
pitr-sharded passed 00:23:13
pitr-to-new-cluster passed 00:25:26
pitr-physical-backup-source passed 00:55:27
preinit-updates passed 00:05:14
pvc-auto-resize passed 00:14:22
pvc-resize passed 00:17:24
recover-no-primary passed 00:27:39
replset-overrides passed 00:17:55
replset-remapping passed 00:16:45
replset-remapping-sharded passed 00:16:57
rs-shard-migration passed 00:14:45
scaling passed 00:11:27
scheduled-backup passed 00:17:59
security-context passed 00:07:11
self-healing-chaos passed 00:15:12
service-per-pod passed 00:19:25
serviceless-external-nodes passed 00:07:28
smart-update passed 00:08:16
split-horizon passed 00:13:34
stable-resource-version passed 00:04:47
storage passed 00:07:51
tls-issue-cert-manager passed 00:29:37
unsafe-psa passed 00:07:55
upgrade passed 00:10:04
upgrade-consistency passed 00:07:56
upgrade-consistency-sharded-tls passed 00:55:36
upgrade-sharded passed 00:19:41
upgrade-partial-backup passed 00:16:05
users passed 00:17:49
users-vault passed 00:13:48
version-service passed 00:25:44
Summary Value
Tests Run 89/89
Job Duration 02:57:04
Total Test Time 25:42:10

commit: 4621f83
image: perconalab/percona-server-mongodb-operator:PR-2221-4621f83a

Contributor

@egegunes egegunes left a comment

Requesting changes to block this getting merged before we create the v1.22.0 release branch. I'll re-approve afterwards.

Labels

community size/L 100-499 lines

6 participants