fix: skip NodeEvaluation upsert when evaluation did not run #218
sahitya-chandra wants to merge 2 commits into kubernetes-sigs:main
processNodeAgainstAllRules unconditionally wrote a NodeEvaluation entry for the node it just processed, even when evaluateRuleForNode returned an error before updateNodeEvaluationStatus could populate one. The zero-value NodeEvaluation either clobbered a valid prior entry or appended one with an empty NodeName. The CRD requires NodeName MinLength=1, so the API server rejected the whole status patch with 422, and the FailedNodes update bundled into the same patch was lost along with it. Skip the upsert when the in-memory rule has no evaluation for this node, and let the FailedNodes update go through on its own. Add a regression test that fails Patch on the node, asserts that FailedNodes is recorded, and asserts that no empty NodeEvaluation slips into status.
Description
`processNodeAgainstAllRules` unconditionally upserted a `NodeEvaluation` entry for the current node after `evaluateRuleForNode`, even on the failure path. When taint patching returned a non-conflict error (e.g. RBAC denial, persistent API failure), `evaluateRuleForNode` exited before `updateNodeEvaluationStatus` ran, so the in-memory rule had no evaluation for that node and the upsert wrote a zero-value `NodeEvaluation{NodeName: ""}`. `NodeEvaluation.NodeName` is annotated with `MinLength=1` plus a hostname `Pattern`, so the API server rejected the whole `Status().Patch` with 422, and the `FailedNodes` update bundled into the same patch was lost with it.

This change captures the error from `evaluateRuleForNode` and runs the `NodeEvaluation` upsert only when the evaluation succeeded. On the failure path the persisted entry is left untouched and only `FailedNodes` is updated. This also avoids overwriting a fresh persisted entry with a stale one from the rule cache when a separate reconcile has updated status since the cache was last refreshed.

Related Issue
Fixes #217
Type of Change
/kind bug
Testing
- `make test` passes locally (envtest, Kubernetes 1.34); controller package coverage rose from 72.2% to 74.5%.
- `make lint` passes locally.
- … `Patch` on the node, runs `NodeReconciler.Reconcile`, and asserts that the `FailedNodes` entry lands, that no empty `NodeEvaluation` slipped in, and that an unrelated pre-existing `NodeEvaluation` was preserved. Without the fix, the test fails on the empty-`NodeName` assertion.
- … `TaintStatus=Absent`, the cached rule snapshot has a stale `TaintStatus=Present`, evaluation fails, and the persisted entry must stay `Absent`.

Checklist
- `make test` passes
- `make lint` passes

Does this PR introduce a user-facing change?