GCU generates suboptimal plan for CreateOnly paths#4335
GCU generates suboptimal plan for CreateOnly paths#4335lguohan merged 4 commits intosonic-net:masterfrom
Conversation
When GCU hits a CreateOnly entry that has changed, it generates a suboptimal
plan. One example is a simple change of:
```
[{"op": "replace", "path": "/MIRROR_SESSION/EVERFLOW_TUNNEL/dst_ip", "value": "200.1.1.203"}]
```
Should generate an optimal plan of:
```
[
[{"op": "remove", "path": "/ACL_RULE/EVERFLOW|RULE_1/MIRROR_INGRESS_ACTION"}],
[{"op": "remove", "path": "/MIRROR_SESSION"}],
[{"op": "add", "path": "/MIRROR_SESSION", "value": {"EVERFLOW_TUNNEL": {"dscp": "8", "dst_ip": "200.1.1.203", "src_ip": "100.1.1.1", "ttl": "255", "type": "ERSPAN"}}}]
[{"op": "add", "path": "/ACL_RULE/EVERFLOW|RULE_1/MIRROR_INGRESS_ACTION", "value": "EVERFLOW_TUNNEL"}]
]
```
But instead generates this plan (which removes all ACLs):
```
[
[{"op": "remove", "path": "/ACL_RULE/EVERFLOW|RULE_1/MIRROR_INGRESS_ACTION"}],
[{"op": "remove", "path": "/ACL_RULE/EVERFLOW|RULE_1"}],
[{"op": "add", "path": "/ACL_RULE/EVERFLOW|RULE_1", "value": {"PRIORITY": "1000"}}],
[{"op": "add", "path": "/ACL_RULE/EVERFLOW|RULE_1/MIRROR_INGRESS_ACTION", "value": "EVERFLOW_TUNNEL"}],
[{"op": "remove", "path": "/ACL_RULE"}],
[{"op": "add", "path": "/ACL_RULE", "value": {"EVERFLOW|RULE_1": {"PRIORITY": "1000", "IP_TYPE": "IP", "MIRROR_INGRESS_ACTION": "EVERFLOW_TUNNEL"}}}],
[{"op": "add", "path": "/ACL_RULE/DATAACL|RULE_1", "value": {"PRIORITY": "10"}}],
[{"op": "add", "path": "/ACL_RULE/DATAACL|RULE_1/DST_IP", "value": "192.168.1.1/32"}],
[{"op": "add", "path": "/ACL_RULE/DATAACL|RULE_1/IP_TYPE", "value": "IP"}],
[{"op": "add", "path": "/ACL_RULE/DATAACL|RULE_1/L4_DST_PORT", "value": "22"}],
[{"op": "remove", "path": "/ACL_RULE/EVERFLOW|RULE_1/MIRROR_INGRESS_ACTION"}],
[{"op": "remove", "path": "/ACL_RULE/EVERFLOW|RULE_1"}],
[{"op": "add", "path": "/ACL_RULE/EVERFLOW|RULE_1", "value": {"PRIORITY": "1000"}}],
[{"op": "add", "path": "/ACL_RULE/EVERFLOW|RULE_1/MIRROR_INGRESS_ACTION", "value": "EVERFLOW_TUNNEL"}],
[{"op": "remove", "path": "/ACL_RULE/DATAACL|RULE_1"}],
[{"op": "add", "path": "/ACL_RULE/DATAACL|RULE_1", "value": {"PRIORITY": "10"}}],
[{"op": "add", "path": "/ACL_RULE/DATAACL|RULE_1/DST_IP", "value": "192.168.1.1/32"}],
[{"op": "add", "path": "/ACL_RULE/DATAACL|RULE_1/IP_TYPE", "value": "IP"}],
[{"op": "add", "path": "/ACL_RULE/DATAACL|RULE_1/PACKET_ACTION", "value": "DROP"}],
[{"op": "add", "path": "/ACL_RULE/EVERFLOW|RULE_1/IP_TYPE", "value": "IP"}],
[{"op": "remove", "path": "/ACL_RULE/EVERFLOW|RULE_1/MIRROR_INGRESS_ACTION"}],
[{"op": "remove", "path": "/ACL_RULE/EVERFLOW|RULE_1"}],
[{"op": "add", "path": "/ACL_RULE/EVERFLOW|RULE_1", "value": {"PRIORITY": "1000"}}],
[{"op": "remove", "path": "/ACL_RULE/DATAACL|RULE_1"}],
[{"op": "add", "path": "/ACL_RULE/EVERFLOW|RULE_1/IP_TYPE", "value": "IP"}],
[{"op": "add", "path": "/ACL_RULE/DATAACL|RULE_1", "value": {"PRIORITY": "10"}}],
[{"op": "add", "path": "/ACL_RULE/DATAACL|RULE_1/DST_IP", "value": "192.168.1.1/32"}],
[{"op": "add", "path": "/ACL_RULE/DATAACL|RULE_1/IP_TYPE", "value": "IP"}],
[{"op": "remove", "path": "/ACL_RULE/EVERFLOW|RULE_1"}],
[{"op": "add", "path": "/ACL_RULE/EVERFLOW|RULE_1", "value": {"PRIORITY": "1000"}}],
[{"op": "remove", "path": "/MIRROR_SESSION"}],
[{"op": "add", "path": "/MIRROR_SESSION", "value": {"EVERFLOW_TUNNEL": {"dscp": "8", "dst_ip": "200.1.1.203", "src_ip": "100.1.1.1", "ttl": "255", "type": "ERSPAN"}}}],
[{"op": "add", "path": "/ACL_RULE/DATAACL|RULE_1/L4_DST_PORT", "value": "22"}, {"op": "add", "path": "/ACL_RULE/DATAACL|RULE_1/PACKET_ACTION", "value": "DROP"}],
[{"op": "add", "path": "/ACL_RULE/EVERFLOW|RULE_1/IP_TYPE", "value": "IP"}, {"op": "add", "path": "/ACL_RULE/EVERFLOW|RULE_1/MIRROR_INGRESS_ACTION", "value": "EVERFLOW_TUNNEL"}]
]
```
Modified`RemoveCreateOnlyDependencyMoveGenerator`:
* it would previously short-circuit early due to only processing one child
leaf in the same table.
* it would previously attempt to iterate across all members of the table even
though there was a complete path list.
* it was missing logic to remove the create only path itself (and was relying
on extenders to do that which was inefficient and wouldn't generate the
right plan)
* when removing dependents it wasn't recursing to ensure it would remove
dependents of dependents
Since this generator is now full and doesn't rely on any extenders, it has
been moved to a non-extendable generator.
These changes caused some existing (suboptimal) plans that got generated to
change so those test cases have also been updated.
Added test case to validate this behavior and ensure it does not regress.
Signed-off-by: Brad House <bhouse@nexthop.ai>
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
There was a problem hiding this comment.
Pull request overview
This PR improves Generic Config Updater (GCU) patch planning when a create-only field changes (e.g., MIRROR_SESSION leaf updates), aiming to avoid generating unnecessarily disruptive plans (like removing unrelated ACL configuration).
Changes:
- Reworked
RemoveCreateOnlyDependencyMoveGeneratorto remove dependent paths (including transitive dependencies) and remove the create-only object itself, and moved it to the non-extendable generator list. - Updated existing patch sorter tests/fixtures to reflect the new move generation behavior.
- Added a new fixture test case covering the mirror-session create-only update scenario to prevent regressions.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
generic_config_updater/patch_sorter.py |
Updates create-only dependency removal generation and reorders generators (non-extendable vs extendable). |
tests/generic_config_updater/patch_sorter_test.py |
Adjusts unit test expectations for the updated move generator ordering/output. |
tests/generic_config_updater/files/patch_sorter_test_success.json |
Updates expected plan outputs and adds a new success case for MIRROR_SESSION create-only leaf replacement. |
|
/azp run |
|
Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command. |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
Prevent possible infinite loop scenario Copilot identified, however it shouldn't be possible given self.__get_path_count() can't return 1 in that scenario to allow the loop to continue. But hardening isn't a bad practice. Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: Brad House <bhouse@nexthop.ai>
Fix spelling / gramatical error caught by Copilot. Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: Brad House <bhouse@nexthop.ai>
eb358ac to
9f02958
Compare
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
securely1g
left a comment
There was a problem hiding this comment.
Code Review
What it does: Fixes RemoveCreateOnlyDependencyMoveGenerator in GCU's patch sorter. When a CreateOnly field changes (e.g., MIRROR_SESSION dst_ip), the old code generated a bloated plan that unnecessarily removed/re-added unrelated ACL rules. The fix produces a minimal plan: remove dependent → remove CreateOnly object → re-add with new value → re-add dependent.
The good:
- Root cause is solid — the old code had multiple bugs: short-circuiting after one child leaf per table (
processed_tablesset), not removing the CreateOnly path itself, and not recursing through dependency chains - The new
__remove_dependentsproperly recurses through dependency chains (dependent → dependent-of-dependent) __remove_nonemptycollapses up to parent when a table would be left empty — avoids orphan empty tables in ConfigDB- Moving
RemoveCreateOnlyDependencyMoveGeneratorfrommove_generators(extendable) tomove_non_extendable_generatorsmakes sense — the old placement let extenders add redundant moves - New MIRROR_SESSION test case directly validates the motivating example
Concerns:
-
Unbounded recursion in
__remove_dependents— if there's ever a circular dependency in the YANG model refs, this will stack overflow. Low probability but worth avisitedset or max depth guard. -
Duplicate moves — the generator yields
__remove_nonemptytwice and__remove_dependentstwice (withremove_parent=FalsethenTrue). The test comments acknowledge this ("we see the same output twice"). While the DFS solver presumably handles idempotent removes, it's not clean. Could filter dupes before yielding. -
Hard-coded
tokens[0], tokens[1], tokens[2]— assumes CreateOnly paths are always exactly 3 levels deep (TABLE/MEMBER/FIELD). If any YANG model has a CreateOnly leaf at a different depth, this willIndexError. The old code usedtokens[0]andtokens[-1]which was more flexible (though buggy). Worth an assertion or len check. -
__get_path_countcan raiseKeyError— if any token in the chain doesn't exist in config (race condition during plan generation or partial config), it'll throw unhandled. -
Test expectations look fragile — the DPB 4-to-1 test now expects
PORT/Ethernet0removed twice in the move list. Works, but makes the test harder to reason about.
Verdict: The core fix is correct and the MIRROR_SESSION example proves it generates optimal plans. The recursion and duplicate-move concerns are worth addressing but not blockers. Would suggest adding a recursion depth guard in __remove_dependents.
|
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
@securely1g @lguohan all review comments have been replied to, but I also created a new commit that actually addresses them even though in the calling paths none of the issues were possible. |
953013b to
074c649
Compare
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
074c649 to
77e3722
Compare
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
thank you, i approved. can you check why build failing? |
77e3722 to
cfedcbb
Compare
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
test case output due to ordering change in one of the updates. I'll get it fixed here in a sec. |
cfedcbb to
d40647a
Compare
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
d40647a to
0c60a8f
Compare
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
1. Add recursion depth protection in removing dependents because of a concern of recursive dependencies, which isn't actually possible with yang. None-the-less, implemented. 2. Remove a duplicate move that gets generated as when using it with a Depth-first sorter its not necessary though could be on other sorters. 3. The old code depended on 3 levels of depth for the create only leafs. Reworked the logic to not be dependent on depth. 4. __get_path_count() can no longer return a KeyError even though the caller paths would make that impossible, but future use cases may need it to not throw an exception when the path is not found. Signed-off-by: Brad House <bhouse@nexthop.ai>
0c60a8f to
08fda10
Compare
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
What I did
When GCU hits a CreateOnly entry that has changed, it generates a suboptimal plan. One example is a simple change of:
Should generate an optimal plan of:
But instead generates this plan (which removes all ACLs):
How I did it
Modified
RemoveCreateOnlyDependencyMoveGenerator:Since this generator is now full and doesn't rely on any extenders, it has been moved to a non-extendable generator.
How to verify it
These changes caused some existing (suboptimal) plans that got generated to change so those test cases have also been updated.
Added test case to validate this behavior and ensure it does not regress.
Previous command output (if the output of a command-line utility has changed)
New command output (if the output of a command-line utility has changed)
Potentially also fixes these issues: