
GCU generates suboptimal plan for CreateOnly paths #4335

Merged
lguohan merged 4 commits into sonic-net:master from bhouse-nexthop:createonly-path-fix on Mar 7, 2026

@bhouse-nexthop
Contributor

@bhouse-nexthop bhouse-nexthop commented Mar 6, 2026

What I did

When GCU hits a CreateOnly entry that has changed, it generates a suboptimal plan. One example is a simple change of:

[{"op": "replace", "path": "/MIRROR_SESSION/EVERFLOW_TUNNEL/dst_ip", "value": "200.1.1.203"}]

Should generate an optimal plan of:

[
[{"op": "remove", "path": "/ACL_RULE/EVERFLOW|RULE_1/MIRROR_INGRESS_ACTION"}],
[{"op": "remove", "path": "/MIRROR_SESSION"}],
[{"op": "add", "path": "/MIRROR_SESSION", "value": {"EVERFLOW_TUNNEL": {"dscp": "8", "dst_ip": "200.1.1.203", "src_ip": "100.1.1.1", "ttl": "255", "type": "ERSPAN"}}}],
[{"op": "add", "path": "/ACL_RULE/EVERFLOW|RULE_1/MIRROR_INGRESS_ACTION", "value": "EVERFLOW_TUNNEL"}]
]

But instead generates this plan (which removes all ACLs):

[
[{"op": "remove", "path": "/ACL_RULE/EVERFLOW|RULE_1/MIRROR_INGRESS_ACTION"}],
[{"op": "remove", "path": "/ACL_RULE/EVERFLOW|RULE_1"}],
[{"op": "add", "path": "/ACL_RULE/EVERFLOW|RULE_1", "value": {"PRIORITY": "1000"}}],
[{"op": "add", "path": "/ACL_RULE/EVERFLOW|RULE_1/MIRROR_INGRESS_ACTION", "value": "EVERFLOW_TUNNEL"}],
[{"op": "remove", "path": "/ACL_RULE"}],
[{"op": "add", "path": "/ACL_RULE", "value": {"EVERFLOW|RULE_1": {"PRIORITY": "1000", "IP_TYPE": "IP", "MIRROR_INGRESS_ACTION": "EVERFLOW_TUNNEL"}}}],
[{"op": "add", "path": "/ACL_RULE/DATAACL|RULE_1", "value": {"PRIORITY": "10"}}],
[{"op": "add", "path": "/ACL_RULE/DATAACL|RULE_1/DST_IP", "value": "192.168.1.1/32"}],
[{"op": "add", "path": "/ACL_RULE/DATAACL|RULE_1/IP_TYPE", "value": "IP"}],
[{"op": "add", "path": "/ACL_RULE/DATAACL|RULE_1/L4_DST_PORT", "value": "22"}],
[{"op": "remove", "path": "/ACL_RULE/EVERFLOW|RULE_1/MIRROR_INGRESS_ACTION"}],
[{"op": "remove", "path": "/ACL_RULE/EVERFLOW|RULE_1"}],
[{"op": "add", "path": "/ACL_RULE/EVERFLOW|RULE_1", "value": {"PRIORITY": "1000"}}],
[{"op": "add", "path": "/ACL_RULE/EVERFLOW|RULE_1/MIRROR_INGRESS_ACTION", "value": "EVERFLOW_TUNNEL"}],
[{"op": "remove", "path": "/ACL_RULE/DATAACL|RULE_1"}],
[{"op": "add", "path": "/ACL_RULE/DATAACL|RULE_1", "value": {"PRIORITY": "10"}}],
[{"op": "add", "path": "/ACL_RULE/DATAACL|RULE_1/DST_IP", "value": "192.168.1.1/32"}],
[{"op": "add", "path": "/ACL_RULE/DATAACL|RULE_1/IP_TYPE", "value": "IP"}],
[{"op": "add", "path": "/ACL_RULE/DATAACL|RULE_1/PACKET_ACTION", "value": "DROP"}],
[{"op": "add", "path": "/ACL_RULE/EVERFLOW|RULE_1/IP_TYPE", "value": "IP"}],
[{"op": "remove", "path": "/ACL_RULE/EVERFLOW|RULE_1/MIRROR_INGRESS_ACTION"}],
[{"op": "remove", "path": "/ACL_RULE/EVERFLOW|RULE_1"}],
[{"op": "add", "path": "/ACL_RULE/EVERFLOW|RULE_1", "value": {"PRIORITY": "1000"}}],
[{"op": "remove", "path": "/ACL_RULE/DATAACL|RULE_1"}],
[{"op": "add", "path": "/ACL_RULE/EVERFLOW|RULE_1/IP_TYPE", "value": "IP"}],
[{"op": "add", "path": "/ACL_RULE/DATAACL|RULE_1", "value": {"PRIORITY": "10"}}],
[{"op": "add", "path": "/ACL_RULE/DATAACL|RULE_1/DST_IP", "value": "192.168.1.1/32"}],
[{"op": "add", "path": "/ACL_RULE/DATAACL|RULE_1/IP_TYPE", "value": "IP"}],
[{"op": "remove", "path": "/ACL_RULE/EVERFLOW|RULE_1"}],
[{"op": "add", "path": "/ACL_RULE/EVERFLOW|RULE_1", "value": {"PRIORITY": "1000"}}],
[{"op": "remove", "path": "/MIRROR_SESSION"}],
[{"op": "add", "path": "/MIRROR_SESSION", "value": {"EVERFLOW_TUNNEL": {"dscp": "8", "dst_ip": "200.1.1.203", "src_ip": "100.1.1.1", "ttl": "255", "type": "ERSPAN"}}}],
[{"op": "add", "path": "/ACL_RULE/DATAACL|RULE_1/L4_DST_PORT", "value": "22"}, {"op": "add", "path": "/ACL_RULE/DATAACL|RULE_1/PACKET_ACTION", "value": "DROP"}],
[{"op": "add", "path": "/ACL_RULE/EVERFLOW|RULE_1/IP_TYPE", "value": "IP"}, {"op": "add", "path": "/ACL_RULE/EVERFLOW|RULE_1/MIRROR_INGRESS_ACTION", "value": "EVERFLOW_TUNNEL"}]
]
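
For illustration, the four-step optimal plan can be replayed against a small in-memory config to check that it changes only dst_ip and leaves ACL_RULE as it was. This is a minimal sketch and not GCU code: `apply_op` and `apply_plan` are hypothetical helpers, the starting config is inferred from the plans in this description, the old dst_ip value is a placeholder, and JSON Pointer escaping (`~0`, `~1`) is ignored.

```python
import copy


def apply_op(config, op):
    """Apply one JSON-patch-style 'add' or 'remove' op to a nested dict in place."""
    tokens = [t for t in op["path"].split("/") if t]
    parent = config
    for token in tokens[:-1]:
        parent = parent[token]
    if op["op"] == "remove":
        del parent[tokens[-1]]
    elif op["op"] == "add":
        parent[tokens[-1]] = copy.deepcopy(op["value"])
    else:
        raise ValueError(f"unsupported op: {op['op']}")


def apply_plan(config, plan):
    """Apply every step of a plan to a copy of config and return the copy."""
    result = copy.deepcopy(config)
    for step in plan:
        for op in step:
            apply_op(result, op)
    return result


# Starting config inferred from the plans above; the old dst_ip is a placeholder.
config = {
    "MIRROR_SESSION": {
        "EVERFLOW_TUNNEL": {"dscp": "8", "dst_ip": "OLD_DST_IP_PLACEHOLDER",
                            "src_ip": "100.1.1.1", "ttl": "255", "type": "ERSPAN"}
    },
    "ACL_RULE": {
        "EVERFLOW|RULE_1": {"PRIORITY": "1000", "IP_TYPE": "IP",
                            "MIRROR_INGRESS_ACTION": "EVERFLOW_TUNNEL"},
        "DATAACL|RULE_1": {"PRIORITY": "10", "DST_IP": "192.168.1.1/32", "IP_TYPE": "IP",
                           "L4_DST_PORT": "22", "PACKET_ACTION": "DROP"},
    },
}

plan = [
    [{"op": "remove", "path": "/ACL_RULE/EVERFLOW|RULE_1/MIRROR_INGRESS_ACTION"}],
    [{"op": "remove", "path": "/MIRROR_SESSION"}],
    [{"op": "add", "path": "/MIRROR_SESSION", "value": {"EVERFLOW_TUNNEL": {
        "dscp": "8", "dst_ip": "200.1.1.203", "src_ip": "100.1.1.1",
        "ttl": "255", "type": "ERSPAN"}}}],
    [{"op": "add", "path": "/ACL_RULE/EVERFLOW|RULE_1/MIRROR_INGRESS_ACTION",
      "value": "EVERFLOW_TUNNEL"}],
]

result = apply_plan(config, plan)
```

After the replay, `result["ACL_RULE"]` compares equal to the original table, which is the property the longer plan gives up by removing and re-adding every ACL rule.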

How I did it

Modified RemoveCreateOnlyDependencyMoveGenerator:

  • It previously short-circuited because it processed only one child leaf per table.
  • It previously iterated across all members of the table even though a complete path list was available.
  • It was missing logic to remove the create-only path itself and relied on extenders to do that, which was inefficient and did not generate the right plan.
  • When removing dependents, it did not recurse to remove dependents of dependents.

Since this generator is now complete and doesn't rely on any extenders, it has been moved to the non-extendable generators.
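
The recursion over dependents of dependents can be sketched roughly as below. This is an illustrative assumption rather than the code in patch_sorter.py: `removal_order` and the `dependents_of` map are hypothetical names, and the map stands in for whatever reference lookup the generator really uses. The visited set is only a guard so the sketch terminates even on a malformed map.

```python
def removal_order(path, dependents_of, _visited=None):
    """Return paths to remove: deepest dependents first, then `path` itself."""
    visited = set() if _visited is None else _visited
    if path in visited:
        # Already scheduled; also guards against a cyclic map.
        return []
    visited.add(path)
    order = []
    for dependent in dependents_of.get(path, []):
        # Recurse so dependents of dependents are removed before their referent.
        order.extend(removal_order(dependent, dependents_of, visited))
    order.append(path)
    return order


# Hypothetical dependency map for the MIRROR_SESSION example in this PR.
dependents_of = {
    "/MIRROR_SESSION/EVERFLOW_TUNNEL": [
        "/ACL_RULE/EVERFLOW|RULE_1/MIRROR_INGRESS_ACTION",
    ],
}

order = removal_order("/MIRROR_SESSION/EVERFLOW_TUNNEL", dependents_of)
```

Here `order` lists the ACL leaf first and the create-only object last, matching the first two removals of the optimal plan before the parent-collapse step turns the second one into a removal of `/MIRROR_SESSION`.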

How to verify it

These changes alter some existing (suboptimal) generated plans, so those test cases have also been updated.

Added a test case to validate this behavior and ensure it does not regress.

Previous command output (if the output of a command-line utility has changed)

New command output (if the output of a command-line utility has changed)

Potentially also fixes these issues:

@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Contributor

Copilot AI left a comment


Pull request overview

This PR improves Generic Config Updater (GCU) patch planning when a create-only field changes (e.g., MIRROR_SESSION leaf updates), aiming to avoid generating unnecessarily disruptive plans (like removing unrelated ACL configuration).

Changes:

  • Reworked RemoveCreateOnlyDependencyMoveGenerator to remove dependent paths (including transitive dependencies) and remove the create-only object itself, and moved it to the non-extendable generator list.
  • Updated existing patch sorter tests/fixtures to reflect the new move generation behavior.
  • Added a new fixture test case covering the mirror-session create-only update scenario to prevent regressions.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| generic_config_updater/patch_sorter.py | Updates create-only dependency removal generation and reorders generators (non-extendable vs extendable). |
| tests/generic_config_updater/patch_sorter_test.py | Adjusts unit test expectations for the updated move generator ordering/output. |
| tests/generic_config_updater/files/patch_sorter_test_success.json | Updates expected plan outputs and adds a new success case for MIRROR_SESSION create-only leaf replacement. |

@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command.

bhouse-nexthop and others added 2 commits March 6, 2026 16:40
Prevent a possible infinite loop that Copilot identified. It shouldn't
actually be possible, since self.__get_path_count() can't return 1 in
that scenario to let the loop continue, but hardening isn't bad
practice.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Brad House <bhouse@nexthop.ai>
Fix spelling / grammatical error caught by Copilot.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Brad House <bhouse@nexthop.ai>

Contributor

@securely1g securely1g left a comment


Code Review

What it does: Fixes RemoveCreateOnlyDependencyMoveGenerator in GCU's patch sorter. When a CreateOnly field changes (e.g., MIRROR_SESSION dst_ip), the old code generated a bloated plan that unnecessarily removed/re-added unrelated ACL rules. The fix produces a minimal plan: remove dependent → remove CreateOnly object → re-add with new value → re-add dependent.

The good:

  • Root cause is solid — the old code had multiple bugs: short-circuiting after one child leaf per table (processed_tables set), not removing the CreateOnly path itself, and not recursing through dependency chains
  • The new __remove_dependents properly recurses through dependency chains (dependent → dependent-of-dependent)
  • __remove_nonempty collapses up to parent when a table would be left empty — avoids orphan empty tables in ConfigDB
  • Moving RemoveCreateOnlyDependencyMoveGenerator from move_generators (extendable) to move_non_extendable_generators makes sense — the old placement let extenders add redundant moves
  • New MIRROR_SESSION test case directly validates the motivating example

Concerns:

  1. Unbounded recursion in __remove_dependents — if there's ever a circular dependency in the YANG model refs, this will stack overflow. Low probability but worth a visited set or max depth guard.

  2. Duplicate moves — the generator yields __remove_nonempty twice and __remove_dependents twice (with remove_parent=False then True). The test comments acknowledge this ("we see the same output twice"). While the DFS solver presumably handles idempotent removes, it's not clean. Could filter dupes before yielding.

  3. Hard-coded tokens[0], tokens[1], tokens[2] — assumes CreateOnly paths are always exactly 3 levels deep (TABLE/MEMBER/FIELD). If any YANG model has a CreateOnly leaf at a different depth, this will IndexError. The old code used tokens[0] and tokens[-1] which was more flexible (though buggy). Worth an assertion or len check.

  4. __get_path_count can raise KeyError — if any token in the chain doesn't exist in config (race condition during plan generation or partial config), it'll throw unhandled.

  5. Test expectations look fragile — the DPB 4-to-1 test now expects PORT/Ethernet0 removed twice in the move list. Works, but makes the test harder to reason about.

Verdict: The core fix is correct and the MIRROR_SESSION example proves it generates optimal plans. The recursion and duplicate-move concerns are worth addressing but not blockers. Would suggest adding a recursion depth guard in __remove_dependents.
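
The collapse-to-parent behavior attributed to __remove_nonempty, together with a path count that tolerates missing paths, might look roughly like the following. This is a hedged sketch under assumed names (`path_count`, `collapse_to_nonempty`) and an assumed nested-dict config, not the implementation in the PR.

```python
def path_count(config, tokens):
    """Number of children under the node at `tokens`; 0 if absent or a leaf."""
    node = config
    for token in tokens:
        if not isinstance(node, dict) or token not in node:
            return 0  # tolerate missing paths instead of raising KeyError
        node = node[token]
    return len(node) if isinstance(node, dict) else 0


def collapse_to_nonempty(config, tokens):
    """Walk upward while removing `tokens` would leave its parent empty."""
    tokens = list(tokens)
    # Each iteration drops one token, so the loop always terminates.
    while len(tokens) > 1 and path_count(config, tokens[:-1]) == 1:
        tokens.pop()
    return tokens


# Hypothetical config shaped like the example in this PR (values trimmed).
config = {
    "MIRROR_SESSION": {"EVERFLOW_TUNNEL": {"dst_ip": "200.1.1.203"}},
    "ACL_RULE": {
        "EVERFLOW|RULE_1": {"PRIORITY": "1000"},
        "DATAACL|RULE_1": {"PRIORITY": "10"},
    },
}

session_target = collapse_to_nonempty(config, ["MIRROR_SESSION", "EVERFLOW_TUNNEL"])
acl_target = collapse_to_nonempty(config, ["ACL_RULE", "EVERFLOW|RULE_1"])
```

With a single mirror session in the table, the removal target collapses to the `MIRROR_SESSION` table itself, which is consistent with the `/MIRROR_SESSION` removal in the optimal plan. The ACL rule path is left alone because its table has a second member.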

@bradh352
Contributor

bradh352 commented Mar 7, 2026

  1. This is not possible: a circular dependency in YANG could never exist, as nothing would be solvable.
  2. With the DFS sorter, it's not necessary: since it's depth-first, the generator will just be called again from the beginning. That said, it's not likely the other unused sorters would actually work these days.
  3. That's currently a guarantee of the CreateOnly path filter. I could make this more robust in case the implementation changes in the future, if we need to.
  4. That is not possible in the limited scope where this function is used. The path is known to be good. If it were used outside of this scope in the future, it could be an issue.
  5. The same comment that was added to dpb_1_to_4 applies to the 4-to-1 scenario. This is intended, as explained: PORT/Ethernet0 is removed twice because of item 2 above.

@bradh352
Contributor

bradh352 commented Mar 7, 2026

@securely1g @lguohan all review comments have been replied to, but I also created a new commit that actually addresses them even though in the calling paths none of the issues were possible.

lguohan
lguohan previously approved these changes Mar 7, 2026
@lguohan
Contributor

lguohan commented Mar 7, 2026

thank you, i approved. can you check why build failing?

@bhouse-nexthop
Contributor Author

It's test case output, due to an ordering change in one of the updates. I'll get it fixed here in a sec.

1. Add recursion depth protection when removing dependents, because of
   a concern about circular dependencies, which aren't actually possible
   with YANG. Nonetheless, implemented.
2. Remove a duplicate move that was generated. It isn't necessary with a
   depth-first sorter, though it could be with other sorters.
3. The old code depended on 3 levels of depth for the create-only
   leaves. Reworked the logic so it does not depend on depth.
4. __get_path_count() can no longer raise a KeyError. The caller paths
   make that impossible today, but future use cases may need it not to
   throw an exception when the path is not found.

Signed-off-by: Brad House <bhouse@nexthop.ai>

@lguohan lguohan merged commit 1580ccc into sonic-net:master Mar 7, 2026
9 checks passed
6 participants