Skip to content

sonic-yang-mgmt: Generic Config Updater - performance dependencies#22254

Merged
qiluo-msft merged 6 commits intosonic-net:masterfrom
bhouse-nexthop:bhouse-nexthop/gcu-perf
May 13, 2025
Merged

sonic-yang-mgmt: Generic Config Updater - performance dependencies#22254
qiluo-msft merged 6 commits intosonic-net:masterfrom
bhouse-nexthop:bhouse-nexthop/gcu-perf

Conversation

@bhouse-nexthop
Copy link
Collaborator

@bhouse-nexthop bhouse-nexthop commented Apr 7, 2025

Why I did it

Important: Please review each commit (including commit message) individually. Looking at the patch-set as a whole may cause confusion.

Generic Config Updater (GCU) is notoriously slow. These patches add some helpers for the GCU overhaul (mostly in sonic-utilities) in order to facilitate the optimizations. These changes are in sonic-yang-mgmt plus a patch to libyang 1.

Changes include:

  • Libyang v1 was not exposing must data for leaf nodes (like it does for other node types). Patch to correct this oversight.
  • Loading of sonic configuration data should not mutate the user-provided data, this forces callers to know they need to deepcopy the data, plus in most instances data won't be mutated.
  • sonic-yang-mgmt test models should more closely mimic real sonic yang models as we can't implement other features that assume this otherwise (sonic mandates that the top-level container has the same name as the module).
  • Import sonic configdb<>yang xpath conversion from sonic-utilities as this should be shared code. 90% of this code is copied from the original source but does contain some bugfixes and enhancements including caching.
  • Make find_schema_dependencies() public, plus add the ability to find dependencies recursively. This implementation is caching.
  • Add new find_schema_must_count(), with recursive capabilities to find if a node (and its children) have must clauses. This implementation is caching.

sonic-utilities PR: sonic-net/sonic-utilities#3831

Stats below ... (stats need both this and the sonic-utilities PR to be relevant)...

Original Performance:
Dry Run:

time sudo config replace -d ./config_db.json
...
real	2m51.588s
user	2m23.777s
sys	0m25.300s

Full:

time sudo config replace ./config_db.json
...
real	14m53.772s
user	12m2.376s
sys	2m8.908s

With Patch:
Dry Run:

time sudo config replace -d ./config_db.json
...
real	0m59.602s
user	0m56.434s
sys	0m2.110s

Full:

time sudo config replace ./config_db.json
...
real	1m54.303s
user	0m58.482s
sys	0m2.545s

So that's roughly 3x improvement for dry-run, and 7.5x improvement for full commit. There is room for improvement on the full commit due to a sleep(1) being used between each patch because of a race condition found in the prior code (that was hidden due to a costly sanity check that has been removed).

Work item tracking

How I did it

Gathered profiling data using cProfile and evaluated where the largest gains could be had.

How to verify it

This patch is standalone as it will not cause any issues in other projects which use sonic-yang-mgmt or libyang, however the performance benefits are in sonic-utilities. Apply both this commit and the sonic-utilities PR to a local branch, build and run sonic-utilities tests. Then create a full image, load it onto a DUT (with default configuration), and use the attached
config_db.json to attempt a config replace operation (tested on Dell S5248F).

Which release branch to backport (provide reason below if selected)

  • 202411

Tested branch (Please provide the tested image version)

master as of 20250521

Description for the changelog

sonic-yang-mgmt: Generic Config Updater - performance dependencies

Link to config_db schema for YANG module changes

N/A

A picture of a cute animal (not mandatory but encouraged)

Fixes #22372
Signed-off-by: Brad House [email protected]

@mssonicbld
Copy link
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@bhouse-nexthop bhouse-nexthop changed the title Generic Configuration Updater (GCU) performance enhancements sonic-yang-mgmt: Generic Config Updater - performance dependencies Apr 7, 2025
@bhouse-nexthop bhouse-nexthop force-pushed the bhouse-nexthop/gcu-perf branch from ecbd7e7 to ff0b317 Compare April 7, 2025 20:17
@mssonicbld
Copy link
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@bhouse-nexthop bhouse-nexthop force-pushed the bhouse-nexthop/gcu-perf branch from ff0b317 to 2a4955e Compare April 8, 2025 13:18
@mssonicbld
Copy link
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@bhouse-nexthop bhouse-nexthop force-pushed the bhouse-nexthop/gcu-perf branch from 2a4955e to fc611d9 Compare April 13, 2025 18:22
@mssonicbld
Copy link
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

bradh352 and others added 5 commits April 17, 2025 07:15
The user-provided configdb should not be mutated as the user
may expect it to be unmodified.  It is fairly unlikely to need
to modify the input data, and it doesn't make sense to force
the caller to deep copy the data before passing it in just in
case it gets modified.
sonic requires top-level container name be the same as the module name,
since things depend on this within sonic it doesn't make sense to
have tests that don't honor this restriction.

NOTE: future patches will expect this consistency in tests.
It appears there was an oversight in upstream libyang v1 that must
clauses weren't exposed via SWIG on leaf nodes, but were on all
other relevant node types.  This patch corrects this oversight.
Import modified xpath to configdb path and configdb path to xpath
helpers from sonic-utilites/generic_config_updater as they don't
belong there.  Modify the code to be more readable, add the ability
to extract schema paths instead of data xpaths. Don't require
a configdb to be passed in unless we need to know a list index.

This also caches configdb->xpath conversion which is used in
recursive searches as it may be an expensive operation.
Using schema metadata for generic config updater can be used to determine
change ordering so these need to be implemented and exposed as public.
Due the the repeated nature of these calls, they implement caching
for performance.
@bhouse-nexthop bhouse-nexthop force-pushed the bhouse-nexthop/gcu-perf branch from fc611d9 to 09eff1e Compare April 17, 2025 11:16
@mssonicbld
Copy link
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@bhouse-nexthop bhouse-nexthop force-pushed the bhouse-nexthop/gcu-perf branch from 9de5dc1 to a095f03 Compare April 17, 2025 14:31
@mssonicbld
Copy link
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

Cherry-pick PR to msft-202412: Azure/sonic-buildimage-msft#1711

@mssonicbld
Copy link
Collaborator

@bhouse-nexthop cherry pick PR didn't pass PR checker. Please check!!!
Azure/sonic-buildimage-msft#1711

@bhouse-nexthop
Copy link
Collaborator Author

@bhouse-nexthop cherry pick PR didn't pass PR checker. Please check!!Azure/sonic-buildimage-msft#1711

doesn't look like the build spawned off properly to me ... are you seeing something else ?

@mssonicbld
Copy link
Collaborator

@bhouse-nexthop cherry pick PR didn't pass PR checker. Please check!!!
Azure/sonic-buildimage-msft#1711

10 similar comments
@mssonicbld
Copy link
Collaborator

@bhouse-nexthop cherry pick PR didn't pass PR checker. Please check!!!
Azure/sonic-buildimage-msft#1711

@mssonicbld
Copy link
Collaborator

@bhouse-nexthop cherry pick PR didn't pass PR checker. Please check!!!
Azure/sonic-buildimage-msft#1711

@mssonicbld
Copy link
Collaborator

@bhouse-nexthop cherry pick PR didn't pass PR checker. Please check!!!
Azure/sonic-buildimage-msft#1711

@mssonicbld
Copy link
Collaborator

@bhouse-nexthop cherry pick PR didn't pass PR checker. Please check!!!
Azure/sonic-buildimage-msft#1711

@mssonicbld
Copy link
Collaborator

@bhouse-nexthop cherry pick PR didn't pass PR checker. Please check!!!
Azure/sonic-buildimage-msft#1711

@mssonicbld
Copy link
Collaborator

@bhouse-nexthop cherry pick PR didn't pass PR checker. Please check!!!
Azure/sonic-buildimage-msft#1711

@mssonicbld
Copy link
Collaborator

@bhouse-nexthop cherry pick PR didn't pass PR checker. Please check!!!
Azure/sonic-buildimage-msft#1711

@mssonicbld
Copy link
Collaborator

@bhouse-nexthop cherry pick PR didn't pass PR checker. Please check!!!
Azure/sonic-buildimage-msft#1711

@mssonicbld
Copy link
Collaborator

@bhouse-nexthop cherry pick PR didn't pass PR checker. Please check!!!
Azure/sonic-buildimage-msft#1711

@mssonicbld
Copy link
Collaborator

@bhouse-nexthop cherry pick PR didn't pass PR checker. Please check!!!
Azure/sonic-buildimage-msft#1711

lguohan added a commit to sonic-net/sonic-utilities that referenced this pull request Nov 3, 2025
Generic Configuration Updater (GCU) performance enhancements

#### What I did

Generic Configuration Updater is extremely slow, using the python profiler it was possible to determine the worst offenders where changes could be made without affecting the overall algorithm and HLD design documentation.

Brief overview of changes:
* Prevent copy.deepcopy() calls where possible
* Don't run validation twice back to back
* Move configdb path <> xpath conversion logic to sonic-yang-mgmt where it belongs and enhance it to support schema conversion (not just data) and add caching.
* Sort table keys by the number of schema backlinks and must statements for the node to try better guess the right order of the patches to generate rather than doing it in alphabetical order which is likely to cause validation failures.
* Add ability to Group patches together in some commits where its known they will not cause issues, these are things like grouping parameter updates under the same key.
* When applying changes, do not re-read the configuration from redis twice between each applied patch (this is **extremely** slow, and actually hid a race condition).  We are mutating the configuration and a lock is held so we know the expected before and after.  There is still a final validation to ensure something didn't go sideways.

Dependencies:
 * sonic-yang-mgmt enhancements: sonic-net/sonic-buildimage#22254
 * sonic-yang-mgmt parse uses/grouping: sonic-net/sonic-buildimage#21907
 * sonic-utilities rely on sonic-yang-mgmt uses/grouping handling: #3814

Stats below ... (stats need both this and the sonic-utilities PR to be relevant)...

<ins>**Original Performance:**</ins>
Dry Run:
```
time sudo config replace -d ./config_db.json
...
real	2m51.588s
user	2m23.777s
sys	0m25.300s
```

Full:
```
time sudo config replace ./config_db.json
...
real	14m53.772s
user	12m2.376s
sys	2m8.908s
```

<ins>**With Patch**:</ins>
Dry Run:
```
time sudo config replace -d ./config_db.json
...
real	0m59.602s
user	0m56.434s
sys	0m2.110s
```

Full:
```
time sudo config replace ./config_db.json
...
real	1m54.303s
user	0m58.482s
sys	0m2.545s
```

So that's roughly 3x improvement for dry-run, and 7.5x improvement for full commit.  There is room for improvement on the full commit due to a `sleep(1)` being used between each patch because of a race condition found in the prior code (that was hidden due to a costly sanity check that has been removed).
mssonicbld added a commit to mssonicbld/sonic-utilities.msft that referenced this pull request Nov 9, 2025
***Important***: Please review each commit (including commit message) individually.  Looking at the patch-set as a whole may cause confusion.

#### What I did

Generic Configuration Updater is extremely slow, using the python profiler it was possible to determine the worst offenders where changes could be made without affecting the overall algorithm and HLD design documentation.

Brief overview of changes:
* Prevent copy.deepcopy() calls where possible
* Don't run validation twice back to back
* Move configdb path <> xpath conversion logic to sonic-yang-mgmt where it belongs and enhance it to support schema conversion (not just data) and add caching.
* Sort table keys by the number of schema backlinks and must statements for the node to try better guess the right order of the patches to generate rather than doing it in alphabetical order which is likely to cause validation failures.
* Add ability to Group patches together in some commits where its known they will not cause issues, these are things like grouping parameter updates under the same key.
* When applying changes, do not re-read the configuration from redis twice between each applied patch (this is **extremely** slow, and actually hid a race condition).  We are mutating the configuration and a lock is held so we know the expected before and after.  There is still a final validation to ensure something didn't go sideways.

Dependencies:
 * sonic-yang-mgmt enhancements: sonic-net/sonic-buildimage#22254
 * sonic-yang-mgmt parse uses/grouping: sonic-net/sonic-buildimage#21907
 * sonic-utilities rely on sonic-yang-mgmt uses/grouping handling: sonic-net/sonic-utilities#3814

Stats below ... (stats need both this and the sonic-utilities PR to be relevant)...

<ins>**Original Performance:**</ins>
Dry Run:
```
time sudo config replace -d ./config_db.json
...
real	2m51.588s
user	2m23.777s
sys	0m25.300s
```

Full:
```
time sudo config replace ./config_db.json
...
real	14m53.772s
user	12m2.376s
sys	2m8.908s
```

<ins>**With Patch**:</ins>
Dry Run:
```
time sudo config replace -d ./config_db.json
...
real	0m59.602s
user	0m56.434s
sys	0m2.110s
```

Full:
```
time sudo config replace ./config_db.json
...
real	1m54.303s
user	0m58.482s
sys	0m2.545s
```

So that's roughly 3x improvement for dry-run, and 7.5x improvement for full commit.  There is room for improvement on the full commit due to a `sleep(1)` being used between each patch because of a race condition found in the prior code (that was hidden due to a costly sanity check that has been removed).

#### How I did it

Profiling via cProfiler

#### How to verify it

Run sonic-utilities test cases, they still pass

#### Previous command output (if the output of a command-line utility has changed)

N/A

#### New command output (if the output of a command-line utility has changed)

N/A

Fixes sonic-net/sonic-buildimage#22372
mssonicbld added a commit to Azure/sonic-utilities.msft that referenced this pull request Nov 9, 2025
***Important***: Please review each commit (including commit message) individually. Looking at the patch-set as a whole may cause confusion.

#### What I did

Generic Configuration Updater is extremely slow, using the python profiler it was possible to determine the worst offenders where changes could be made without affecting the overall algorithm and HLD design documentation.

Brief overview of changes:
* Prevent copy.deepcopy() calls where possible
* Don't run validation twice back to back
* Move configdb path <> xpath conversion logic to sonic-yang-mgmt where it belongs and enhance it to support schema conversion (not just data) and add caching.
* Sort table keys by the number of schema backlinks and must statements for the node to try better guess the right order of the patches to generate rather than doing it in alphabetical order which is likely to cause validation failures.
* Add ability to Group patches together in some commits where its known they will not cause issues, these are things like grouping parameter updates under the same key.
* When applying changes, do not re-read the configuration from redis twice between each applied patch (this is **extremely** slow, and actually hid a race condition). We are mutating the configuration and a lock is held so we know the expected before and after. There is still a final validation to ensure something didn't go sideways.

Dependencies:
 failure_prs.log sonic-yang-mgmt enhancements: sonic-net/sonic-buildimage#22254
 failure_prs.log sonic-yang-mgmt parse uses/grouping: sonic-net/sonic-buildimage#21907
 failure_prs.log sonic-utilities rely on sonic-yang-mgmt uses/grouping handling: sonic-net/sonic-utilities#3814

Stats below ... (stats need both this and the sonic-utilities PR to be relevant)...

<ins>**Original Performance:**</ins>
Dry Run:
```
time sudo config replace -d ./config_db.json
...
real	2m51.588s
user	2m23.777s
sys	0m25.300s
```

Full:
```
time sudo config replace ./config_db.json
...
real	14m53.772s
user	12m2.376s
sys	2m8.908s
```

<ins>**With Patch**:</ins>
Dry Run:
```
time sudo config replace -d ./config_db.json
...
real	0m59.602s
user	0m56.434s
sys	0m2.110s
```

Full:
```
time sudo config replace ./config_db.json
...
real	1m54.303s
user	0m58.482s
sys	0m2.545s
```

So that's roughly 3x improvement for dry-run, and 7.5x improvement for full commit. There is room for improvement on the full commit due to a `sleep(1)` being used between each patch because of a race condition found in the prior code (that was hidden due to a costly sanity check that has been removed).

#### How I did it

Profiling via cProfiler

#### How to verify it

Run sonic-utilities test cases, they still pass

#### Previous command output (if the output of a command-line utility has changed)

N/A

#### New command output (if the output of a command-line utility has changed)

N/A

Fixes sonic-net/sonic-buildimage#22372
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Enhancement: Improve Generic Config Updater (GCU) performance

7 participants