sonic-yang-mgmt: Generic Config Updater - performance dependencies#22254
sonic-yang-mgmt: Generic Config Updater - performance dependencies#22254qiluo-msft merged 6 commits intosonic-net:masterfrom
Conversation
|
/azp run Azure.sonic-buildimage |
|
Azure Pipelines successfully started running 1 pipeline(s). |
ecbd7e7 to
ff0b317
Compare
|
/azp run Azure.sonic-buildimage |
|
Azure Pipelines successfully started running 1 pipeline(s). |
ff0b317 to
2a4955e
Compare
|
/azp run Azure.sonic-buildimage |
|
Azure Pipelines successfully started running 1 pipeline(s). |
2a4955e to
fc611d9
Compare
|
/azp run Azure.sonic-buildimage |
|
Azure Pipelines successfully started running 1 pipeline(s). |
The user-provided configdb should not be mutated as the user may expect it to be unmodified. It is fairly unlikely to need to modify the input data, and it doesn't make sense to force the caller to deep copy the data before passing it in just in case it gets modified.
sonic requires top-level container name be the same as the module name, since things depend on this within sonic it doesn't make sense to have tests that don't honor this restriction. NOTE: future patches will expect this consistency in tests.
It appears there was an oversight in upstream libyang v1 that must clauses weren't exposed via SWIG on leaf nodes, but were on all other relevant node types. This patch corrects this oversight.
Import modified xpath to configdb path and configdb path to xpath helpers from sonic-utilites/generic_config_updater as they don't belong there. Modify the code to be more readable, add the ability to extract schema paths instead of data xpaths. Don't require a configdb to be passed in unless we need to know a list index. This also caches configdb->xpath conversion which is used in recursive searches as it may be an expensive operation.
Using schema metadata for generic config updater can be used to determine change ordering so these need to be implemented and exposed as public. Due the the repeated nature of these calls, they implement caching for performance.
fc611d9 to
09eff1e
Compare
|
/azp run Azure.sonic-buildimage |
|
Azure Pipelines successfully started running 1 pipeline(s). |
9de5dc1 to
a095f03
Compare
|
/azp run Azure.sonic-buildimage |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
Cherry-pick PR to msft-202412: Azure/sonic-buildimage-msft#1711 |
|
@bhouse-nexthop cherry pick PR didn't pass PR checker. Please check!!! |
doesn't look like the build spawned off properly to me ... are you seeing something else ? |
|
@bhouse-nexthop cherry pick PR didn't pass PR checker. Please check!!! |
10 similar comments
|
@bhouse-nexthop cherry pick PR didn't pass PR checker. Please check!!! |
|
@bhouse-nexthop cherry pick PR didn't pass PR checker. Please check!!! |
|
@bhouse-nexthop cherry pick PR didn't pass PR checker. Please check!!! |
|
@bhouse-nexthop cherry pick PR didn't pass PR checker. Please check!!! |
|
@bhouse-nexthop cherry pick PR didn't pass PR checker. Please check!!! |
|
@bhouse-nexthop cherry pick PR didn't pass PR checker. Please check!!! |
|
@bhouse-nexthop cherry pick PR didn't pass PR checker. Please check!!! |
|
@bhouse-nexthop cherry pick PR didn't pass PR checker. Please check!!! |
|
@bhouse-nexthop cherry pick PR didn't pass PR checker. Please check!!! |
|
@bhouse-nexthop cherry pick PR didn't pass PR checker. Please check!!! |
Generic Configuration Updater (GCU) performance enhancements #### What I did Generic Configuration Updater is extremely slow, using the python profiler it was possible to determine the worst offenders where changes could be made without affecting the overall algorithm and HLD design documentation. Brief overview of changes: * Prevent copy.deepcopy() calls where possible * Don't run validation twice back to back * Move configdb path <> xpath conversion logic to sonic-yang-mgmt where it belongs and enhance it to support schema conversion (not just data) and add caching. * Sort table keys by the number of schema backlinks and must statements for the node to try better guess the right order of the patches to generate rather than doing it in alphabetical order which is likely to cause validation failures. * Add ability to Group patches together in some commits where its known they will not cause issues, these are things like grouping parameter updates under the same key. * When applying changes, do not re-read the configuration from redis twice between each applied patch (this is **extremely** slow, and actually hid a race condition). We are mutating the configuration and a lock is held so we know the expected before and after. There is still a final validation to ensure something didn't go sideways. Dependencies: * sonic-yang-mgmt enhancements: sonic-net/sonic-buildimage#22254 * sonic-yang-mgmt parse uses/grouping: sonic-net/sonic-buildimage#21907 * sonic-utilities rely on sonic-yang-mgmt uses/grouping handling: #3814 Stats below ... (stats need both this and the sonic-utilities PR to be relevant)... <ins>**Original Performance:**</ins> Dry Run: ``` time sudo config replace -d ./config_db.json ... real 2m51.588s user 2m23.777s sys 0m25.300s ``` Full: ``` time sudo config replace ./config_db.json ... real 14m53.772s user 12m2.376s sys 2m8.908s ``` <ins>**With Patch**:</ins> Dry Run: ``` time sudo config replace -d ./config_db.json ... real 0m59.602s user 0m56.434s sys 0m2.110s ``` Full: ``` time sudo config replace ./config_db.json ... real 1m54.303s user 0m58.482s sys 0m2.545s ``` So that's roughly 3x improvement for dry-run, and 7.5x improvement for full commit. There is room for improvement on the full commit due to a `sleep(1)` being used between each patch because of a race condition found in the prior code (that was hidden due to a costly sanity check that has been removed).
***Important***: Please review each commit (including commit message) individually. Looking at the patch-set as a whole may cause confusion. #### What I did Generic Configuration Updater is extremely slow, using the python profiler it was possible to determine the worst offenders where changes could be made without affecting the overall algorithm and HLD design documentation. Brief overview of changes: * Prevent copy.deepcopy() calls where possible * Don't run validation twice back to back * Move configdb path <> xpath conversion logic to sonic-yang-mgmt where it belongs and enhance it to support schema conversion (not just data) and add caching. * Sort table keys by the number of schema backlinks and must statements for the node to try better guess the right order of the patches to generate rather than doing it in alphabetical order which is likely to cause validation failures. * Add ability to Group patches together in some commits where its known they will not cause issues, these are things like grouping parameter updates under the same key. * When applying changes, do not re-read the configuration from redis twice between each applied patch (this is **extremely** slow, and actually hid a race condition). We are mutating the configuration and a lock is held so we know the expected before and after. There is still a final validation to ensure something didn't go sideways. Dependencies: * sonic-yang-mgmt enhancements: sonic-net/sonic-buildimage#22254 * sonic-yang-mgmt parse uses/grouping: sonic-net/sonic-buildimage#21907 * sonic-utilities rely on sonic-yang-mgmt uses/grouping handling: sonic-net/sonic-utilities#3814 Stats below ... (stats need both this and the sonic-utilities PR to be relevant)... <ins>**Original Performance:**</ins> Dry Run: ``` time sudo config replace -d ./config_db.json ... real 2m51.588s user 2m23.777s sys 0m25.300s ``` Full: ``` time sudo config replace ./config_db.json ... real 14m53.772s user 12m2.376s sys 2m8.908s ``` <ins>**With Patch**:</ins> Dry Run: ``` time sudo config replace -d ./config_db.json ... real 0m59.602s user 0m56.434s sys 0m2.110s ``` Full: ``` time sudo config replace ./config_db.json ... real 1m54.303s user 0m58.482s sys 0m2.545s ``` So that's roughly 3x improvement for dry-run, and 7.5x improvement for full commit. There is room for improvement on the full commit due to a `sleep(1)` being used between each patch because of a race condition found in the prior code (that was hidden due to a costly sanity check that has been removed). #### How I did it Profiling via cProfiler #### How to verify it Run sonic-utilities test cases, they still pass #### Previous command output (if the output of a command-line utility has changed) N/A #### New command output (if the output of a command-line utility has changed) N/A Fixes sonic-net/sonic-buildimage#22372
***Important***: Please review each commit (including commit message) individually. Looking at the patch-set as a whole may cause confusion. #### What I did Generic Configuration Updater is extremely slow, using the python profiler it was possible to determine the worst offenders where changes could be made without affecting the overall algorithm and HLD design documentation. Brief overview of changes: * Prevent copy.deepcopy() calls where possible * Don't run validation twice back to back * Move configdb path <> xpath conversion logic to sonic-yang-mgmt where it belongs and enhance it to support schema conversion (not just data) and add caching. * Sort table keys by the number of schema backlinks and must statements for the node to try better guess the right order of the patches to generate rather than doing it in alphabetical order which is likely to cause validation failures. * Add ability to Group patches together in some commits where its known they will not cause issues, these are things like grouping parameter updates under the same key. * When applying changes, do not re-read the configuration from redis twice between each applied patch (this is **extremely** slow, and actually hid a race condition). We are mutating the configuration and a lock is held so we know the expected before and after. There is still a final validation to ensure something didn't go sideways. Dependencies: failure_prs.log sonic-yang-mgmt enhancements: sonic-net/sonic-buildimage#22254 failure_prs.log sonic-yang-mgmt parse uses/grouping: sonic-net/sonic-buildimage#21907 failure_prs.log sonic-utilities rely on sonic-yang-mgmt uses/grouping handling: sonic-net/sonic-utilities#3814 Stats below ... (stats need both this and the sonic-utilities PR to be relevant)... <ins>**Original Performance:**</ins> Dry Run: ``` time sudo config replace -d ./config_db.json ... real 2m51.588s user 2m23.777s sys 0m25.300s ``` Full: ``` time sudo config replace ./config_db.json ... real 14m53.772s user 12m2.376s sys 2m8.908s ``` <ins>**With Patch**:</ins> Dry Run: ``` time sudo config replace -d ./config_db.json ... real 0m59.602s user 0m56.434s sys 0m2.110s ``` Full: ``` time sudo config replace ./config_db.json ... real 1m54.303s user 0m58.482s sys 0m2.545s ``` So that's roughly 3x improvement for dry-run, and 7.5x improvement for full commit. There is room for improvement on the full commit due to a `sleep(1)` being used between each patch because of a race condition found in the prior code (that was hidden due to a costly sanity check that has been removed). #### How I did it Profiling via cProfiler #### How to verify it Run sonic-utilities test cases, they still pass #### Previous command output (if the output of a command-line utility has changed) N/A #### New command output (if the output of a command-line utility has changed) N/A Fixes sonic-net/sonic-buildimage#22372
Why I did it
Important: Please review each commit (including commit message) individually. Looking at the patch-set as a whole may cause confusion.
Generic Config Updater (GCU) is notoriously slow. These patches add some helpers for the GCU overhaul (mostly in sonic-utilities) in order to facilitate the optimizations. These changes are in sonic-yang-mgmt plus a patch to libyang 1.
Changes include:
mustdata for leaf nodes (like it does for other node types). Patch to correct this oversight.sonic configdb<>yang xpathconversion from sonic-utilities as this should be shared code. 90% of this code is copied from the original source but does contain some bugfixes and enhancements including caching.find_schema_dependencies()public, plus add the ability to find dependencies recursively. This implementation is caching.find_schema_must_count(), with recursive capabilities to find if a node (and its children) have must clauses. This implementation is caching.sonic-utilities PR: sonic-net/sonic-utilities#3831
Stats below ... (stats need both this and the sonic-utilities PR to be relevant)...
Original Performance:
Dry Run:
Full:
With Patch:
Dry Run:
Full:
So that's roughly 3x improvement for dry-run, and 7.5x improvement for full commit. There is room for improvement on the full commit due to a
sleep(1)being used between each patch because of a race condition found in the prior code (that was hidden due to a costly sanity check that has been removed).Work item tracking
How I did it
Gathered profiling data using cProfile and evaluated where the largest gains could be had.
How to verify it
This patch is standalone as it will not cause any issues in other projects which use sonic-yang-mgmt or libyang, however the performance benefits are in sonic-utilities. Apply both this commit and the sonic-utilities PR to a local branch, build and run sonic-utilities tests. Then create a full image, load it onto a DUT (with default configuration), and use the attached
config_db.json to attempt a
config replaceoperation (tested on Dell S5248F).Which release branch to backport (provide reason below if selected)
Tested branch (Please provide the tested image version)
master as of 20250521
Description for the changelog
sonic-yang-mgmt: Generic Config Updater - performance dependencies
Link to config_db schema for YANG module changes
N/A
A picture of a cute animal (not mandatory but encouraged)
Fixes #22372
Signed-off-by: Brad House [email protected]