Generic Configuration Updater (GCU) performance enhancements#3831
Generic Configuration Updater (GCU) performance enhancements#3831lguohan merged 17 commits intosonic-net:masterfrom
Conversation
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
…22254) Why I did it Generic Config Updater (GCU) is notoriously slow. These patches add some helpers for the GCU overhaul (mostly in sonic-utilities) in order to facilitate the optimizations. These changes are in sonic-yang-mgmt plus a patch to libyang 1. Changes include: Libyang v1 was not exposing must data for leaf nodes (like it does for other node types). Patch to correct this oversight. Loading of sonic configuration data should not mutate the user-provided data, this forces callers to know they need to deepcopy the data, plus in most instances data won't be mutated. sonic-yang-mgmt test models should more closely mimic real sonic yang models as we can't implement other features that assume this otherwise (sonic mandates that the top-level container has the same name as the module). Import sonic configdb<>yang xpath conversion from sonic-utilities as this should be shared code. 90% of this code is copied from the original source but does contain some bugfixes and enhancements including caching. Make find_schema_dependencies() public, plus add the ability to find dependencies recursively. This implementation is caching. Add new find_schema_must_count(), with recursive capabilities to find if a node (and its children) have must clauses. This implementation is caching. sonic-utilities PR: sonic-net/sonic-utilities#3831 Stats below ... (stats need both this and the sonic-utilities PR to be relevant)... Original Performance: Dry Run: time sudo config replace -d ./config_db.json ... real 2m51.588s user 2m23.777s sys 0m25.300s Full: time sudo config replace ./config_db.json ... real 14m53.772s user 12m2.376s sys 2m8.908s With Patch: Dry Run: time sudo config replace -d ./config_db.json ... real 0m59.602s user 0m56.434s sys 0m2.110s Full: time sudo config replace ./config_db.json ... real 1m54.303s user 0m58.482s sys 0m2.545s So that's roughly 3x improvement for dry-run, and 7.5x improvement for full commit. There is room for improvement on the full commit due to a sleep(1) being used between each patch because of a race condition found in the prior code (that was hidden due to a costly sanity check that has been removed). Work item tracking How I did it Gathered profiling data using cProfile and evaluated where the largest gains could be had. How to verify it This patch is standalone as it will not cause any issues in other projects which use sonic-yang-mgmt or libyang, however the performance benefits are in sonic-utilities. Apply both this commit and the sonic-utilities PR to a local branch, build and run sonic-utilities tests. Then create a full image, load it onto a DUT (with default configuration), and use the attached config_db.json to attempt a config replace operation (tested on Dell S5248F). Which release branch to backport (provide reason below if selected) 202411 Tested branch (Please provide the tested image version) master as of 20250521 Description for the changelog sonic-yang-mgmt: Generic Config Updater - performance dependencies Fixes #22372
046e1a6 to
9aa198e
Compare
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
I just rebased against master to force a new build. |
9aa198e to
594e80d
Compare
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
594e80d to
91dd8dd
Compare
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
8d48fbd to
2e767f7
Compare
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
2e767f7 to
ddbb618
Compare
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azp run |
|
@qiluo-msft we really need to get this merged. This is a huge performance fix that makes GCU actually usable. |
|
Hi @saiarcot895, nice meeting you at the Sonic workshop today. I mentioned this PR and you said you might be able to take a look at it. Thanks! |
|
@hdwhdw can you help as well? |
There was a problem hiding this comment.
Pull Request Overview
This PR implements significant performance enhancements for the Generic Configuration Updater (GCU), achieving 3x improvement for dry-run operations and 7.5x improvement for full commits. The changes focus on reducing redundant operations, optimizing data access patterns, and implementing bulk operations where safe.
Key changes include:
- Introduction of
JsonMoveGroupfor grouping related patches to reduce validation overhead - Migration of path/xpath conversion logic to
sonic-yang-mgmtwith caching capabilities - Implementation of bulk move generators to group similar operations
- Elimination of redundant deep copies and config reloads during patch application
Reviewed Changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| generic_config_updater/patch_sorter.py | Core performance improvements with JsonMoveGroup, bulk generators, and optimized validation |
| generic_config_updater/gu_common.py | Path addressing optimizations and schema-based key sorting |
| generic_config_updater/change_applier.py | Streamlined config application with reduced Redis access |
| generic_config_updater/generic_updater.py | Updated to use new change applier interface |
| tests/ | Updated test files to accommodate new JsonMoveGroup structure and API changes |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
|
Hi @lguohan thanks for triggering copilot on this. I left one open that I need to look at when I'm not jet lagged. I'll do that tomorrow am. Thanks! |
…ance dependencies (#1711) #### Why I did it ***Important***: Please review each commit (including commit message) individually. Looking at the patch-set as a whole may cause confusion. Generic Config Updater (GCU) is notoriously slow. These patches add some helpers for the GCU overhaul (mostly in sonic-utilities) in order to facilitate the optimizations. These changes are in sonic-yang-mgmt plus a patch to libyang 1. Changes include: failure_prs.log skip_prs.log Libyang v1 was not exposing `must` data for leaf nodes (like it does for other node types). Patch to correct this oversight. failure_prs.log skip_prs.log Loading of sonic configuration data should not mutate the user-provided data, this forces callers to know they need to deepcopy the data, plus in most instances data won't be mutated. failure_prs.log skip_prs.log sonic-yang-mgmt test models should more closely mimic real sonic yang models as we can't implement other features that assume this otherwise (sonic mandates that the top-level container has the same name as the module). failure_prs.log skip_prs.log Import `sonic configdb`<>`yang xpath` conversion from sonic-utilities as this should be shared code. 90% of this code is copied from the original source but does contain some bugfixes and enhancements including caching. failure_prs.log skip_prs.log Make `find_schema_dependencies()` public, plus add the ability to find dependencies recursively. This implementation is caching. failure_prs.log skip_prs.log Add new `find_schema_must_count()`, with recursive capabilities to find if a node (and its children) have must clauses. This implementation is caching. sonic-utilities PR: sonic-net/sonic-utilities#3831 Stats below ... (stats need both this and the sonic-utilities PR to be relevant)... <ins>**Original Performance:**</ins> Dry Run: ``` time sudo config replace -d ./config_db.json ... real 2m51.588s user 2m23.777s sys 0m25.300s ``` Full: ``` time sudo config replace ./config_db.json ... real 14m53.772s user 12m2.376s sys 2m8.908s ``` <ins>**With Patch**:</ins> Dry Run: ``` time sudo config replace -d ./config_db.json ... real 0m59.602s user 0m56.434s sys 0m2.110s ``` Full: ``` time sudo config replace ./config_db.json ... real 1m54.303s user 0m58.482s sys 0m2.545s ``` So that's roughly 3x improvement for dry-run, and 7.5x improvement for full commit. There is room for improvement on the full commit due to a `sleep(1)` being used between each patch because of a race condition found in the prior code (that was hidden due to a costly sanity check that has been removed). ##### Work item tracking #### How I did it Gathered profiling data using cProfile and evaluated where the largest gains could be had. #### How to verify it This patch is standalone as it will not cause any issues in other projects which use sonic-yang-mgmt or libyang, however the performance benefits are in sonic-utilities. Apply both this commit and the sonic-utilities PR to a local branch, build and run sonic-utilities tests. Then create a full image, load it onto a DUT (with default configuration), and use the attached [config_db.json](https://github.com/user-attachments/files/19635712/config_db.json) to attempt a `config replace` operation (tested on Dell S5248F). #### Which release branch to backport (provide reason below if selected) - [x] 202411 #### Tested branch (Please provide the tested image version) master as of 20250521 #### Description for the changelog sonic-yang-mgmt: Generic Config Updater - performance dependencies #### Link to config_db schema for YANG module changes N/A #### A picture of a cute animal (not mandatory but encouraged) Fixes sonic-net/sonic-buildimage#22372 Signed-off-by: Brad House <bhouse@nexthop.ai>
|
@lguohan can you review now that I've addressed? |
|
will do |
|
@lguohan had a chance to review this yet? |
|
|
||
| ret = self._services_validate(run_data, upd_data, upd_keys) | ||
| if not ret: | ||
| run_data = get_config_db_as_json(self.scope) |
There was a problem hiding this comment.
Why removing this block of code? It seems useful for a bugfix.
There was a problem hiding this comment.
From the commit message for this change:
ChangeApplier: no need to re-read configdb twice per patch
During change application, the configdb was being read before applying
each patch, then again as a validation step. Since we are holding
a lock on the configdb and we know the state as we are mutating it,
there is no reason we need to do this as it greatly slows down the
overall application of the patches to Redis.
There is also no reason at all to do the validation step, this appears
to be something left in during development to help find bugs but
not something that will ever catch any sort of issue post-development.
There is still a final validation step that will catch overall errors
that is minimal enough to not cause any performance concerns.
There was a problem hiding this comment.
The motivation of #2295 is explained in PR description.
There was a problem hiding this comment.
PR #2295 is about remove_backend_tables_from_config not about re-reading configdb again. When the configdb is infact read again, the remove_backend_tables_from_config is still called.
|
Cherry-pick PR to msft-202412: Azure/sonic-utilities.msft#254 |
Important: Please review each commit (including commit message) individually. Looking at the patch-set as a whole may cause confusion.
What I did
Generic Configuration Updater is extremely slow, using the python profiler it was possible to determine the worst offenders where changes could be made without affecting the overall algorithm and HLD design documentation.
Brief overview of changes:
Dependencies:
Stats below ... (stats need both this and the sonic-utilities PR to be relevant)...
Original Performance:
Dry Run:
Full:
With Patch:
Dry Run:
Full:
So that's roughly 3x improvement for dry-run, and 7.5x improvement for full commit. There is room for improvement on the full commit due to a
sleep(1)being used between each patch because of a race condition found in the prior code (that was hidden due to a costly sanity check that has been removed).How I did it
Profiling via cProfiler
How to verify it
Run sonic-utilities test cases, they still pass
Previous command output (if the output of a command-line utility has changed)
N/A
New command output (if the output of a command-line utility has changed)
N/A
Fixes sonic-net/sonic-buildimage#22372