Generic Configuration Updater (GCU) performance enhancements #3831

Merged
lguohan merged 17 commits into sonic-net:master from bhouse-nexthop:bhouse-nexthop/gcu-perf
Nov 3, 2025

Conversation

@bhouse-nexthop (Contributor) commented Apr 7, 2025

Important: Please review each commit (including commit message) individually. Looking at the patch-set as a whole may cause confusion.

What I did

Generic Configuration Updater is extremely slow. Using the Python profiler, it was possible to identify the worst offenders where changes could be made without affecting the overall algorithm or the HLD design documentation.

Brief overview of changes:

  • Prevent copy.deepcopy() calls where possible.
  • Don't run validation twice back to back.
  • Move the configdb path <> xpath conversion logic to sonic-yang-mgmt, where it belongs, enhance it to support schema conversion (not just data), and add caching.
  • Sort table keys by the number of schema backlinks and must statements for the node, to better guess the right order in which to generate patches. Sorting alphabetically is likely to cause validation failures.
  • Add the ability to group patches together in some commits where it's known they will not cause issues, such as grouping parameter updates under the same key.
  • When applying changes, do not re-read the configuration from Redis twice between each applied patch (this is extremely slow, and actually hid a race condition). We are mutating the configuration while a lock is held, so we know the expected before and after states. There is still a final validation to ensure nothing went sideways.
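
The key-ordering idea in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `order_tables` helper, the `backlinks` and `musts` dictionaries, the counts in them, and the ascending direction (fewest constraints first) are all assumptions made for the example.

```python
def order_tables(tables, backlinks, musts):
    """Order table names by schema constraint weight instead of alphabetically.

    `backlinks` and `musts` map a table name to hypothetical counts; tables
    missing from a mapping count as zero. Ties fall back to alphabetical order.
    """
    def weight(name):
        return backlinks.get(name, 0) + musts.get(name, 0)

    return sorted(tables, key=lambda name: (weight(name), name))


# Hypothetical counts, for illustration only.
tables = ["ACL_TABLE", "PORT", "VLAN", "VLAN_MEMBER"]
backlinks = {"PORT": 5, "VLAN": 2}
musts = {"VLAN_MEMBER": 1, "ACL_TABLE": 0}

ordered = order_tables(tables, backlinks, musts)
print(ordered)
```

Using a `(weight, name)` tuple as the sort key keeps the result deterministic when two tables carry the same weight.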

Dependencies:

Stats below ... (stats need both this and the sonic-utilities PR to be relevant)...

Original Performance:
Dry Run:

time sudo config replace -d ./config_db.json
...
real	2m51.588s
user	2m23.777s
sys	0m25.300s

Full:

time sudo config replace ./config_db.json
...
real	14m53.772s
user	12m2.376s
sys	2m8.908s

With Patch:
Dry Run:

time sudo config replace -d ./config_db.json
...
real	0m59.602s
user	0m56.434s
sys	0m2.110s

Full:

time sudo config replace ./config_db.json
...
real	1m54.303s
user	0m58.482s
sys	0m2.545s

So that's roughly 3x improvement for dry-run, and 7.5x improvement for full commit. There is room for improvement on the full commit due to a sleep(1) being used between each patch because of a race condition found in the prior code (that was hidden due to a costly sanity check that has been removed).
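
The speedup figures can be checked directly from the `real` times quoted above. The small helper below (`to_seconds` is a name invented for this sketch) converts the `time` output to seconds and divides; with these numbers the ratios come out a little under 3x for the dry run and a little under 8x for the full run.

```python
import re


def to_seconds(stamp):
    """Convert a `time`-style stamp such as '2m51.588s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", stamp)
    if match is None:
        raise ValueError(f"unrecognized time stamp: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# `real` times copied from the measurements in this description.
dry_run_speedup = to_seconds("2m51.588s") / to_seconds("0m59.602s")
full_speedup = to_seconds("14m53.772s") / to_seconds("1m54.303s")

print(f"dry run: {dry_run_speedup:.2f}x, full: {full_speedup:.2f}x")
```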

How I did it

Profiling via cProfile
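
For readers unfamiliar with the approach, a minimal profiling session with the standard-library `cProfile` and `pstats` modules might look like the sketch below. `slow_operation` is a hypothetical stand-in workload, not GCU code; the actual profiling commands used for this PR are not shown in the description.

```python
import cProfile
import io
import pstats


def slow_operation():
    """Stand-in workload; a hypothetical placeholder for the code under study."""
    return sum(i * i for i in range(200_000))


profiler = cProfile.Profile()
profiler.enable()
result = slow_operation()
profiler.disable()

# Sort by cumulative time to surface the most expensive call paths first.
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.sort_stats("cumulative").print_stats(10)
print(report.getvalue())
```

Sorting by cumulative time is one reasonable choice for finding functions such as deep copies or repeated validation that dominate a run; sorting by `"tottime"` instead highlights time spent inside each function itself.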

How to verify it

Run the sonic-utilities test cases; they still pass.

Previous command output (if the output of a command-line utility has changed)

N/A

New command output (if the output of a command-line utility has changed)

N/A

Fixes sonic-net/sonic-buildimage#22372

@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

qiluo-msft pushed a commit to sonic-net/sonic-buildimage that referenced this pull request May 13, 2025
…22254)

Why I did it
Generic Config Updater (GCU) is notoriously slow. These patches add some helpers for the GCU overhaul (mostly in sonic-utilities) in order to facilitate the optimizations. These changes are in sonic-yang-mgmt plus a patch to libyang 1.

Changes include:

  • Libyang v1 was not exposing must data for leaf nodes (as it does for other node types). Patch to correct this oversight.
  • Loading of sonic configuration data should not mutate the user-provided data. Mutating it forces callers to know they need to deepcopy the data, even though in most instances the data won't be mutated.
  • sonic-yang-mgmt test models should more closely mimic real sonic yang models, as we can't otherwise implement other features that assume this (sonic mandates that the top-level container has the same name as the module).
  • Import sonic configdb<>yang xpath conversion from sonic-utilities, as this should be shared code. 90% of this code is copied from the original source, but it does contain some bugfixes and enhancements, including caching.
  • Make find_schema_dependencies() public, plus add the ability to find dependencies recursively. This implementation is caching.
  • Add new find_schema_must_count(), with recursive capabilities to find if a node (and its children) have must clauses. This implementation is caching.
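
The "recursive and caching" pattern described for the dependency lookups can be illustrated with a toy example. This is only a sketch under stated assumptions: the `DIRECT_DEPENDENTS` graph and the `find_dependents` function are hypothetical and do not reflect the real sonic-yang-mgmt signatures, and the toy graph is assumed to be acyclic.

```python
from functools import lru_cache

# Hypothetical toy schema: each node lists the nodes that reference it.
DIRECT_DEPENDENTS = {
    "PORT": ("VLAN_MEMBER", "ACL_TABLE"),
    "VLAN": ("VLAN_MEMBER", "VLAN_INTERFACE"),
    "VLAN_MEMBER": (),
    "VLAN_INTERFACE": (),
    "ACL_TABLE": ("ACL_RULE",),
    "ACL_RULE": (),
}


@lru_cache(maxsize=None)
def find_dependents(node, recursive=False):
    """Return a frozenset of nodes that depend on `node`.

    Results are memoized, so repeated queries for the same arguments are
    answered from the cache. Assumes the toy graph has no cycles.
    """
    found = set(DIRECT_DEPENDENTS.get(node, ()))
    if recursive:
        for child in tuple(found):
            found |= find_dependents(child, True)
    return frozenset(found)


print(sorted(find_dependents("PORT", True)))
```

Returning a `frozenset` keeps cached results immutable, so one caller cannot accidentally alter what the next caller receives.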
sonic-utilities PR: sonic-net/sonic-utilities#3831


Work item tracking
How I did it
Gathered profiling data using cProfile and evaluated where the largest gains could be had.

How to verify it
This patch is standalone as it will not cause any issues in other projects which use sonic-yang-mgmt or libyang, however the performance benefits are in sonic-utilities. Apply both this commit and the sonic-utilities PR to a local branch, build and run sonic-utilities tests. Then create a full image, load it onto a DUT (with default configuration), and use the attached config_db.json to attempt a config replace operation (tested on Dell S5248F).

Which release branch to backport (provide reason below if selected)
 202411
Tested branch (Please provide the tested image version)
master as of 20250521

Description for the changelog
sonic-yang-mgmt: Generic Config Updater - performance dependencies

Fixes #22372
@bhouse-nexthop bhouse-nexthop force-pushed the bhouse-nexthop/gcu-perf branch from 046e1a6 to 9aa198e Compare May 13, 2025 18:12
@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@bhouse-nexthop
Contributor Author

I just rebased against master to force a new build.

@bhouse-nexthop bhouse-nexthop force-pushed the bhouse-nexthop/gcu-perf branch from 9aa198e to 594e80d Compare May 14, 2025 12:52
@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@bhouse-nexthop bhouse-nexthop force-pushed the bhouse-nexthop/gcu-perf branch from 594e80d to 91dd8dd Compare May 15, 2025 12:38
@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@bhouse-nexthop bhouse-nexthop force-pushed the bhouse-nexthop/gcu-perf branch from 8d48fbd to 2e767f7 Compare May 15, 2025 14:25
@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@bhouse-nexthop bhouse-nexthop force-pushed the bhouse-nexthop/gcu-perf branch from 2e767f7 to ddbb618 Compare May 15, 2025 14:34
@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Collaborator

/azp run

@bhouse-nexthop
Contributor Author

@qiluo-msft we really need to get this merged. This is a huge performance fix that makes GCU actually usable.

@bhouse-nexthop
Contributor Author

Hi @saiarcot895, nice meeting you at the Sonic workshop today. I mentioned this PR and you said you might be able to take a look at it. Thanks!
-Brad

@bhouse-nexthop
Contributor Author

@hdwhdw can you help as well?

@hdwhdw hdwhdw self-requested a review October 17, 2025 22:43
@lguohan lguohan requested a review from Copilot October 18, 2025 22:42
Contributor

Copilot AI left a comment

Pull Request Overview

This PR implements significant performance enhancements for the Generic Configuration Updater (GCU), achieving 3x improvement for dry-run operations and 7.5x improvement for full commits. The changes focus on reducing redundant operations, optimizing data access patterns, and implementing bulk operations where safe.

Key changes include:

  • Introduction of JsonMoveGroup for grouping related patches to reduce validation overhead
  • Migration of path/xpath conversion logic to sonic-yang-mgmt with caching capabilities
  • Implementation of bulk move generators to group similar operations
  • Elimination of redundant deep copies and config reloads during patch application
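
To make the grouping idea concrete, the sketch below collects consecutive JSON Patch operations that touch fields under the same table key, so that each group could be validated once rather than once per operation. It is an illustration only: `parent_key` and `group_moves` are hypothetical helpers, not the `JsonMoveGroup` implementation from this PR, and JSON Pointer escaping (`~0`, `~1`) is ignored for brevity.

```python
from itertools import groupby


def parent_key(op):
    """Return the '/TABLE/key' prefix of a JSON Patch path."""
    parts = op["path"].split("/")
    return "/".join(parts[:3])


def group_moves(ops):
    """Group consecutive operations that touch the same table key."""
    return [list(group) for _, group in groupby(ops, key=parent_key)]


# Hypothetical operations, for illustration only.
ops = [
    {"op": "replace", "path": "/PORT/Ethernet0/mtu", "value": "9100"},
    {"op": "replace", "path": "/PORT/Ethernet0/admin_status", "value": "up"},
    {"op": "add", "path": "/PORT/Ethernet4/description", "value": "uplink"},
]

groups = group_moves(ops)
for group in groups:
    print(parent_key(group[0]), len(group))
```

Because `itertools.groupby` only merges adjacent items, the original ordering of operations is preserved, which matters when later operations depend on earlier ones.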

Reviewed Changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| generic_config_updater/patch_sorter.py | Core performance improvements with JsonMoveGroup, bulk generators, and optimized validation |
| generic_config_updater/gu_common.py | Path addressing optimizations and schema-based key sorting |
| generic_config_updater/change_applier.py | Streamlined config application with reduced Redis access |
| generic_config_updater/generic_updater.py | Updated to use new change applier interface |
| tests/ | Updated test files to accommodate new JsonMoveGroup structure and API changes |

@bhouse-nexthop
Contributor Author

Hi @lguohan thanks for triggering copilot on this. I left one open that I need to look at when I'm not jet lagged. I'll do that tomorrow am. Thanks!

mssonicbld added a commit to Azure/sonic-buildimage-msft that referenced this pull request Oct 19, 2025
…ance dependencies (#1711)

#### Why I did it

***Important***: Please review each commit (including commit message) individually. Looking at the patch-set as a whole may cause confusion.

Generic Config Updater (GCU) is notoriously slow. These patches add some helpers for the GCU overhaul (mostly in sonic-utilities) in order to facilitate the optimizations. These changes are in sonic-yang-mgmt plus a patch to libyang 1.

Changes include:
* Libyang v1 was not exposing `must` data for leaf nodes (as it does for other node types). Patch to correct this oversight.
* Loading of sonic configuration data should not mutate the user-provided data. Mutating it forces callers to know they need to deepcopy the data, even though in most instances the data won't be mutated.
* sonic-yang-mgmt test models should more closely mimic real sonic yang models, as we can't otherwise implement other features that assume this (sonic mandates that the top-level container has the same name as the module).
* Import `sonic configdb`<>`yang xpath` conversion from sonic-utilities, as this should be shared code. 90% of this code is copied from the original source, but it does contain some bugfixes and enhancements, including caching.
* Make `find_schema_dependencies()` public, plus add the ability to find dependencies recursively. This implementation is caching.
* Add new `find_schema_must_count()`, with recursive capabilities to find if a node (and its children) have must clauses. This implementation is caching.

sonic-utilities PR: sonic-net/sonic-utilities#3831


##### Work item tracking

#### How I did it

Gathered profiling data using cProfile and evaluated where the largest gains could be had.

#### How to verify it

This patch is standalone as it will not cause any issues in other projects which use sonic-yang-mgmt or libyang, however the performance benefits are in sonic-utilities. Apply both this commit and the sonic-utilities PR to a local branch, build and run sonic-utilities tests. Then create a full image, load it onto a DUT (with default configuration), and use the attached [config_db.json](https://github.com/user-attachments/files/19635712/config_db.json) to attempt a `config replace` operation (tested on Dell S5248F).

#### Which release branch to backport (provide reason below if selected)

- [x] 202411

#### Tested branch (Please provide the tested image version)

master as of 20250521

#### Description for the changelog

sonic-yang-mgmt: Generic Config Updater - performance dependencies

#### Link to config_db schema for YANG module changes
N/A

Fixes sonic-net/sonic-buildimage#22372
Signed-off-by: Brad House <bhouse@nexthop.ai>
@bhouse-nexthop
Contributor Author

@lguohan can you review now that I've addressed?

@lguohan
Contributor

lguohan commented Oct 24, 2025

will do

@bradh352
Contributor

bradh352 commented Nov 2, 2025

@lguohan had a chance to review this yet?

@lguohan lguohan merged commit bd3de9d into sonic-net:master Nov 3, 2025
6 checks passed

ret = self._services_validate(run_data, upd_data, upd_keys)
if not ret:
    run_data = get_config_db_as_json(self.scope)
Contributor

Why removing this block of code? It seems useful for a bugfix.

Contributor Author

@bhouse-nexthop bhouse-nexthop Nov 3, 2025

From the commit message for this change:

ChangeApplier: no need to re-read configdb twice per patch
During change application, the configdb was being read before applying
each patch, then again as a validation step.  Since we are holding
a lock on the configdb and we know the state as we are mutating it,
there is no reason we need to do this as it greatly slows down the
overall application of the patches to Redis.

There is also no reason at all to do the validation step, this appears
to be something left in during development to help find bugs but
not something that will ever catch any sort of issue post-development.
There is still a final validation step that will catch overall errors
that is minimal enough to not cause any performance concerns.
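
The approach described in this commit message can be sketched as follows: track the expected configuration in memory while applying each change, and compare against the store only once at the end. This is a simplified illustration under stated assumptions. `FakeStore`, `apply_changes`, and the tuple-based change format are hypothetical and are not the `ChangeApplier` code, and reading the store once up front is an assumption of the sketch.

```python
import copy


class FakeStore:
    """Hypothetical stand-in for the config database; counts full reads."""

    def __init__(self, data):
        self._data = copy.deepcopy(data)
        self.reads = 0

    def read_all(self):
        self.reads += 1
        return copy.deepcopy(self._data)

    def set_field(self, table, key, field, value):
        self._data.setdefault(table, {}).setdefault(key, {})[field] = value


def apply_changes(store, changes):
    """Apply changes while tracking the expected state in memory.

    The store is read once up front and once at the end for a final
    check, rather than before and after every individual change.
    """
    expected = store.read_all()
    for table, key, field, value in changes:
        store.set_field(table, key, field, value)
        expected.setdefault(table, {}).setdefault(key, {})[field] = value
    if store.read_all() != expected:
        raise RuntimeError("final state does not match expected state")
    return expected


store = FakeStore({"PORT": {"Ethernet0": {"mtu": "1500"}}})
final = apply_changes(store, [
    ("PORT", "Ethernet0", "mtu", "9100"),
    ("PORT", "Ethernet0", "admin_status", "up"),
])
print(final, store.reads)
```

In this toy version the number of full reads stays at two regardless of how many changes are applied, which is the property the commit message argues for when a lock guarantees no other writer.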

Contributor

The motivation of #2295 is explained in PR description.

Contributor

PR #2295 is about remove_backend_tables_from_config, not about re-reading configdb. When the configdb is in fact read again, remove_backend_tables_from_config is still called.

@mssonicbld
Collaborator

Cherry-pick PR to msft-202412: Azure/sonic-utilities.msft#254


Development

Successfully merging this pull request may close these issues.

Enhancement: Improve Generic Config Updater (GCU) performance

9 participants