[action] [PR:3831] Generic Configuration Updater (GCU) performance enhancements#254

Merged
mssonicbld merged 1 commit into Azure:202412 from mssonicbld:cherry/msft-202412/3831 on Nov 9, 2025

@mssonicbld
Collaborator

***Important***: Please review each commit (including commit message) individually.  Looking at the patch-set as a whole may cause confusion.

#### What I did

The Generic Configuration Updater (GCU) is extremely slow. Using the Python profiler, it was possible to identify the worst offenders, where changes could be made without affecting the overall algorithm or the HLD design documentation.

Brief overview of changes:
* Avoid `copy.deepcopy()` calls where possible.
* Don't run validation twice back-to-back.
* Move the ConfigDB path <-> XPath conversion logic into sonic-yang-mgmt, where it belongs; extend it to support schema conversion (not just data); and add caching.
* Sort table keys by the number of schema backlinks and `must` statements on each node, to better guess the right order in which to generate patches, rather than using alphabetical order, which is likely to cause validation failures.
* Add the ability to group patches together into one commit in cases where it is known this will not cause issues, such as parameter updates under the same key.
* When applying changes, do not re-read the configuration from Redis twice between each applied patch (this is **extremely** slow, and it actually hid a race condition).  Since we are the ones mutating the configuration and a lock is held, we already know the expected before and after states.  A final validation still runs to ensure nothing went sideways.
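The PR does not spell out the exact sort key, so the following is only a minimal sketch of the key-ordering idea. It uses hypothetical per-table counts (`SchemaStats`) and a made-up `order_tables` helper, and it assumes a direction: tables that many others reference (more backlinks) go first, tables carrying more `must` constraints go later, with alphabetical order only as a tie-breaker.

```python
from typing import Dict, List, NamedTuple


class SchemaStats(NamedTuple):
    """Hypothetical per-table counts pulled from the YANG schema."""
    backlinks: int  # leafrefs elsewhere in the schema that point at this table
    musts: int      # `must` statements attached to this table's nodes


def order_tables(tables: List[str], stats: Dict[str, SchemaStats]) -> List[str]:
    """Return tables in the order patches should be generated for them.

    Assumed heuristic: more backlinks sorts earlier, more `must` statements
    sorts later, and ties fall back to the table name so the result is
    deterministic. Tables missing from `stats` count as (0, 0).
    """
    def sort_key(table: str):
        s = stats.get(table, SchemaStats(backlinks=0, musts=0))
        return (-s.backlinks, s.musts, table)

    return sorted(tables, key=sort_key)


# Made-up counts, for illustration only
example_stats = {
    "PORT": SchemaStats(backlinks=12, musts=1),
    "VLAN": SchemaStats(backlinks=3, musts=2),
    "VLAN_MEMBER": SchemaStats(backlinks=0, musts=2),
    "ACL_TABLE": SchemaStats(backlinks=0, musts=4),
}
print(order_tables(sorted(example_stats), example_stats))
# ['PORT', 'VLAN', 'VLAN_MEMBER', 'ACL_TABLE']
```

Plain alphabetical order would start with `ACL_TABLE`, ahead of the `PORT` entries it may reference. A real implementation would also need to handle removals, where the dependency order is reversed.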

Dependencies:
 * sonic-yang-mgmt enhancements: sonic-net/sonic-buildimage#22254
 * sonic-yang-mgmt parse uses/grouping: sonic-net/sonic-buildimage#21907
 * sonic-utilities rely on sonic-yang-mgmt uses/grouping handling: sonic-net/sonic-utilities#3814
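The path <-> XPath conversion that this PR moves into sonic-yang-mgmt is presumably called many times for the same paths while patches are generated and validated, which is what makes caching worthwhile. Below is a minimal sketch using `functools.lru_cache`. The function name `configdb_path_to_xpath` and the hard-coded `_TABLE_SCHEMA` map are hypothetical; the real sonic-yang-mgmt code derives this information from the loaded YANG schema, and its API may differ.

```python
from functools import lru_cache

# Hypothetical table -> (YANG module, list name, key leaf) map. The real code
# derives this from the loaded YANG schema instead of hard-coding it.
_TABLE_SCHEMA = {
    "PORT": ("sonic-port", "PORT_LIST", "name"),
}


@lru_cache(maxsize=None)
def configdb_path_to_xpath(path: str) -> str:
    """Convert a ConfigDB path such as '/PORT/Ethernet0/mtu' to an XPath.

    Sketch only: handles single-key tables listed in _TABLE_SCHEMA.
    """
    parts = [p for p in path.split("/") if p]
    if not parts:
        return "/"
    table = parts[0]
    module, list_name, key_leaf = _TABLE_SCHEMA[table]
    xpath = f"/{module}:{module}/{table}"
    if len(parts) > 1:
        # The second path element is the table key, e.g. the port name
        xpath += f"/{list_name}[{key_leaf}='{parts[1]}']"
    for leaf in parts[2:]:
        xpath += f"/{leaf}"
    return xpath


print(configdb_path_to_xpath("/PORT/Ethernet0/mtu"))
# /sonic-port:sonic-port/PORT/PORT_LIST[name='Ethernet0']/mtu
configdb_path_to_xpath("/PORT/Ethernet0/mtu")  # second call is a cache hit
print(configdb_path_to_xpath.cache_info())
```

Because the cache is keyed only on the path string, it must be cleared (`configdb_path_to_xpath.cache_clear()`) whenever the schema it depends on is reloaded.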

Timing stats are below. They are only relevant with both this PR and the sonic-utilities PR applied.

<ins>**Original Performance:**</ins>
Dry Run:
```
time sudo config replace -d ./config_db.json
...
real	2m51.588s
user	2m23.777s
sys	0m25.300s
```

Full:
```
time sudo config replace ./config_db.json
...
real	14m53.772s
user	12m2.376s
sys	2m8.908s
```

<ins>**With Patch:**</ins>
Dry Run:
```
time sudo config replace -d ./config_db.json
...
real	0m59.602s
user	0m56.434s
sys	0m2.110s
```

Full:
```
time sudo config replace ./config_db.json
...
real	1m54.303s
user	0m58.482s
sys	0m2.545s
```

That is roughly a 2.9x wall-clock (`real`) improvement for the dry run (171.6 s to 59.6 s) and a 7.8x improvement for the full commit (893.8 s to 114.3 s).  There is still room for improvement on the full commit: a `sleep(1)` is used between each patch to work around a race condition found in the prior code (the race had been hidden by a costly sanity check that has now been removed).
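The speedup figures can be checked directly from the `real` (wall-clock) times above. This small helper parses the `XmYs` format printed by the shell's `time` keyword and divides the before time by the after time.

```python
def to_seconds(value: str) -> float:
    """Convert a bash `time` value such as '14m53.772s' to seconds."""
    minutes, seconds = value.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


# `real` times (before, after) copied from the runs above
timings = {
    "dry-run": ("2m51.588s", "0m59.602s"),
    "full": ("14m53.772s", "1m54.303s"),
}

for name, (before, after) in timings.items():
    speedup = to_seconds(before) / to_seconds(after)
    print(f"{name}: {speedup:.1f}x")
```

It prints `dry-run: 2.9x` and `full: 7.8x`.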

#### How I did it

Profiling with Python's `cProfile` module.
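A minimal sketch of this kind of profiling with the standard-library `cProfile` and `pstats` modules is shown below. `slow_function` is a hypothetical stand-in for whichever GCU code path is being measured; the report lists the functions with the highest cumulative time, which is where the worst offenders show up.

```python
import cProfile
import io
import pstats


def slow_function():
    # Hypothetical stand-in workload; replace with the code path under test
    return sum(i * i for i in range(200_000))


profiler = cProfile.Profile()
profiler.enable()
slow_function()
profiler.disable()

# Print the 10 entries with the highest cumulative time
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(10)
print(stream.getvalue())
```

From the shell, `python3 -m cProfile -s cumulative <script>` produces a similar report without modifying the code under test.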

#### How to verify it

Run the sonic-utilities test cases; they still pass.

#### Previous command output (if the output of a command-line utility has changed)

N/A

#### New command output (if the output of a command-line utility has changed)

N/A

Fixes sonic-net/sonic-buildimage#22372
@mssonicbld
Collaborator Author

Original PR: sonic-net/sonic-utilities#3831

@mssonicbld
Collaborator Author

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

mssonicbld merged commit 3967604 into Azure:202412 on Nov 9, 2025
4 of 6 checks passed