Skip to content

[config] Exit with non-zero when qos reload fail#3710

Merged
kperumalbfn merged 3 commits intosonic-net:masterfrom
jianyuewu:config_qos_reload_fix
Jan 21, 2025
Merged

[config] Exit with non-zero when qos reload fail#3710
kperumalbfn merged 3 commits intosonic-net:masterfrom
jianyuewu:config_qos_reload_fix

Conversation

@jianyuewu
Copy link
Contributor

Fix issue when 'config qos reload' operation unsuccessful, but returns exit code 0, when operation unsuccessful, now it will return code 1 after executing all the commands.

What I did

When config qos reload failed with error print "Operation not completed successfully", exit with code 1 finally, instead of exit with code 0.

How I did it

Check return value in function _wait_until_clear(), and after config qos reload command finish, check this flag, if it is none zero, then exit 1.

How to verify it

Test with command "config qos reload", will exit 1. Note that "config load_minigraph" is also verified, not impacted.

Previous command output (if the output of a command-line utility has changed)

# config qos reload
Operation not completed successfully, please save and reload configuration.
Running command: /usr/local/bin/sonic-cfggen -d -t /usr/share/sonic/device/x86_64-nvidia_sn5600-r0/Mellanox-SN5600-C256S1/buffers_dynamic.json.j2,/tmp/cfg_buffer.json -t /usr/share/sonic/device/x86_64-nvidia_sn5600-r0/Mellanox-SN5600-C256S1/qos.json.j2,/tmp/cfg_qos.json -y /etc/sonic/sonic_version.yml
Running command: /usr/local/bin/sonic-cfggen -j /tmp/cfg_buffer.json -j /tmp/cfg_qos.json --write-to-db
Buffer calculation model updated, restarting swss is required to take effect

New command output (if the output of a command-line utility has changed)

# config qos reload
Operation not completed successfully, please save and reload configuration.
Running command: /usr/local/bin/sonic-cfggen -d -t /usr/share/sonic/device/x86_64-nvidia_sn5600-r0/Mellanox-SN5600-C256S1/buffers_dynamic.json.j2,/tmp/cfg_buffer.json -t /usr/share/sonic/device/x86_64-nvidia_sn5600-r0/Mellanox-SN5600-C256S1/qos.json.j2,/tmp/cfg_qos.json -y /etc/sonic/sonic_version.yml
Running command: /usr/local/bin/sonic-cfggen -j /tmp/cfg_buffer.json -j /tmp/cfg_qos.json --write-to-db
Buffer calculation model updated, restarting swss is required to take effect
# echo $?
1

Fix issue when 'config qos reload' operation unsuccessful,
but returns exit code 0, when operation unsuccessful,
now it will return code 1 after executing all the commands.

Signed-off-by: Jianyue Wu <jianyuew@nvidia.com>
@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@keboliu keboliu requested a review from stephenxs January 10, 2025 02:29
@bingwang-ms
Copy link
Contributor

bingwang-ms commented Jan 13, 2025

Please coordinate this change with PR #3708 .
Should the return value of _wait_until_clear always be True if timeout==0?

@bingwang-ms
Copy link
Contributor

@vivekrnv Please help review this change.

@vivekrnv
Copy link
Contributor

Please coordinate this change with PR #3708 . Should the return value of _wait_until_clear always be True if timeout==0?

Yes, good catch. I've updated #3708

config/main.py Outdated
# Traditional buffer manager do not remove buffer tables in any case, no need to wait.
timeout = 120 if device_metadata and device_metadata.get('buffer_model') == 'dynamic' else 0
_wait_until_clear(["BUFFER_*_TABLE:*", "BUFFER_*_SET"], interval=0.5, timeout=timeout, verbose=verbose)
return _wait_until_clear(["BUFFER_*_TABLE:*", "BUFFER_*_SET"], interval=0.5, timeout=timeout, verbose=verbose)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you check for actual table/config returning error? Will be good for debugging

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently inside wait_until_clear() function, when enable --verbose mode, it will print out table and config error details.
$ config qos reload **--verbose
Some entries matching BUFFER
_TABLE: still exist: BUFFER_PROFILE_TABLE:ingress_lossy_pg_zero_profile
Some entries matching BUFFER__TABLE: still exist: BUFFER_PROFILE_TABLE:ingress_lossy_pg_zero_profile**
Operation not completed successfully, please save and reload configuration.
Running command: /usr/local/bin/sonic-cfggen -d -t /usr/share/sonic/device/x86_64-mlnx_msn3800-r0/Mellanox-SN3800-D112C8/buffers_dynamic.json.j2,/tmp/cfg_buffer.json -t /usr/share/sonic/device/x86_64-mlnx_msn3800-r0/Mellanox-SN3800-D112C8/qos.json.j2,/tmp/cfg_qos.json -y /etc/sonic/sonic_version.yml
Running command: /usr/local/bin/sonic-cfggen -j /tmp/cfg_buffer.json -j /tmp/cfg_qos.json --write-to-db

if buffer_model_updated:
print("Buffer calculation model updated, restarting swss is required to take effect")

if not status:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add error log with additional details

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously in _wait_until_clear(), it has printed with this error, maybe user can refer to this one😊

  • if not empty and timeout:
    click.echo("Operation not completed successfully, please save and reload configuration.")

@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Remove test_qos_wait_until_clear_not_empty_should_exit due to
the special handling of timeout=0 indicating traditional buffer mode,
making the test obsolete.

Signed-off-by: Jianyue Wu <jianyuew@nvidia.com>
@jianyuewu jianyuewu force-pushed the config_qos_reload_fix branch from 62d37cf to 86a9205 Compare January 14, 2025 05:18
@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@jianyuewu
Copy link
Contributor Author

vivekrnv added _wait_until_clear UT for special case timeout=0 in PR #3708, so remove in current PR to avoid conflict.

@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@jianyuewu jianyuewu force-pushed the config_qos_reload_fix branch from b02ce0a to 1724b7e Compare January 15, 2025 08:56
@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

config qos reload --no-delay should return with 0, instead of 1.

Signed-off-by: Jianyue Wu <jianyuew@nvidia.com>
@jianyuewu jianyuewu force-pushed the config_qos_reload_fix branch from 1724b7e to a2f6d86 Compare January 15, 2025 09:14
@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

Cherry-pick PR to 202411: #3727

@jianyuewu
Copy link
Contributor Author

Cherry-pick PR to 202411: #3727

Thank you.

ayurkiv-nvda pushed a commit to ayurkiv-nvda/sonic-utilities that referenced this pull request Jan 22, 2025
ayurkiv-nvda pushed a commit to ayurkiv-nvda/sonic-utilities that referenced this pull request Jan 22, 2025
ayurkiv-nvda pushed a commit to ayurkiv-nvda/sonic-utilities that referenced this pull request Jan 22, 2025
nmoray pushed a commit to nmoray/sonic-utilities that referenced this pull request Jun 25, 2025
[config] Exit with non-zero when qos reload fail
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants