
[202205] [generate dump] Move the Core/Log collection to the End of process Execution and removed default timeout #2230

Merged
yxieca merged 5 commits into sonic-net:202205 from vivekrnv:long_time_fix_ts_2205 on Jun 24, 2022

@vivekrnv
Contributor

What I did

Besides, it is better to collect the logs at the end: more information is available by then, and since core files are mostly static, it should not matter much even if they are collected late.
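The reordering can be illustrated with a small Python sketch. This is not the generate_dump implementation (that is a shell script); `collect`, `run_step` and `DEFAULT_CMD_TIMEOUT_S` are hypothetical names chosen here. Command snapshots run first, each under its own timeout, and the core/log file steps run last so they also pick up anything written while the commands were running.

```python
import subprocess
import sys
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical per-command cap in seconds; not a value taken from generate_dump.
DEFAULT_CMD_TIMEOUT_S = 300.0


def run_step(argv: Sequence[str], timeout_s: float) -> str:
    """Run one diagnostic command; a timeout is recorded in the output, not raised."""
    try:
        done = subprocess.run(list(argv), capture_output=True, text=True, timeout=timeout_s)
        return done.stdout
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising TimeoutExpired.
        return f"<timed out after {timeout_s}s>"


def collect(
    commands: List[Sequence[str]],
    file_steps: List[Tuple[str, Callable[[], str]]],
    cmd_timeout_s: float = DEFAULT_CMD_TIMEOUT_S,
) -> Dict[str, str]:
    """Run the command snapshots first and the core/log file collection last."""
    results: Dict[str, str] = {}
    for argv in commands:
        results[" ".join(argv)] = run_step(argv, cmd_timeout_s)
    # Core/log collection runs after all commands, so logs written while the
    # commands were running are captured as well.
    for name, step in file_steps:
        results[name] = step()
    return results


if __name__ == "__main__":
    out = collect(
        [[sys.executable, "-c", "print('ok')"]],
        [("cores+logs", lambda: "archived")],
    )
    print(list(out))  # command key first, file step last
```

Because results are stored in a dict, insertion order records the execution order, which makes it easy to check that the file steps always come last.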

Per-step execution times from two techsupport runs, shown side by side:
[ save_file:/var/core/bash.1653599272.10047.core.gz] : 10041 msec | [ save_file:/var/core/python3.1653598683.29.core.gz] : 42 msec
[ save_file:/var/core/bash.1653601099.288.core.gz] : 382 msec | [ save_file:/var/core/python3.1653598325.23.core.gz] : 43 msec
[ save_file:/var/crash/kdump_lock] : 473 msec                 | [ save_file:/var/crash/kdump_lock] : 29 msec
[ Warm-boot Files ] : 832 msec                                | [ Warm-boot Files ] : 39 msec
[ save_cmd:show services ] : 2191 msec                        | [ save_cmd:show services ] : 1738 msec
[ save_cmd:show reboot-cause ] : 817 msec                     | [ save_cmd:show reboot-cause ] : 469 msec
[ save_cmd:echo 26/05/2022 21:38:42:138993 ] : 412 msec       | [ save_cmd:echo 26/05/2022 20:58:13:932428 ] : 35 msec
[ save_cmd:show interface counters ] : 2232 msec              | [ save_cmd:show interface counters ] : 1731 msec
[ save_cmd:show queue counters ] : 1775 msec                  | [ save_cmd:show queue counters ] : 1307 msec
[ save_cmd:sonic-db-dump -n 'COUNTERS_DB' -y ] : 1119 msec    | [ save_cmd:sonic-db-dump -n 'COUNTERS_DB' -y ] : 682 msec
[ save_cmd:netstat -i ] : 399 msec                            | [ save_cmd:netstat -i ] : 34 msec
[ save_cmd:ifconfig -a ] : 419 msec                           | [ save_cmd:ifconfig -a ] : 49 msec
[ save_cmd:systemd-analyze blame ] : 775 msec                 | [ save_cmd:systemd-analyze blame ] : 363 msec
[ save_cmd:systemd-analyze dump ] : 473 msec                  | [ save_cmd:systemd-analyze dump ] : 113 msec
[ save_cmd:systemd-analyze plot ] : 1298 msec                 | [ save_cmd:systemd-analyze plot ] : 1009 msec
[ save_cmd:show platform syseeprom ] : 1065 msec              | [ save_cmd:show platform syseeprom ] : 776 msec
[ save_cmd:show platform psustatus ] : 890 msec               | [ save_cmd:show platform psustatus ] : 578 msec
[ save_cmd:show platform ssdhealth ] : 1231 msec              | [ save_cmd:show platform ssdhealth ] : 797 msec
[ save_cmd:show platform temperature ] : 894 msec             | [ save_cmd:show platform temperature ] : 635 msec
[ save_cmd:show platform fan ] : 901 msec                     | [ save_cmd:show platform fan ] : 542 msec
[ save_cmd:show vlan brief ] : 825 msec                       | [ save_cmd:show vlan brief ] : 490 msec
[ save_cmd:show version ] : 955 msec                          | [ save_cmd:show version ] : 543 msec
[ save_cmd:show platform summary ] : 906 msec                 | [ save_cmd:show platform summary ] : 469 msec
[ save_cmd:cat /host/machine.conf ] : 423 msec                | [ save_cmd:cat /host/machine.conf ] : 31 msec
[ save_cmd:docker stats --no-stream ] : 3134 msec             | [ save_cmd:docker stats --no-stream ] : 2712 msec
[ save_cmd:sensors ] : 2106 msec                              | [ save_cmd:sensors ] : 1712 msec
[ save_cmd:lspci -vvv -xx ] : 446 msec                        | [ save_cmd:lspci -vvv -xx ] : 71 msec
[ save_cmd:lsusb -v ] : 398 msec                              | [ save_cmd:lsusb -v ] : 48 msec
[ save_cmd:sysctl -a ] : 693 msec                             | [ save_cmd:sysctl -a ] : 335 msec
[ save_cmd:ip link ] : 407 msec                               | [ save_cmd:ip link ] : 43 msec
[ save_cmd:ip addr ] : 377 msec                               | [ save_cmd:ip addr ] : 33 msec
[ save_cmd:ip rule ] : 381 msec                               | [ save_cmd:ip rule ] : 31 msec
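To compare the two runs step by step, the rows above can be parsed and their differences summed. A minimal sketch, assuming every row follows the `[ step ] : N msec | [ step ] : N msec` layout shown above; `parse_row` and `summarize` are hypothetical helpers, rows in any other shape raise `ValueError`, and each step is keyed by its left-hand name (the `echo` rows carry a different timestamp on each side).

```python
import re
from typing import Dict, Iterable, Tuple

# One side of a row, e.g. "[ save_cmd:netstat -i ] : 399 msec".
STEP_RE = re.compile(r"\[\s*(?P<step>.+?)\s*\]\s*:\s*(?P<ms>\d+)\s*msec")


def parse_row(row: str) -> Tuple[str, int, int]:
    """Split a side-by-side row into (step, left_ms, right_ms)."""
    left, right = row.split("|", 1)
    lm, rm = STEP_RE.search(left), STEP_RE.search(right)
    if lm is None or rm is None:
        raise ValueError(f"unrecognised row: {row!r}")
    return lm.group("step"), int(lm.group("ms")), int(rm.group("ms"))


def summarize(rows: Iterable[str]) -> Tuple[int, int, Dict[str, int]]:
    """Return total left ms, total right ms, and per-step (left - right) deltas."""
    parsed = [parse_row(r) for r in rows]
    total_left = sum(left for _, left, _ in parsed)
    total_right = sum(right for _, _, right in parsed)
    deltas = {step: left - right for step, left, right in parsed}
    return total_left, total_right, deltas
```

For example, the `netstat -i` and `ip addr` rows give totals of 776 ms and 67 ms, with per-step deltas of 365 ms and 344 ms.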
  1. Thus, I moved the core/log collection to the end.

  2. There is a catch with the above change. For example, if the system is in an unstable state and most of the individual commands start to time out, the techsupport dump eventually hits the 30-minute global timeout before the cores and logs are collected. The resulting dump is then of little use, since it might not contain any useful information at all.
    Thus, I've removed the default global timeout. Clients can (and should) knowingly provide a value using the -g option if the execution time has to be capped.

  3. Auto-techsupport invocations use a global timeout of 60 minutes.
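The timeout policy in items 2 and 3 can be sketched as a command builder. This assumes the cap is enforced by wrapping the dump in GNU coreutils `timeout`, which is an assumption made here and not necessarily how the -g option is applied; `build_dump_cmd`, `build_auto_techsupport_cmd` and the constant name are hypothetical, and only the 60-minute auto-techsupport value comes from this PR.

```python
from typing import List, Optional, Sequence

# 60 minutes is the auto-techsupport cap stated in this PR; the name is hypothetical.
AUTO_TECHSUPPORT_GLOBAL_TIMEOUT_MIN = 60


def build_dump_cmd(dump_argv: Sequence[str], global_timeout_min: Optional[int] = None) -> List[str]:
    """Return the argv to execute; wrap it in coreutils `timeout` only when a cap is given."""
    if global_timeout_min is None:
        # No default global timeout: the dump runs until every step completes.
        return list(dump_argv)
    if global_timeout_min <= 0:
        raise ValueError("global timeout must be a positive number of minutes")
    # `timeout DURATION COMMAND...` sends SIGTERM once DURATION elapses;
    # the "m" suffix means minutes.
    return ["timeout", f"{global_timeout_min}m", *dump_argv]


def build_auto_techsupport_cmd(dump_argv: Sequence[str]) -> List[str]:
    """Auto-techsupport always supplies an explicit 60-minute cap."""
    return build_dump_cmd(dump_argv, AUTO_TECHSUPPORT_GLOBAL_TIMEOUT_MIN)
```

With this shape, `build_dump_cmd(["generate_dump"])` returns the argv unchanged, while `build_auto_techsupport_cmd(["generate_dump"])` yields `["timeout", "60m", "generate_dump"]`. Defaulting to `None` rather than a number makes the cap an explicit opt-in, matching the intent that clients choose a value knowingly.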

Backport of #2209 to 202205.

How I did it

How to verify it

Previous command output (if the output of a command-line utility has changed)

New command output (if the output of a command-line utility has changed)

vivekrnv added 5 commits June 23, 2022 18:42
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
@yxieca yxieca merged commit 430cd65 into sonic-net:202205 Jun 24, 2022