Skip to content

[Mellanox] align platform_reboot to use "hardware reboot"#3320

Merged
jleveque merged 3 commits intosonic-net:201811from
mykolaf:platform_reboot_201811
Aug 19, 2019
Merged

[Mellanox] align platform_reboot to use "hardware reboot"#3320
jleveque merged 3 commits intosonic-net:201811from
mykolaf:platform_reboot_201811

Conversation

@mykolaf
Copy link
Copy Markdown
Collaborator

@mykolaf mykolaf commented Aug 9, 2019

Signed-off-by: Mykola Faryma [email protected]

- What I did
Aligned Mellanox specific platform_reboot script to perform a power cycle instead of calling /sbin/reboot. Also added verbosity to the fw upgrade script call.

Motivation: switches were reported "stuck" after a reboot call, most vendors use "hardware reboot" in their platform_reboot scripts. We now go with the same approach of a more harsh reboot.

- How I did it

- How to verify it

  • manual reboot
  • continuous reboot test

- Description for the changelog

Mellanox platform_reboot to use power cycle

- A picture of a cute animal (not mandatory but encouraged)

@mykolaf
Copy link
Copy Markdown
Collaborator Author

mykolaf commented Aug 12, 2019

retest this please

fi

exec /sbin/reboot $@
echo 1 > /bsp/system/pwr_cycle
Copy link
Copy Markdown
Collaborator

@qiluo-msft qiluo-msft Aug 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to fail? If you stop the BSP service, the file will disappear, and finally reboot will not work. #Closed

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, changed to use the sysfs path itself.

@qiluo-msft
Copy link
Copy Markdown
Collaborator

        -f|--force)

This option is ignored in new code. I remember our discussion concluded that 'force/hardware-reboot' is the default option. When user want to 'normal/graceful-reboot', he/she should provide an option.


Refers to: device/mellanox/x86_64-mlnx_msn2700-r0/platform_reboot:13 in 9b655b1. [](commit_id = 9b655b1, deletion_comment = False)

Copy link
Copy Markdown
Collaborator

@qiluo-msft qiluo-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As comments

@qiluo-msft qiluo-msft requested review from jleveque and yxieca August 12, 2019 22:49
@mykolaf
Copy link
Copy Markdown
Collaborator Author

mykolaf commented Aug 13, 2019

@qiluo-msft We decided to align with other vendors in using hw-reboot as the default option. Introducing graceful reboot option was put into backlog. The -f option only provides ability to overcome fw-upgrade failure.

@mykolaf
Copy link
Copy Markdown
Collaborator Author

mykolaf commented Aug 19, 2019

retest broadcom please

@jleveque jleveque merged commit 8d5e37d into sonic-net:201811 Aug 19, 2019
@keboliu
Copy link
Copy Markdown
Collaborator

keboliu commented Sep 11, 2019

@jleveque @mykolaf with this change can we know the real reboot reason? User issued a software reboot command, but actually it is triggered by a power cycle behind the scenes, and the "show reboot-cause" will tell the user that the reboot cause is not a "software reboot" but a "power cycle", seems quite confusing, would like to hear from you.

@jleveque
Copy link
Copy Markdown
Contributor

@keboliu: This change definitely has the potential to provide the incorrect reboot reason. There needs to be accounting performed on the platform side to prevent this (e.g., if there is a way to check whether the software triggered the hardware reset via some register).

@keboliu
Copy link
Copy Markdown
Collaborator

keboliu commented Sep 12, 2019

@jleveque @stephenxs I think we may need to change current logic, maybe combine check software reboot cause and hardware reboot cause? in this case, we do have a hardware reboot cause - power cycle, with current logic the software reboot cause check will be skipped.

@stephenxs
Copy link
Copy Markdown
Collaborator

@jleveque @stephenxs I think we may need to change current logic, maybe combine check software reboot cause and hardware reboot cause? in this case, we do have a hardware reboot cause - power cycle, with current logic the software reboot cause check will be skipped.

If most of the vendors implement reboot by using powercycle, is it better to update the logic of process-reboot-cause to checking /host/reboot-cause/reboot-cause.txt first and then checking the platform-dependent hardware reboot cause?

@mykolaf
Copy link
Copy Markdown
Collaborator Author

mykolaf commented Sep 17, 2019

@qiluo-msft

-f|--force)
This option is ignored in new code.

I don't understand, how this option is ignored in new code?
All the logic related to it is left in place, see line L#29 of changed file.

@mykolaf mykolaf deleted the platform_reboot_201811 branch February 18, 2020 13:14
mssonicbld added a commit that referenced this pull request Nov 15, 2024
…lly (#20748)

#### Why I did it
src/sonic-swss
```
* f650a3b - (HEAD -> master, origin/master, origin/HEAD) [ACL] Add support to match on Tunnel Termination (#3320) (2 days ago) [Vivek]
* 7db69c1 - VOQ: Set the ECMP group size to 128. (#3351) (3 days ago) [Deepak Singhal]
* 956ebd6 - Handler Port oper down error status notification (#3350) (6 days ago) [Prince George]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants