Skip to content

[warm-reboot] Add preboot LAG sad path automation#945

Merged
neethajohn merged 5 commits intosonic-net:masterfrom
neethajohn:warmboot-lag
Jun 13, 2019
Merged

[warm-reboot] Add preboot LAG sad path automation#945
neethajohn merged 5 commits intosonic-net:masterfrom
neethajohn:warmboot-lag

Conversation

@neethajohn
Copy link
Contributor

Description of PR

Extend the existing warm-reboot test to include the preboot lag sad path for both neigh end and DUT end

  • Depending on the preboot_oper type, bring down the DUT LAG or the neigh LAG.
  • Verify Lag and BGP status on the DUT end and neigh end
  • warm-boot test runs
  • Once the DUT is up and running, do the postboot verify - bgp/lag status on the DUT end and neigh end should be down
  • revert lag state to up

Type of change

  • [] Bug fix
  • [] Testbed and Framework(new/improvement)
  • Test case(new/improvement)

Approach

  • add 'neigh_lag_down' and 'dut_lag_down' tags in warm-reboot-sad.yml
  • Add new methods in sad_path.py for changing the lag state and verifying the LAG and LAG member states on the DUT end
  • Add new methods in arista.py for changing the lag state and verifying the LAG state on the neigh end
  • Add pre_check flag in some methods to determine if the failure was during the preboot/postboot phase

How did you verify/test it?

Ran the warm-reboot-sad.yml . All cases except 'neigh_lag_down' case passed. Neigh_lag_down failure is a known issue and will be fixed soon
Verified on T0-64 topology

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't need the template here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't get you. Are you referring to passing the state string directly? If yes, this was a feedback I received from Qi in my earlier PR to pass in a bool to avoid misspelling name.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can write
self.do_cmd(state[is_up])
instead of
self.do_cmd('%s' % state[is_up])

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

@qiluo-msft qiluo-msft Jun 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lag [](start = 37, length = 3)

lag [](start = 37, length = 3)

You are concat cmd string, always check input parameter to prevent attack. #Closed

Copy link
Contributor

@qiluo-msft qiluo-msft Jun 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lag [](start = 59, length = 3)

lag [](start = 59, length = 3)

the same #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

@qiluo-msft qiluo-msft Jun 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pre_check [](start = 21, length = 9)

The variable is effective conditionally, suggest name it as ''sad_lag_pre_check" #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This var is not specific to the LAG test. I will be using it for other preboot types as well. I am using this var to tag the failed msg with Preboot/Postboot since the failed msgs are all consolidated and printed at the end of the test. Hard to figure out when the failure occurred since the same method is used for both preboot and postboot checks.

Copy link
Contributor

@qiluo-msft qiluo-msft Jun 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

state [](start = 8, length = 5)

state [](start = 8, length = 5)

better not to reuse var, and name it as 'states' #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

@qiluo-msft qiluo-msft Jun 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sudo config interface %s %s [](start = 100, length = 27)

sudo config interface %s %s [](start = 100, length = 27)

input injection risk #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added validation

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'sudo config interface %s %s' % (state[is_up], self.vm_dut_map[self.neigh_name]['dut_portchannel']) [](start = 99, length = 99)

extract to another statement, don't make any one too much complex.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

@qiluo-msft qiluo-msft Jun 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fails [](start = 8, length = 5)

fails [](start = 8, length = 5)

We prefer return bool positively, like 'successful' #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed var to 'success'

Copy link
Contributor

@qiluo-msft qiluo-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, please check with other reviewers.

Copy link
Contributor

@pavel-shirshov pavel-shirshov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

@neethajohn neethajohn merged commit d092e87 into sonic-net:master Jun 13, 2019
@neethajohn neethajohn deleted the warmboot-lag branch June 13, 2019 16:14
yxieca pushed a commit that referenced this pull request Jun 13, 2019
* preboot LAG sad path automation for neigh_lag_down and dut_lag_down scenarios
fraserg-arista pushed a commit to fraserg-arista/sonic-mgmt that referenced this pull request Feb 24, 2026
)

<!--
Please make sure you've read and understood our contributing guidelines;
https://github.com/sonic-net/SONiC/blob/gh-pages/CONTRIBUTING.md

Please provide following information to help code review process a bit easier:
-->
### Description of PR
On Scale testbed there are 2 main differences:
1. There are much more PTF ports
2. There are much more OVS and interface configurations

Those lead to:
1. NIC would take more time to handle the receiving packets
2. PTF could take more time to process

As a result:
1. More time to spend in ptf function testutils.verify_packet

Summary:
Fixes # (issue)

### Type of change

<!--
- Fill x for your type of change.
- e.g.
- [x] Bug fix
-->

- [ ] Bug fix
- [ ] Testbed and Framework(new/improvement)
- [ ] New Test case
 - [ ] Skipped for non-supported platforms
- [ ] Test case improvement

### Back port request
- [ ] 202205
- [ ] 202305
- [ ] 202311
- [ ] 202405
- [ ] 202411
- [ ] 202505
- [x] 202511

### Approach
#### What is the motivation for this PR?
To increase the PTF traffic handling stability in the scale testbed
#### How did you do it?
Add more wait time in ptf function testutils.verify_packet
#### How did you verify/test it?
Run it in the local scale testbed
#### Any platform specific information?

#### Supported testbed topology if it's a new test case?

### Documentation
<!--
(If it's a new feature, new test case)
Did you update documentation/Wiki relevant to your implementation?
Link to the wiki page?
-->
kazinator-arista pushed a commit to kazinator-arista/sonic-mgmt that referenced this pull request Mar 4, 2026
Submodule src/sonic-platform-common 42119e1..5d7954e:
  > [ChassisBase] Make reboot cause constant strings human-readable (sonic-net#35)
  > Add .gitignore file (sonic-net#28)
  > [sonic_platform_base] Add sonic_sfp and sonic_eeprom to sonic_platform_base (sonic-net#27)
  > Enhance new platform API (sonic-net#19)
  > fix typo in platform API base class (sonic-net#25)

Submodule src/sonic-swss 9cf7b01..1e99c93:
  > Set timer only when interval changes. Not in each firing of the timer. (sonic-net#945)

Submodule src/sonic-utilities ec1e93f..24958f1:
  > [fast reboot] stop removing opennsl module before reboot (sonic-net#560)

Submodule src/sonic-swss-common b472f6e..d6140fa:
  > timerfd:read failure - Record in logs as error. (sonic-net#295)
  > do not abort when read timerfd return 0 and errno = 0 (sonic-net#291)
  > Add an assert to logger, which will log a message and abort. (sonic-net#286)

Signed-off-by: Ying Xie <[email protected]>
kazinator-arista pushed a commit to kazinator-arista/sonic-mgmt that referenced this pull request Mar 4, 2026
sonic-swss:

* 67f7191 2019-06-20 | Set timer only when interval changes. Not in each firing of the timer. (sonic-net#945) (HEAD -> 201803, origin/201803) [Renuka Manavalan]
* 2cae61b 2019-07-07 | Fix: crash while destructing crmorch (sonic-net#731) (sonic-net#949) [Renuka Manavalan]

sonic-swss-common:

* 58b1930 2019-07-07 | Add an assert to logger, which will log a message and abort. (sonic-net#286) (sonic-net#287) (HEAD -> 201803, origin/201803) [Renuka Manavalan]
kazinator-arista pushed a commit to kazinator-arista/sonic-mgmt that referenced this pull request Mar 4, 2026
Add watchdogutil to control the hw watchdog (sonic-net#945)
[db_migrator] Support migrating database regarding buffer
configuration for all Mellanox switches (sonic-net#1053)

Signed-off-by: Abhishek Dosi <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants