Commit 4b40184

qiluo-msft authored and lguohan committed
Breakdown tasks in the section of "Design of test" (#308)
* Breakdown tasks in the section of "Design of test"
* Add more details
* Refine according discussion and feedback
* Add checkboxes to tasks, and refine markdown syntax
* Mark more tasks as done

Signed-off-by: Qi Luo <[email protected]>
1 parent d141bc0 commit 4b40184

1 file changed

+74
-34
lines changed

doc/warm-reboot/system-warmboot.md

Lines changed: 74 additions & 34 deletions
@@ -69,42 +69,82 @@ Later if we improve the consistency ```SONIC_BOOT_TYPE=[fast|warm|cold]```, this
6969
# Design of test
7070
Assumptions:
7171
1. DUT is T0 topology
72-
2. Focus on one image warm reboot, and version upgrading warm reboot. No version downgrading warm reboot.
72+
2. Focus on whole-system reboot; container-level warm restart will be covered in the future
73+
3. Focus on one image warm reboot, and version upgrading warm reboot. No version downgrading warm reboot.
74+
75+
Structure of testbed: [design doc](https://github.com/Azure/sonic-mgmt/blob/master/ansible/doc/README.testbed.Overview.md#sonic-testbed-overview)
76+
![Physical topology](https://github.com/Azure/sonic-mgmt/raw/master/ansible/doc/img/testbed.png)
77+
![Testbed server](https://raw.githubusercontent.com/Azure/sonic-mgmt/master/ansible/doc/img/testbed-server.png)
78+
79+
Architecture:
80+
- Both the warm-reboot and fast-reboot tests are written as the Ansible playbook [advanced-reboot.yml](https://github.com/Azure/sonic-mgmt/blob/master/ansible/roles/test/tasks/advanced-reboot.yml)
81+
- The playbook deploys a master Python script [advanced-reboot.py](https://github.com/Azure/sonic-mgmt/blob/master/ansible/roles/test/files/ptftests/advanced-reboot.py) to the PTF docker container, and all the steps run there
82+
- The master Python script will:
83+
- ssh into the DUT to execute the reboot command
84+
- ssh into the Arista EOS VMs to observe and operate ports, port channels and BGP sessions
85+
- operate VLAN ports
86+
- store and analyze data
7387

7488
Steps:
75-
1. Prepare
89+
1. Prepare environment
7690
- Enable link state propagation
77-
2. Before warm reboot
78-
- Happy Path
79-
- Sad Path
80-
- DUT port down
81-
- DUT LAG down
82-
- DUT LAG member down
83-
- DUT BGP session down
84-
- Neigh port down
85-
- Neigh LAG remove member
86-
- Neigh LAG admin down
87-
- Neigh LAG member admin down
88-
- Neigh BGP session admin down
89-
3. During warm reboot
90-
- Happy Path
91-
- Observe no port down from VM side (all the same below)
92-
- Observe LAG, the maximal control plane interval is 90s
93-
- Observe BGP session
94-
- Observe no packet drop
95-
- Sad Path
96-
- Neigh port down
97-
- Neigh LAG remove member
98-
- Neigh LAG admin down
99-
- Neigh LAG member admin down
100-
- Neigh BGP session admin down
101-
- Neigh route change
102-
- Neigh MAC change
103-
- Neigh VLAN member port admin downn (some or all)
104-
4. After warm reboot
105-
- CRM is not increasing for happy path during warm reboot
106-
- Check expected response for sad path during warm reboot
107-
- Recheck all observation in Section 3 - Happy Path
108-
- Link_flap
91+
- [x] Propagate VEOS port admin down to Fanout switch
92+
- [ ] Propagate PTF port down to Fanout switch
93+
- [ ] Enable NTP service in DUT, Arista EOS VMs, PTF docker
94+
95+
2. Prepare DUT with user specified states `pre_reboot_vector`
96+
- [ ] DUT port down
97+
- [ ] DUT LAG down
98+
- [ ] DUT LAG member down
99+
- [ ] DUT BGP session down
100+
- [ ] Neigh port down
101+
- [ ] Neigh LAG remove member
102+
- [ ] Neigh LAG admin down
103+
- [ ] Neigh LAG member admin down
104+
- [ ] Neigh BGP session admin down
105+
106+
3. Pre-warm-reboot status check
107+
- [ ] VM: Port.lastStatusChangeTimestamp
108+
- [x] VM: PortChannel.lastStatusChangeTimestamp
109+
- [x] VM: monitor how many routes received from DUT
110+
- [ ] DUT: connect to the console and keep measuring meaningful events such as shutdown and bootup
111+
- [ ] Observe no packet drop
112+
- the current implementation of advanced-reboot waits for ping to go down, which does not work for warm-reboot
113+
- if any packet is dropped, the test fails
114+
- how to know the warm-shutdown and warm-bootup timestamps?
115+
- [ ] CRM usage snapshot: the goal here is to make sure usage does not increase when no sad case is injected
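
The CRM snapshot check can be illustrated with a minimal Python sketch. It assumes each snapshot has already been collected into a plain dictionary mapping a resource name to its used count; the function name `crm_increases` and the resource names are hypothetical and are not taken from advanced-reboot.py.

```python
def crm_increases(before, after):
    """Return {resource: (used_before, used_after)} for every resource whose usage grew."""
    increased = {}
    for resource, used_after in after.items():
        # A resource missing from the first snapshot is treated as zero usage.
        used_before = before.get(resource, 0)
        if used_after > used_before:
            increased[resource] = (used_before, used_after)
    return increased


# Hypothetical snapshots taken before and after the warm reboot.
before = {"ipv4_route": 120, "ipv4_neighbor": 32, "fdb_entry": 64}
after = {"ipv4_route": 120, "ipv4_neighbor": 33, "fdb_entry": 60}

print(crm_increases(before, after))
```

For a run with no sad case injected, the test would pass only when the returned dictionary is empty.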
116+
117+
4. During-warm-reboot sad vector injection `during_reboot_vector`
118+
- [ ] Neigh port down
119+
- [ ] Neigh LAG remove member
120+
- [ ] Neigh LAG admin down
121+
- [ ] Neigh LAG member admin down
122+
- [ ] Neigh BGP session admin down
123+
- [ ] Neigh route change
124+
- [ ] Neigh MAC change
125+
- [ ] Neigh VLAN member port admin down (some or all)
126+
127+
And conduct some measurements:
128+
- [x] Ping DUT loopback IP from a downlink port
129+
- [ ] Ping from one DUT port to another (may choose some pairs or fullmesh)
130+
- [ ] measure how many times traffic is disrupted
131+
- fastfast reboot expects one disruption
132+
- normal warm reboot expects none
133+
- fast reboot expects one disruption
134+
- [ ] measure the duration of the longest disruption
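
The two disruption measurements above can be sketched in Python as follows. This assumes the ping results are available as a time-ordered list of `(timestamp, reply_received)` pairs; the function name, the sample data and the `EXPECTED_DISRUPTIONS` table are illustrative, with the expected counts copied from the list above. The duration of an outage is approximated as the time from the first lost probe to the first reply after it.

```python
# Expected number of disruptions per reboot type, as listed above.
EXPECTED_DISRUPTIONS = {"warm": 0, "fast": 1, "fastfast": 1}


def summarize_disruptions(samples):
    """Return (disruption_count, longest_disruption) for time-ordered ping samples.

    samples: list of (timestamp, reply_received) pairs.
    """
    count = 0
    longest = 0
    outage_start = None
    for timestamp, reply_received in samples:
        if not reply_received and outage_start is None:
            # First lost probe of a new outage.
            outage_start = timestamp
            count += 1
        elif reply_received and outage_start is not None:
            # First reply after an outage closes it.
            longest = max(longest, timestamp - outage_start)
            outage_start = None
    if outage_start is not None:
        # The capture ended while still disrupted; count the time seen so far.
        longest = max(longest, samples[-1][0] - outage_start)
    return count, longest


# Hypothetical capture: one probe per second, two probes lost.
samples = [(0, True), (1, True), (2, False), (3, False), (4, True), (5, True)]
count, longest = summarize_disruptions(samples)
print(count, longest)
print(count == EXPECTED_DISRUPTIONS["fast"])
```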
135+
136+
137+
5. Post-warm-reboot status check
138+
- [ ] Generate expected\_results based on `pre_reboot_vector` + `during_reboot_vector`
139+
- [ ] VM: Port.lastStatusChangeTimestamp
140+
- [x] VM: PortChannel.lastStatusChangeTimestamp
141+
- [x] VM: monitor how many routes received from DUT
142+
- [ ] DUT: check the image version as expected
143+
- [x] Observe no packet drop: the current implementation of advanced-reboot waits for ping to recover, which does not work for warm-reboot
144+
- [ ] CRM is not increasing for happy path during warm reboot
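
The `lastStatusChangeTimestamp` checks in the pre- and post-reboot steps amount to comparing two snapshots. The sketch below assumes the timestamps have already been read from the VMs into dictionaries keyed by interface name. The function name, the interface names and the `expected_changed` parameter are hypothetical; how that set would be derived from `pre_reboot_vector` and `during_reboot_vector` is not specified here.

```python
def unexpected_flaps(pre, post, expected_changed=frozenset()):
    """Return the sorted interface names whose timestamp behaved unexpectedly.

    An interface is reported if its timestamp changed although no change was
    expected, or if it stayed the same although a change was expected.
    """
    unexpected = []
    for name in sorted(set(pre) | set(post)):
        changed = pre.get(name) != post.get(name)
        if changed != (name in expected_changed):
            unexpected.append(name)
    return unexpected


# Hypothetical snapshots of lastStatusChangeTimestamp taken from a VM.
pre = {"Ethernet1": 100.0, "Port-Channel1": 100.0}
post = {"Ethernet1": 100.0, "Port-Channel1": 250.5}

print(unexpected_flaps(pre, post))
print(unexpected_flaps(pre, post, expected_changed={"Port-Channel1"}))
```

In the happy path, `expected_changed` stays empty, so any returned name indicates a port or port channel state change that the VM observed across the warm reboot.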
145+
109146
5. Clean-up
110147
- Disable link state propagation
148+
- Recover environment
149+
- [ ] DUT reload minigraph
150+
- [ ] Neigh copy startup-config to running-config
