Breakdown tasks in the section of "Design of test" (#308)
* Breakdown tasks in the section of "Design of test"
* Add more details
* Refine according discussion and feedback
* Add checkboxes to tasks, and refine markdown syntax
* Mark more tasks as done
Signed-off-by: Qi Luo <[email protected]>
doc/warm-reboot/system-warmboot.md
Lines changed: 74 additions & 34 deletions
@@ -69,42 +69,82 @@ Later if we improve the consistency ```SONIC_BOOT_TYPE=[fast|warm|cold]```, this
 # Design of test
 Assumptions:
 1. DUT is T0 topology
-2. Focus on one image warm reboot, and version upgrading warm reboot. No version downgrading warm reboot.
+2. Focus on whole-system reboot; in the future this will be extended to container-level warm restart
+3. Focus on one image warm reboot, and version upgrading warm reboot. No version downgrading warm reboot.
+
+Structure of testbed: [design doc](https://github.com/Azure/sonic-mgmt/blob/master/ansible/doc/README.testbed.Overview.md#sonic-testbed-overview)
+- Both warm-reboot and fast-reboot are written in ansible playbook [advanced-reboot.yml](https://github.com/Azure/sonic-mgmt/blob/master/ansible/roles/test/tasks/advanced-reboot.yml)
+- The playbook will deploy a master python script [advanced-reboot.py](https://github.com/Azure/sonic-mgmt/blob/master/ansible/roles/test/files/ptftests/advanced-reboot.py) to the PTF docker container, and all the steps run there
+- The master python script will
+- ssh into DUT to execute reboot command
+- ssh into Arista EOS VM to observe and operate port, port channel and BGP sessions
+- operate VLAN ports
+- store and analyze data

 Steps:
-1. Prepare
+1. Prepare environment
 - Enable link state propagation
-2. Before warm reboot
-- Happy Path
-- Sad Path
-- DUT port down
-- DUT LAG down
-- DUT LAG member down
-- DUT BGP session down
-- Neigh port down
-- Neigh LAG remove member
-- Neigh LAG admin down
-- Neigh LAG member admin down
-- Neigh BGP session admin down
-3. During warm reboot
-- Happy Path
-- Observe no port down from VM side (all the same below)
-- Observe LAG, the maximal control plane interval is 90s
-- Observe BGP session
-- Observe no packet drop
-- Sad Path
-- Neigh port down
-- Neigh LAG remove member
-- Neigh LAG admin down
-- Neigh LAG member admin down
-- Neigh BGP session admin down
-- Neigh route change
-- Neigh MAC change
-- Neigh VLAN member port admin downn (some or all)
-4. After warm reboot
-- CRM is not increasing for happy path during warm reboot
-- Check expected response for sad path during warm reboot
-- Recheck all observation in Section 3 - Happy Path
-- Link_flap
+-[x] Propagate VEOS port admin down to Fanout switch
+-[ ] Propagate PTF port down to Fanout switch
+-[ ] Enable NTP service in DUT, Arista EOS VMs, PTF docker
+
+2. Prepare DUT with user specified states `pre_reboot_vector`
+-[ ] DUT port down
+-[ ] DUT LAG down
+-[ ] DUT LAG member down
+-[ ] DUT BGP session down
+-[ ] Neigh port down
+-[ ] Neigh LAG remove member
+-[ ] Neigh LAG admin down
+-[ ] Neigh LAG member admin down
+-[ ] Neigh BGP session admin down
+
+3. Pre-warm-reboot status check
+-[ ] VM: Port.lastStatusChangeTimestamp
+-[x] VM: PortChannel.lastStatusChangeTimestamp
+-[x] VM: monitor how many routes are received from DUT
+-[ ] DUT: connect via console and keep measuring meaningful events such as shutdown and bootup
+-[ ] Observe no packet drop
+- current implementation of advanced-reboot waits for ping down, which does not work for warm-reboot
+- if any packet is dropped, the test fails
+- how to know warm-shutdown and warm-bootup timestamp?
+-[ ] CRM usage snapshot: the goal here is to make sure usage does not increase in the case with no sad vector injected
+
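The CRM usage snapshot taken in this step only becomes a check once it is compared with a second snapshot taken after the reboot. A minimal sketch of that comparison, assuming each snapshot has already been reduced to a mapping from resource name to used count (how the counters are collected from the DUT is not shown, and the resource names below are hypothetical):

```python
from typing import Dict


def crm_increases(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """Return {resource: delta} for every resource whose used count grew.

    A resource missing from `before` is treated as having used count 0.
    """
    increases = {}
    for resource, used_after in after.items():
        used_before = before.get(resource, 0)
        if used_after > used_before:
            increases[resource] = used_after - used_before
    return increases


# Hypothetical resource names and counts, for illustration only.
pre_snapshot = {"ipv4_route": 6400, "ipv4_neighbor": 24, "fdb_entry": 12}
post_snapshot = {"ipv4_route": 6400, "ipv4_neighbor": 25, "fdb_entry": 12}

leaks = crm_increases(pre_snapshot, post_snapshot)
print(leaks)
```

For a run with no sad vector injected, the test would pass only if the returned mapping is empty.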
+4. During-warm-reboot sad vector injection `during_reboot_vector`
+-[ ] Neigh port down
+-[ ] Neigh LAG remove member
+-[ ] Neigh LAG admin down
+-[ ] Neigh LAG member admin down
+-[ ] Neigh BGP session admin down
+-[ ] Neigh route change
+-[ ] Neigh MAC change
+-[ ] Neigh VLAN member port admin down (some or all)
+
+And conduct some measurements:
+-[x] Ping DUT loopback IP from a downlink port
+-[ ] Ping from one DUT port to another (may choose some pairs or full mesh)
+-[ ] measure how many times traffic is disrupted
+- fastfast reboot: expect one disruption
+- normal warm reboot: expect none
+- fast reboot: expect one disruption
+-[ ] measure the longest disruptive time
+
+
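The two measurements above (how many times traffic is disrupted, and the longest disruptive time) can both be derived from one time-ordered series of ping results. The following is a sketch under stated assumptions: the sample format `(timestamp_seconds, reachable)` and the function names are hypothetical, and the expected counts per reboot type are taken from the list above.

```python
from typing import List, Tuple

# Expected number of disruptions per reboot type, as listed in the test design.
EXPECTED_DISRUPTIONS = {"fastfast": 1, "warm": 0, "fast": 1}


def summarize_disruptions(samples: List[Tuple[float, bool]]) -> Tuple[int, float]:
    """Return (number of disruptions, longest disruption in seconds).

    `samples` is a time-ordered list of (timestamp, reachable) ping results.
    A disruption runs from the first failed ping to the next successful one;
    an outage still open at the end is measured up to the last sample.
    """
    count = 0
    longest = 0.0
    outage_start = None
    for timestamp, reachable in samples:
        if not reachable and outage_start is None:
            outage_start = timestamp  # a new disruption begins
            count += 1
        elif reachable and outage_start is not None:
            longest = max(longest, timestamp - outage_start)
            outage_start = None  # disruption ended
    if outage_start is not None:
        longest = max(longest, samples[-1][0] - outage_start)
    return count, longest


def disruption_count_ok(reboot_type: str, samples: List[Tuple[float, bool]]) -> bool:
    """Compare the observed disruption count with the expected one."""
    count, _ = summarize_disruptions(samples)
    return count == EXPECTED_DISRUPTIONS[reboot_type]


samples = [(0.0, True), (1.0, False), (2.0, False), (3.0, True), (4.0, True)]
print(summarize_disruptions(samples))
```

The measured duration is only as precise as the ping interval, so a short interval is needed if the longest disruptive time is to be compared against a tight budget.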
+5. Post-warm-reboot status check
+-[ ] Generate expected\_results based on `pre_reboot_vector` + `during_reboot_vector`
+-[ ] VM: Port.lastStatusChangeTimestamp
+-[x] VM: PortChannel.lastStatusChangeTimestamp
+-[x] VM: monitor how many routes are received from DUT
+-[ ] DUT: check that the image version is as expected
+-[x] Observe no packet drop: current implementation of advanced-reboot waits for ping to recover, which does not work for warm-reboot
+-[ ] CRM is not increasing for happy path during warm reboot
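The `lastStatusChangeTimestamp` items in the pre- and post-warm-reboot checks amount to comparing two snapshots taken on the VM side: any port or port channel whose timestamp moved has changed state in between. A minimal sketch, assuming each snapshot is a mapping from interface name to timestamp (the interface names and values are hypothetical, and how the timestamps are read from the Arista EOS VM is not shown):

```python
from typing import Dict, List


def flapped_interfaces(pre: Dict[str, float], post: Dict[str, float]) -> List[str]:
    """Return interfaces whose last status change timestamp differs.

    An interface present only in `post` is also reported, since no
    pre-reboot baseline exists for it.
    """
    return sorted(name for name, ts in post.items() if pre.get(name) != ts)


# Hypothetical interface names and timestamps, for illustration only.
pre_snapshot = {"Ethernet1": 1000.0, "Port-Channel1": 1000.5}
post_snapshot = {"Ethernet1": 1000.0, "Port-Channel1": 1234.0}

print(flapped_interfaces(pre_snapshot, post_snapshot))
```

On the happy path the result would be expected to be empty; with sad vectors injected, it would be compared against the interfaces named in the expected results.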