[fast reboot] Revert fast-reboot script changes #982
Merged
qiluo-msft merged 1 commit into sonic-net:master on Jun 27, 2019
qiluo-msft approved these changes on Jun 27, 2019
lguohan approved these changes on Jun 27, 2019
neethajohn added a commit that referenced this pull request on Jun 27, 2019
fraserg-arista pushed a commit to fraserg-arista/sonic-mgmt that referenced this pull request on Feb 24, 2026
### Description of PR

Summary:
Fixes # (issue)

This PR addresses **non-linear dataplane downtime behavior** observed in high-scale BGP IPv6 scenarios when running the port and session flapping tests. When the number of connections to flap doubled, the dataplane downtime increased by 450x.

This change refines the tests and helper logic to ensure that downtime measurements:
- More accurately reflect real control-plane and data-plane outage intervals,
- Scale more predictably with load and iterations, and
- Avoid over-counting or under-counting downtime due to measurement artifacts and overlapping events.

### Type of change

- [x] Bug fix
- [ ] Testbed and Framework(new/improvement)
- [ ] New Test case
  - [ ] Skipped for non-supported platforms
- [ ] Test case improvement

### Back port request

- [ ] 202205
- [ ] 202305
- [ ] 202311
- [ ] 202405
- [ ] 202411
- [ ] 202505

### Approach

#### What is the motivation for this PR?

While validating high-scale BGP convergence, flap, and route-programming tests, we observed that:
- Dataplane downtime did not scale linearly with:
  - The number of flap iterations,
  - The number of routes or neighbors.

These issues were traced to the tests being executed sequentially while the PTF dataplane packet-filtering/counter state was never cleared between runs. As a result, masks and counters kept accumulating, so each subsequent run, especially one with a larger number of ports to flap, reported an artificially inflated dataplane downtime. In other words, the measured non-linear increase in downtime was caused by stale PTF dataplane state rather than by actual BGP control-plane behavior.

The goal of this PR is to:
- Properly reset/clean relevant PTF dataplane state between runs,
- Ensure that measured dataplane downtime reflects only the real BGP and data-plane behavior,
- Restore a linear and predictable relationship between test scale (routes/neighbors/iterations) and observed downtime.

#### How did you do it?

- Added logic to explicitly **clear PTF dataplane state between runs**, including:
  - Flushing or re-initializing PTF packet filters used for counting traffic to the prefixes under test.
  - Resetting relevant PTF counters so that each run starts with a clean environment.
- Updated the test flow so that:
  - Each scale/iteration configuration first ensures PTF dataplane state is clean before starting flaps and dataplane measurements.
  - Dataplane downtime is computed only from counters and observations collected **within** the current run, avoiding any contamination from previous runs.
- Adjusted/factored helper utilities (where appropriate) so that the PTF cleanup is:
  - Centralized and reusable across the convergence, flap, and route-programming tests,
  - Invoked consistently whenever a new test scenario or iteration is started.
- Enhanced logging around:
  - When PTF dataplane state is cleared,
  - Per-iteration dataplane downtime measurements after the fix,

  so it is easy to verify that:
  - Counters are reset when expected, and
  - The resulting downtime scales linearly with the number of ports/routes/iterations, reflecting actual BGP and dataplane behavior.

#### How did you verify/test it?

- Re-ran the high-bgp convergence, flap, and route-programming tests with the fixes applied:
  - Topology: `t0-isolated-d2u510s2`
  - Platform: Broadcom Arista-7060X6-64PE-B-C512S2
- Verified that:
  - Measured downtime per iteration is stable and scales predictably with load and iteration count.
  - Spurious spikes caused by measurement artifacts are eliminated; measured downtime now stays within milliseconds, compared with tens of seconds previously.

#### Any platform specific information?

#### Supported testbed topology if it's a new test case?

### Documentation

---------

Signed-off-by: Priyansh Tratiya <ptratiya@microsoft.com>
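The accumulation effect described in the commit message above can be illustrated with a toy model. This is only a sketch under stated assumptions: `FakeDataplane`, `run_flap_test`, and the per-port loss and packet-rate figures are hypothetical and are not the PTF API or the actual sonic-mgmt helpers. It shows why a counter that is never cleared makes each later run look worse than it is, and why resetting state at the start of each run restores proportional scaling.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDataplane:
    """Hypothetical stand-in for PTF dataplane state (not the real PTF API)."""
    masks: list = field(default_factory=list)
    lost_packets: int = 0

    def reset(self):
        # Drop packet filters and counters left over from earlier runs.
        self.masks.clear()
        self.lost_packets = 0


def run_flap_test(dp, ports, lost_per_port=10, pps=1000, reset_first=True):
    """Return the measured downtime in milliseconds for one run (toy model)."""
    if reset_first:
        dp.reset()
    # Each run registers one filter per flapped port and counts its losses.
    dp.masks.extend(f"mask-port-{p}" for p in range(ports))
    dp.lost_packets += ports * lost_per_port
    # Downtime is derived from whatever the counter holds at this point,
    # so stale counts from previous runs inflate the result.
    return dp.lost_packets * 1000 // pps


scales = (4, 8, 16)

clean_dp = FakeDataplane()
clean = [run_flap_test(clean_dp, n) for n in scales]

dirty_dp = FakeDataplane()
dirty = [run_flap_test(dirty_dp, n, reset_first=False) for n in scales]

print("with reset:   ", clean)
print("without reset:", dirty)
```

In this model the runs with a reset double their downtime whenever the port count doubles, while the runs without a reset also carry over every earlier run's losses and filters.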
kazinator-arista pushed a commit to kazinator-arista/sonic-mgmt that referenced this pull request on Mar 4, 2026
* [201811][sairedis][swss] advance sub modules head

  Submodule src/sonic-sairedis 18ad5f9..4c75b7f:
  > Fixed conditional operator. (sonic-net#487)

  Submodule src/sonic-swss 1e99c93..cd12d48:
  > [teamsyncd]: Add information for LAG membership changes (sonic-net#982)
  > Fix vlan incremental config and add vs test cases (sonic-net#799)

  Signed-off-by: Ying Xie <ying.xie@microsoft.com>

* [swss] include more swss changes

  Submodule src/sonic-swss cd12d48..f44029d:
  > [MirrorOrch]: Init the next hop ip with 0 instead of default constructor (sonic-net#953)
  > [AclOrch]: Fix the acl mirror counter doubled by inactive mirror and active again (sonic-net#952)

  Signed-off-by: Ying Xie <ying.xie@microsoft.com>
Revert part of the changes made in PR #975. Remove the fast-reboot script and the corresponding changes made for its use.
Type of change