[tests]: Add tests for high frequency telemetry #20379
StormLiangMS merged 9 commits into sonic-net:master
Signed-off-by: Ze Gan <[email protected]>
    pytest_assert(
        validation_results['total_counters'] >= min_expected_counters,
        f"Expected at least {min_expected_counters} counters, "
        f"got {actual_counters}"
@Pterosaur `actual_counters` has not been assigned at this point.
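One possible way to address this, sketched under assumptions: bind the count to a local name before asserting, so the failure message never refers to an unbound variable. Here `pytest_assert` is a local stand-in for the sonic-mgmt helper, and `check_counter_count` is a hypothetical wrapper that does not appear in the PR.

```python
def pytest_assert(condition, message=None):
    # Local stand-in for the sonic-mgmt helper of the same name (assumption):
    # raise AssertionError with the message when the condition is false.
    if not condition:
        raise AssertionError(message)


def check_counter_count(validation_results, min_expected_counters):
    # Hypothetical wrapper: assign the value first so the f-string below
    # always has a bound name to format.
    actual_counters = validation_results['total_counters']
    pytest_assert(
        actual_counters >= min_expected_counters,
        f"Expected at least {min_expected_counters} counters, "
        f"got {actual_counters}"
    )
    return actual_counters
```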
    counter_values.append(counter_value)
    pytest_assert(
        counter_value > min_counter_value,
        f"Counter value {counter_value} should be greater "
The test assumes that every counter always has a value greater than zero.
Can we assume that? If, for example, no drops occurred on the system, and the test does not force a drop in any way, we cannot guarantee that IF_IN_DISCARDS will be nonzero; zero would be the correct value.
Thanks for pointing this out; I will fix it.
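A relaxed check along the lines raised above could look like the following sketch. It is not the fix that was merged; `find_invalid_counters`, the `must_be_nonzero` parameter, and the `IF_IN_OCTETS` counter name are assumptions used only for illustration. The idea is to accept zero for counters such as IF_IN_DISCARDS and to require a nonzero value only where the test actually generates the corresponding traffic.

```python
def find_invalid_counters(counters, must_be_nonzero=()):
    """Return the names of counters that fail the relaxed validation.

    counters: mapping of counter name to its reported integer value.
    must_be_nonzero: names the test deliberately drives (hypothetical).
    """
    invalid = []
    for name, value in counters.items():
        if value < 0:
            # A negative reading is never valid.
            invalid.append(name)
        elif name in must_be_nonzero and value == 0:
            # Only counters the test forces to move must be nonzero.
            invalid.append(name)
    return invalid
```

For example, `find_invalid_counters({'IF_IN_DISCARDS': 0})` returns an empty list, since a system with no drops legitimately reports zero.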
    def run_countersyncd_and_capture_output(duthost, timeout=120):
        """
        Run countersyncd command and capture output.

        Args:
            duthost: DUT host object
            timeout: Timeout in seconds (default: 120)

        Returns:
            dict: Command result with stdout, stderr, rc
        """
        countersyncd_cmd = (
            f'timeout {timeout} docker exec swss countersyncd -e '
            f'--max-stats-per-report 0 '
        )
Suggested change:

    def run_countersyncd_and_capture_output(duthost, timeout=120, stats_interval=60):
        ...
            f'--max-stats-per-report 0 '
            f'--stats-interval {stats_interval}'
        )
I suggest adding an option to override the default stats-interval; from my testing, this is needed for the tests with longer poll intervals.
    timeout = max(120, int(10 / expected_msg_per_sec) + 60) if expected_msg_per_sec > 0 else 180
    logger.info(f"Running countersyncd for {timeout} seconds to capture stable measurements")

    result = run_countersyncd_and_capture_output(duthost, timeout=timeout)
As we discussed on Teams, for slower poll intervals you also need to increase the reporting interval to get the correct Msg/s.
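One way to scale the reporting interval with the poll interval is sketched below. The helper `pick_stats_interval`, the `min_polls_per_report` threshold, and the assumption that the stats interval is expressed in seconds are all illustrative; none of them come from the PR. The intent is only to make each report span enough polls that the computed Msg/s is meaningful.

```python
import math


def pick_stats_interval(poll_interval_s, min_polls_per_report=10,
                        default_interval_s=60):
    """Pick a reporting interval (seconds, assumed unit) for a poll interval.

    Keeps the default for fast polling and stretches it for slow polling
    so that each report covers at least min_polls_per_report polls.
    """
    needed = math.ceil(poll_interval_s * min_polls_per_report)
    return max(default_interval_s, needed)
```

With these assumed defaults, a 10 ms poll interval keeps the 60-second default, while a 30-second poll interval yields a 300-second reporting interval. The command timeout would then need to be at least as long as the chosen interval.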
Hi @Pterosaur, I see an issue with the test's Msg/s verification.
The problem is that the test configures SONiC to poll counters every 10 ms (expecting 100 Msg/s), but HFT batches approximately 2339 messages before sending each report. If HFT polls correctly at 10 ms intervals but batches 2339 messages per report, messages arrive in bursts rather than as a steady stream.
The observed pattern [0.0, 0.0, 233.9, 233.9, 0.0, 0.0] matches this behavior.
The test averages these values (77.97 Msg/s), which doesn't account for the batching cycle. HFT may be polling at the correct 10 ms frequency, but the validation logic expects continuous streaming rather than batch-and-send behavior.
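The arithmetic behind these figures can be reproduced with a short sketch. The sample values, batch size, and expected rate are taken from the comment above; the variable names are illustrative.

```python
# Per-interval Msg/s samples quoted in the comment above.
observed = [0.0, 0.0, 233.9, 233.9, 0.0, 0.0]
expected_rate = 100.0   # Msg/s for a 10 ms poll interval
batch_size = 2339       # approximate messages per HFT report

# Plain mean over all samples, as the test currently computes it.
mean_rate = sum(observed) / len(observed)

# Time needed to accumulate one batch if polling runs at the expected rate.
seconds_per_batch = batch_size / expected_rate

print(f"mean of samples: {mean_rate:.2f} Msg/s")        # 77.97 Msg/s
print(f"time to fill one batch: {seconds_per_batch:.2f} s")  # 23.39 s
```

Because a batch takes about 23 seconds to fill at the expected rate, a sampling window that is not aligned to whole batch cycles can land well below or above 100 Msg/s even when polling is correct.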
…et#21482)
What is the motivation for this PR? Wrong condition for skipping HFT test
How did you do it? Correct it as described in this comment: sonic-net#20379 (comment)
Signed-off-by: Ze Gan <[email protected]>
Signed-off-by: Priyansh Tratiya <[email protected]>
What is the motivation for this PR? Add tests for high frequency telemetry
How did you do it? Add new test cases
How did you verify/test it? Check in the sn5600 platform locally
Signed-off-by: Lakshmi Yarramaneni <[email protected]>
Cherry-pick PR to 202511: #22779
…#23008)
What is the motivation for this PR? Wrong condition for skipping HFT test
How did you do it? Correct it as described in this comment: #20379 (comment)
Signed-off-by: Ze Gan <[email protected]>
Signed-off-by: Priyansh Tratiya <[email protected]>
Co-authored-by: Ze Gan <[email protected]>
Description of PR
Summary:
Fixes # (issue)
Type of change
Back port request
Approach
What is the motivation for this PR?
Add tests for high frequency telemetry
How did you do it?
Add new test cases
How did you verify/test it?
Checked locally on the sn5600 platform
Any platform specific information?
Supported testbed topology if it's a new test case?
Documentation