[tests]: Add tests for high frequency telemetry#20379

Merged
StormLiangMS merged 9 commits into sonic-net:master from Pterosaur:hft_test
Nov 27, 2025

Conversation

@Pterosaur
Contributor

Description of PR

Summary:
Fixes # (issue)

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • New Test case
    • Skipped for non-supported platforms
  • Test case improvement

Back port request

  • 202205
  • 202305
  • 202311
  • 202405
  • 202411
  • 202505

Approach

What is the motivation for this PR?

Add tests for high frequency telemetry

How did you do it?

Add new test cases

How did you verify/test it?

Checked locally on the sn5600 platform

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Signed-off-by: Ze Gan <[email protected]>

pytest_assert(
    validation_results['total_counters'] >= min_expected_counters,
    f"Expected at least {min_expected_counters} counters, "
    f"got {actual_counters}"
Contributor

@Pterosaur actual_counters has not been assigned at this point.
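
A minimal sketch of the problem and one possible fix, assuming `actual_counters` was meant to hold `validation_results['total_counters']`. The helper `check_counter_count` and the local `pytest_assert` stand-in are hypothetical and exist only so the snippet runs outside sonic-mgmt; the actual fix may differ.

```python
def pytest_assert(condition, message=None):
    """Stand-in for the sonic-mgmt helper of the same name.

    Assumed behavior: fail with `message` when `condition` is false.
    """
    if not condition:
        raise AssertionError(message)


def check_counter_count(validation_results, min_expected_counters):
    """Hypothetical helper: bind the value before the message uses it."""
    # Assign first, so the f-string below cannot raise NameError.
    actual_counters = validation_results['total_counters']
    pytest_assert(
        actual_counters >= min_expected_counters,
        f"Expected at least {min_expected_counters} counters, "
        f"got {actual_counters}"
    )
    return actual_counters
```

Without the assignment, the f-string is evaluated before `pytest_assert` is called, so the test would raise `NameError` even when the counter count is sufficient.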

Contributor

Fixed by 603e284

counter_values.append(counter_value)
pytest_assert(
    counter_value > min_counter_value,
    f"Counter value {counter_value} should be greater "
Contributor

The test assumes that every counter always has a value greater than zero.
Can we assume that? If, for example, no drops occurred on the system and the test does not force a drop in any way, we can't guarantee that IF_IN_DISCARDS won't legitimately be zero.
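
One way to loosen the check, sketched under the assumption that the counters are cumulative and non-negative. The function `validate_counter_samples` and its `must_increase` flag are hypothetical and are not part of the PR; counter wrap-around is ignored.

```python
def validate_counter_samples(samples, must_increase=False):
    """Return a list of problems found in successive samples of one counter.

    samples: counter values in the order they were captured.
    must_increase: set True only when the test itself generates the events
                   the counter measures (for example, it sends traffic).
    """
    if not samples:
        return ["no samples captured"]
    problems = []
    for value in samples:
        if value < 0:
            problems.append(f"negative value {value}")
    # A cumulative counter should never go backwards between samples.
    for prev, cur in zip(samples, samples[1:]):
        if cur < prev:
            problems.append(f"counter decreased from {prev} to {cur}")
    if must_increase and samples[-1] <= samples[0]:
        problems.append("counter did not increase although events were generated")
    return problems
```

With this shape, a counter such as IF_IN_DISCARDS that stays at zero passes by default, and the stricter check applies only where the test forces the event.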

Contributor Author

Thanks for pointing this out. I will fix it.

Comment on lines +233 to +247
def run_countersyncd_and_capture_output(duthost, timeout=120):
    """
    Run countersyncd command and capture output.

    Args:
        duthost: DUT host object
        timeout: Timeout in seconds (default: 120)

    Returns:
        dict: Command result with stdout, stderr, rc
    """
    countersyncd_cmd = (
        f'timeout {timeout} docker exec swss countersyncd -e '
        f'--max-stats-per-report 0 '
    )
Contributor

Suggested change
- def run_countersyncd_and_capture_output(duthost, timeout=120):
-     """
-     Run countersyncd command and capture output.
-     Args:
-         duthost: DUT host object
-         timeout: Timeout in seconds (default: 120)
-     Returns:
-         dict: Command result with stdout, stderr, rc
-     """
-     countersyncd_cmd = (
-         f'timeout {timeout} docker exec swss countersyncd -e '
-         f'--max-stats-per-report 0 '
-     )
+ def run_countersyncd_and_capture_output(duthost, timeout=120, stats_interval=60):
+     ...
+         f'--max-stats-per-report 0 '
+         f'--stats-interval {stats_interval}'
+     )

I suggest adding an option to override the default stats-interval. From what I checked, this is needed for the tests with longer poll intervals.
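
A minimal sketch of how the command string could be assembled with an optional override. The `--stats-interval` flag is taken from the suggestion above and has not been verified against countersyncd here; `build_countersyncd_cmd` is a hypothetical helper that only builds the string and executes nothing.

```python
def build_countersyncd_cmd(timeout=120, stats_interval=None):
    """Build the shell command string; does not run it."""
    parts = [
        f'timeout {timeout}',
        'docker exec swss countersyncd -e',
        '--max-stats-per-report 0',
    ]
    # Pass the flag only when the caller overrides the tool's default.
    if stats_interval is not None:
        parts.append(f'--stats-interval {stats_interval}')
    return ' '.join(parts)
```

Defaulting `stats_interval` to `None` keeps existing callers on the tool's own default, whereas the suggested signature above defaults it to 60. Either choice works as long as the longer poll interval tests pass an explicit value.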

timeout = max(120, int(10 / expected_msg_per_sec) + 60) if expected_msg_per_sec > 0 else 180
logger.info(f"Running countersyncd for {timeout} seconds to capture stable measurements")

result = run_countersyncd_and_capture_output(duthost, timeout=timeout)
Contributor

As we discussed on Teams, with slower poll intervals you also need to increase the reporting interval to get the correct Msg/s.
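
The timeout formula from the diff can be paired with a reporting interval that scales with the expected rate. In the sketch below, `capture_timeout` restates the quoted expression, while `pick_stats_interval`, its `min_msgs_per_report` parameter, and the 10-second floor are assumptions made for illustration only.

```python
import math


def capture_timeout(expected_msg_per_sec):
    """Timeout in seconds, using the expression quoted in the diff."""
    if expected_msg_per_sec > 0:
        return max(120, int(10 / expected_msg_per_sec) + 60)
    return 180


def pick_stats_interval(expected_msg_per_sec, min_msgs_per_report=10, floor=10):
    """Hypothetical: seconds per report so that each report window is
    expected to span at least `min_msgs_per_report` messages."""
    if expected_msg_per_sec <= 0:
        return floor
    return max(floor, math.ceil(min_msgs_per_report / expected_msg_per_sec))
```

For example, at 100 Msg/s the timeout stays at 120 seconds and the interval stays at the floor, while at 0.125 Msg/s (one message every 8 seconds) the timeout grows to 140 seconds and the interval to 80 seconds.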

Signed-off-by: Ze Gan <[email protected]>

@DavidZagury
Contributor

Hi @Pterosaur,
I think we need to rework how the tests verify the number of messages per second captured by the tool.

I see an issue with the tests' Msg/s verification.
For example, this is the failure from one run of test_hft_port_counters:

Failed: Average Msg/s 77.97 is outside expected range: 85.00 - 115.00.
Individual values: [0.0, 0.0, 233.9, 233.9, 0.0, 0.0]

The problem is that the test configures SONiC to poll counters every 10ms (expecting 100 Msg/s), but HFT batches approximately 2339 messages before sending each report.

If HFT polls correctly at 10ms intervals but batches 2339 messages per report:

  • Time to accumulate: 2339 messages × 10ms = ~23.39 seconds of polling
  • Transmission burst: Sending 2339 messages over ~10 seconds = 233.9 Msg/s

The observed pattern [0.0, 0.0, 233.9, 233.9, 0.0, 0.0] matches this behavior:

  • Reports during accumulation: 0.0 Msg/s (buffering, not sending)
  • Reports during transmission: 233.9 Msg/s (burst of 2339 messages)

The test averages these values (77.97 Msg/s) which doesn't account for the batching cycle. HFT may be polling at the correct 10ms frequency, but the validation logic expects continuous streaming rather than batch-and-send behavior.
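
The arithmetic above can be reproduced with a short sketch. The batch size (2339), the 10 ms poll interval, the roughly 10-second report window, and the 85 to 115 Msg/s range come from the comment and the failure message; the function names are hypothetical.

```python
from statistics import mean


def accumulation_seconds(batch_size, poll_interval_ms):
    """Time needed to accumulate one batch at the given poll interval."""
    return batch_size * poll_interval_ms / 1000.0


def burst_rate(batch_size, report_window_s):
    """Msg/s reported by a window that receives one whole batch."""
    return batch_size / report_window_s


def in_range(value, low, high):
    """True when value lies inside the closed range [low, high]."""
    return low <= value <= high


# Per-report rates taken from the failure message.
samples = [0.0, 0.0, 233.9, 233.9, 0.0, 0.0]
average = mean(samples)  # 77.97 when rounded to two decimals

print(f"accumulate one batch: {accumulation_seconds(2339, 10):.2f} s")
print(f"burst rate per window: {burst_rate(2339, 10):.1f} Msg/s")
print(f"average {average:.2f} in 85..115: {in_range(average, 85.0, 115.0)}")
```

The sketch only reproduces the numbers quoted in the comment; it does not model countersyncd itself.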

@weiguo-nvidia

PriyanshTratiya pushed a commit to PriyanshTratiya/sonic-mgmt that referenced this pull request Jan 21, 2026
…et#21482)

What is the motivation for this PR?
Wrong condition for skipping HFT test

How did you do it?
Correct it as described in this comment: sonic-net#20379 (comment)

Signed-off-by: Ze Gan <[email protected]>
Signed-off-by: Priyansh Tratiya <[email protected]>
lakshmi-nexthop pushed a commit to lakshmi-nexthop/sonic-mgmt that referenced this pull request Jan 28, 2026
What is the motivation for this PR?
Add tests for high frequency telemetry

How did you do it?
Add new test cases

How did you verify/test it?
Check in the sn5600 platform locally

Signed-off-by: Lakshmi Yarramaneni <[email protected]>
@Pterosaur Pterosaur added Request for 202511 branch Request to backport a change to 202511 branch and removed Cherry Pick Conflict_msft-202412 labels Mar 6, 2026
mssonicbld pushed a commit to mssonicbld/sonic-mgmt that referenced this pull request Mar 6, 2026
What is the motivation for this PR?
Add tests for high frequency telemetry

How did you do it?
Add new test cases

How did you verify/test it?
Check in the sn5600 platform locally

Signed-off-by: mssonicbld <[email protected]>
@mssonicbld
Collaborator

Cherry-pick PR to 202511: #22779

vmittal-msft pushed a commit that referenced this pull request Mar 17, 2026
…#23008)

What is the motivation for this PR?
Wrong condition for skipping HFT test

How did you do it?
Correct it as described in this comment: #20379 (comment)

Signed-off-by: Ze Gan <[email protected]>
Signed-off-by: Priyansh Tratiya <[email protected]>
Co-authored-by: Ze Gan <[email protected]>
mssonicbld added a commit that referenced this pull request Mar 23, 2026
What is the motivation for this PR?
Add tests for high frequency telemetry

How did you do it?
Add new test cases

How did you verify/test it?
Check in the sn5600 platform locally

Signed-off-by: mssonicbld <[email protected]>
Co-authored-by: Ze Gan <[email protected]>