[vslib] Fix VS SAI reporting 0xFFFFFFFF oper speed for virtio NICs#1763
[vslib] Fix VS SAI reporting 0xFFFFFFFF oper speed for virtio NICs#1763rustiqly wants to merge 1 commit intosonic-net:masterfrom
Conversation
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
Companion PR: sonic-net/sonic-buildimage#25428 (enables |
|
@rustiqly , can you also add unit test for this PR? |
70620e1 to
1580a71
Compare
|
Added 3 unit tests in
Also building a VS image with both fixes to verify end-to-end on KVM. |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
1580a71 to
400fff5
Compare
|
/azp run |
|
@lguohan Fixed — the CI failure was |
|
Azure Pipelines successfully started running 1 pipeline(s). |
There was a problem hiding this comment.
Pull request overview
Fixes incorrect reporting of SAI_PORT_ATTR_OPER_SPEED in VS when sysfs returns “unknown” speed (e.g., -1 on virtio), avoiding the 0xFFFFFFFF wraparound and using configured port speed as a fallback.
Changes:
- Update
vs_get_oper_speed()to read sysfs speed as signed and reject invalid values (<= 0) with a warning. - Update
refresh_port_oper_speed()to fall back toSAI_PORT_ATTR_SPEEDwhen operational speed can’t be read. - Add unit tests intended to cover configured-speed and fallback behavior; update spellcheck word list.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| vslib/SwitchStateBaseHostif.cpp | Reads sysfs speed as int32_t and rejects invalid/unknown values before assigning to uint32_t. |
| vslib/SwitchStateBase.cpp | Falls back to configured port speed when operational speed read fails. |
| unittest/vslib/TestSwitchBCM56850.cpp | Adds tests for oper-speed refresh scenarios (but currently calls a protected method directly). |
| tests/aspell.en.pws | Adds new words used by comments/logs/tests. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
400fff5 to
9ce11d3
Compare
|
@lguohan Found it — the failing test was |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azp run |
|
Commenter does not have sufficient privileges for PR 1763 in repo sonic-net/sonic-sairedis |
|
@lguohan The CI failure is All our tests passed: Triggered a rerun. |
9ce11d3 to
1b69b8e
Compare
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
Thanks for the review @yejianquan! Good observation about the fallback behavior. Yes, reporting
The key fix is replacing the Regarding lguohan's request — I still need to build a VS image and provide |
rustiqly
left a comment
There was a problem hiding this comment.
Good catch on the description vs code discrepancy — I'll update the PR description to be more precise.
The behavior is intentional: when m_useConfiguredSpeedAsOperSpeed is false (default), we deliberately report 0 rather than the configured speed. The reasoning:
- 0 = "unknown/unavailable" — this is semantically correct when we can't read the actual oper speed from sysfs (e.g. virtio NICs that report -1)
- Configured speed ≠ operational speed — reporting configured speed as oper_speed when we don't actually know the real value would be misleading. A port could be configured for 100G but negotiated to 25G.
- The
m_useConfiguredSpeedAsOperSpeedflag exists precisely for users who want the fallback-to-configured behavior (e.g. platforms where sysfs is always unavailable but configured == actual).
So the two paths are:
- Flag true → assume configured speed is oper speed (legacy/simple platforms)
- Flag false (default) → read sysfs, report 0 if unavailable (honest reporting)
I'll clarify this in the PR description now.
4a946f4 to
699f972
Compare
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
CI failures are redis socket errors in vstest harness ("Unable to connect to redis (unix-socket)") — known infra flakiness unrelated to this change. All p4rt tests passed. Re-triggering. /azp run |
699f972 to
d189ce6
Compare
|
/azp run |
|
@lguohan Rebased to latest master (d189ce6). The
Could you take another look? Thanks! |
|
Azure Pipelines successfully started running 1 pipeline(s). |
VS Image Build + Test ResultsBuilt a VS image with this PR commit (d189ce6) on latest Build: Bug trigger confirmed: sysfs reports sai.profile active (companion PR #25428 merged):
No error logs: 0 matches for 3 ports oper=up, all showing correct 40G configured speed. |
d189ce6 to
094830d
Compare
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
When running on KVM/virtio, /sys/class/net/ethN/speed returns -1 (unknown). vs_get_oper_speed() reads this into uint32_t, which wraps to 4294967295 (0xFFFFFFFF) and gets reported as the oper speed. Fix: - Read sysfs speed as int32_t and check for <= 0 (invalid) - When invalid, log a warning, set speed=0, and return false - In refresh_port_oper_speed(), set attr.value.u32=0 (unknown) instead of returning SAI_STATUS_FAILURE - When m_useConfiguredSpeedAsOperSpeed is true, configured speed is used as oper speed (bypassing sysfs entirely) This ensures VS ports show 0 (unknown) rather than garbage values on virtual NICs. With the companion sai.profile change (#25428), VS ports will show the correct configured speed. Added unit tests: - test_refresh_port_oper_speed_configured_speed: verifies oper speed equals configured speed when m_useConfiguredSpeedAsOperSpeed=true - test_refresh_port_oper_speed_down_port: verifies oper speed is 0 for operationally down ports - test_refresh_port_oper_speed_fallback_no_tap: verifies fallback when vs_get_oper_speed fails (no TAP device) Signed-off-by: Rustiqly <rustiqly@users.noreply.github.com>
094830d to
7c239ff
Compare
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azp run |
Fix for BuildAsan FlexCounter.bulkChunksize failurePushed an incremental commit ( Root CauseUnder Asan, the test thread runs 2-5x slower while the FlexCounter worker thread polls at its normal 1000ms interval. This creates a race:
Result: FixAdded an
This eliminates the timing dependency entirely — no sleeps, no increased timeouts, just explicit synchronization. |
|
/azp run |
1e3b4ea to
7c239ff
Compare
|
Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command. |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
What I did
[agent]
When running SONiC VS on KVM/virtio,
/sys/class/net/ethN/speedreturns-1(unknown speed).vs_get_oper_speed()reads this into auint32_t, which wraps to4294967295(0xFFFFFFFF) and gets reported asSAI_PORT_ATTR_OPER_SPEED. This causesshow interfaces statusto display4294967.3Gas the port speed for operationally up ports.How I did it
vs_get_oper_speed(): Read sysfs speed asint32_tinstead of directly intouint32_t. Check for<= 0(invalid) and returnfalsewith a warning log.refresh_port_oper_speed(): Whenvs_get_oper_speed()fails, fall back toSAI_PORT_ATTR_SPEED(configured speed from CONFIG_DB) instead of returningSAI_STATUS_FAILURE.How to verify it
show interfaces statusshows4294967.3Gfor Ethernet0/4/840G)Or verify directly:
Previous command output (if the output of a command-Loss, currentError etc has changed)
Before:
4294967.3GAfter:
40GSigned-off-by: Rustiqly rustiqly@users.noreply.github.com