Add DOM post-test deviation attributes, telemetry profiling TC #23356

@@ -78,6 +78,15 @@ The following table summarizes the key attributes used in DOM testing. This tabl

| shutdown_tx_power_threshold | float | -30.0 | O | transceivers | Maximum TX power in dBm expected when the interface is shut down |
| shutdown_rx_power_threshold | float | -30.0 | O | transceivers | Maximum RX power in dBm expected on the remote side when the interface is shut down |
| data_max_age_min | integer | 5 | O | platform | Maximum age in minutes for DOM data to be considered fresh (`last_update_time` validation) |
| voltage_deviation_range | dict | - | O | transceivers | Acceptable post-test range for `voltage` in volts. Format: `{"min": <float>, "max": <float>}`; the post-test reading must satisfy `min <= value <= max`. Omit to skip this post-test check. |
| laser_temperature_deviation_range | dict | - | O | transceivers | Acceptable post-test range for `laser_temperature` in Celsius. Format: `{"min": <float>, "max": <float>}`. Omit to skip this post-test check. |
| txLANE_NUMbias_deviation_range | dict | - | O | transceivers | Acceptable post-test range for `tx{lane}bias` in mA, validated per lane. Format: `{"min": <float>, "max": <float>}`. Omit to skip this per-lane post-test check. |
| txLANE_NUMpower_deviation_range | dict | - | O | transceivers | Acceptable post-test range for `tx{lane}power` in dBm, validated per lane. Format: `{"min": <float>, "max": <float>}`. Omit to skip this per-lane post-test check. |
| rxLANE_NUMpower_deviation_range | dict | - | O | transceivers | Acceptable post-test range for `rx{lane}power` in dBm, validated per lane. Format: `{"min": <float>, "max": <float>}`. Omit to skip this per-lane post-test check. |
| telemetry_profile_poll_interval_sec | integer | 10 | O | transceivers or platform_hwsku_overrides | Polling interval in seconds for the telemetry update profiling test |
| telemetry_profile_duration_min | integer | 10 | O | transceivers or platform_hwsku_overrides | Duration in minutes to run the telemetry update profiling test |

**Post-test range rule:** For tests that return a port to steady-state operation, verify that each post-test DOM reading falls within its configured `{"min": <float>, "max": <float>}` range (`min <= post-test value <= max`). The check applies only to attributes present in the configuration. Lane-based entries such as TX bias and TX/RX power use the `LANE_NUM` expansion and are validated per lane (for example, `txLANE_NUMbias_deviation_range` applies to `tx1bias`, `tx2bias`, and so on for each available lane). The test fails if any enabled field falls outside its configured range.
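
The post-test range rule above can be sketched in Python as follows. This is a minimal illustration, not the sonic-mgmt implementation: the helper names `expand_range_attrs` and `check_post_test_ranges` are hypothetical, DOM readings are assumed to arrive as a flat dict of field name to numeric string (as read from `TRANSCEIVER_DOM_SENSOR`), and treating a configured field that is missing from the reading as a failure is an assumption the rule does not state.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical constants mirroring the attribute naming used in dom.json.
LANE_TOKEN = "LANE_NUM"
RANGE_SUFFIX = "_deviation_range"


def expand_range_attrs(config: Dict[str, dict], lanes: Iterable[int]) -> Dict[str, dict]:
    """Map each DOM field name to its configured {"min", "max"} range.

    Attributes containing LANE_NUM expand to one field per lane, for example
    txLANE_NUMbias_deviation_range -> tx1bias, tx2bias, ...
    Attributes absent from the config produce no entries, so their check is skipped.
    """
    lanes = list(lanes)
    expanded: Dict[str, dict] = {}
    for key, rng in config.items():
        if not key.endswith(RANGE_SUFFIX):
            continue  # not a post-test range attribute
        field = key[: -len(RANGE_SUFFIX)]
        if LANE_TOKEN in field:
            for lane in lanes:
                expanded[field.replace(LANE_TOKEN, str(lane))] = rng
        else:
            expanded[field] = rng
    return expanded


def check_post_test_ranges(
    config: Dict[str, dict], dom_values: Dict[str, str], lanes: Iterable[int]
) -> List[Tuple[str, object, float, float]]:
    """Return (field, value, min, max) for every field violating min <= value <= max."""
    violations = []
    for field, rng in expand_range_attrs(config, lanes).items():
        raw = dom_values.get(field)
        if raw is None:
            # Assumption: a configured field missing from the reading counts as a failure.
            violations.append((field, None, rng["min"], rng["max"]))
            continue
        value = float(raw)
        if not rng["min"] <= value <= rng["max"]:
            violations.append((field, value, rng["min"], rng["max"]))
    return violations


if __name__ == "__main__":
    cfg = {
        "temperature_threshold_range": {"lowalarm": -30.0, "highalarm": 85.0},
        "voltage_deviation_range": {"min": 3.25, "max": 3.45},
        "txLANE_NUMbias_deviation_range": {"min": 50.0, "max": 180.0},
    }
    reading = {"voltage": "3.30", "tx1bias": "75.2", "tx2bias": "190.1"}
    print(check_post_test_ranges(cfg, reading, lanes=[1, 2]))
    # [('tx2bias', 190.1, 50.0, 180.0)]
```

Keys without the `_deviation_range` suffix, such as `temperature_threshold_range`, are ignored, which mirrors the rule that only configured post-test range attributes are checked.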

## Example `dom.json` File

@@ -100,7 +109,12 @@ The following example demonstrates a complete `dom.json` file focusing on `tempe

                "temperature_threshold_range": {"lowalarm": -40.0, "lowwarning": -10.0, "highwarning": 75.0, "highalarm": 85.0}
            },
            "MMA1T00-VS-400G": {
                "temperature_threshold_range": {"lowalarm": -30.0, "lowwarning": -10.0, "highwarning": 75.0, "highalarm": 85.0},
                "voltage_deviation_range": {"min": 3.25, "max": 3.45},
                "laser_temperature_deviation_range": {"min": 20.0, "max": 70.0},
                "txLANE_NUMbias_deviation_range": {"min": 50.0, "max": 180.0},
                "txLANE_NUMpower_deviation_range": {"min": -3.0, "max": 3.0},
                "rxLANE_NUMpower_deviation_range": {"min": -8.0, "max": 2.0}
            }
        }
    },
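
Before the post-test checks run, a per-part-number entry such as `MMA1T00-VS-400G` above could be sanity-checked. The sketch below is hypothetical: `validate_deviation_ranges` is an illustrative name, and requiring exactly numeric `min` and `max` keys with `min <= max` is an assumption; the attribute table defines the format but not how malformed entries are handled.

```python
import json
from typing import Dict, List

RANGE_SUFFIX = "_deviation_range"


def validate_deviation_ranges(pn_config: Dict[str, object]) -> List[str]:
    """Return human-readable errors for malformed *_deviation_range entries.

    Entries without the suffix (e.g. temperature_threshold_range) are ignored.
    """
    errors = []
    for key, rng in pn_config.items():
        if not key.endswith(RANGE_SUFFIX):
            continue
        if not isinstance(rng, dict) or set(rng) != {"min", "max"}:
            errors.append(f"{key}: expected exactly the keys 'min' and 'max'")
            continue
        lo, hi = rng["min"], rng["max"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if any(isinstance(v, bool) or not isinstance(v, (int, float)) for v in (lo, hi)):
            errors.append(f"{key}: 'min' and 'max' must be numbers")
        elif lo > hi:
            errors.append(f"{key}: min {lo} is greater than max {hi}")
    return errors


if __name__ == "__main__":
    # The MMA1T00-VS-400G entry from the example, as a standalone object.
    entry = json.loads("""
    {
        "temperature_threshold_range": {"lowalarm": -30.0, "lowwarning": -10.0, "highwarning": 75.0, "highalarm": 85.0},
        "voltage_deviation_range": {"min": 3.25, "max": 3.45},
        "laser_temperature_deviation_range": {"min": 20.0, "max": 70.0},
        "txLANE_NUMbias_deviation_range": {"min": 50.0, "max": 180.0},
        "txLANE_NUMpower_deviation_range": {"min": -3.0, "max": 3.0},
        "rxLANE_NUMpower_deviation_range": {"min": -8.0, "max": 2.0}
    }
    """)
    print(validate_deviation_ranges(entry))  # []
```

Returning a list of messages rather than raising on the first problem lets a single run report every malformed entry for a part number.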

@@ -162,6 +176,8 @@ The following tests from the [Transceiver Onboarding Test Infrastructure and Fra

- LLDP verification (if enabled)
- Ensure DOM monitoring is enabled for all relevant ports under test

> **Note:** Each prerequisite check is itself a test case. If a prerequisite test case fails, the dependent DOM test case is also declared failed.

**Assumptions for the Tests Below:**

- Unless specified otherwise, all tests below are executed for every transceiver connected to the DUT (the port list is derived from `port_attributes_dict`).

@@ -179,8 +195,9 @@ The following tests from the [Transceiver Onboarding Test Infrastructure and Fra

| TC No. | Test | Steps | Expected Results |
|------|------|------|------------------|
| 1 | DOM data during interface state changes | 1. Record baseline DOM values with the interface in operational state and verify `last_update_time` is within `data_max_age_min` minutes of the current time.<br>2. Identify the remote side port from `sonic_{inv_name}_links.csv` for end-to-end validation.<br>3. Record remote side baseline DOM values, including RX power for all lanes and alarm/warning flag states.<br>4. Issue `config interface shutdown <port>` and wait for shutdown to complete.<br>5. Validate local DOM data changes for the shutdown state:<br> a. From the `TRANSCEIVER_DOM_SENSOR` table:<br> i. For each available media lane: `tx{lane}bias` should be below `shutdown_tx_bias_threshold`<br> ii. For each available media lane: `tx{lane}power` should be below `shutdown_tx_power_threshold`<br> iii. `temperature` and `voltage` should remain within normal ranges<br> b. From the `TRANSCEIVER_STATUS` table:<br> i. For each available host lane: verify the `tx{lane}los_hostlane` flag is set (indicating host lane loss of signal)<br> c. From the corresponding flag metadata tables for `tx{lane}los_hostlane`:<br> i. For each available host lane: verify the flag change count increments<br> ii. For each available host lane: verify the last set time is updated to reflect the shutdown event timing<br> iii. For each available host lane: verify the last clear time remains unchanged from baseline<br> d. From `PORT_TABLE` of APPL_DB: verify `last_update_time` is updated within `last_down_time` for all relevant tables<br>6. Validate that the remote side DOM reflects the link down condition:<br> a. From the `TRANSCEIVER_DOM_SENSOR` table: for each available lane, verify `rx{lane}power` is below `shutdown_rx_power_threshold`<br> b. From the `TRANSCEIVER_DOM_FLAG` table: verify the `rxLANE_NUMpowerLAlarm` and `rxLANE_NUMpowerLWarn` flags are set<br> c. From the corresponding flag metadata tables:<br> i. Verify the flag change count increments for the low alarm and warning flags<br> ii. Verify the last set time is updated to reflect the link down event timing<br>7. Issue `config interface startup <port>` and wait for startup to complete.<br>8. Validate that local DOM data returns to operational ranges:<br> a. From the `TRANSCEIVER_DOM_SENSOR` table: verify all sensor values return to operational ranges and `last_update_time` is fresh<br> b. For each of `voltage_deviation_range`, `laser_temperature_deviation_range`, `txLANE_NUMbias_deviation_range`, and `txLANE_NUMpower_deviation_range` that is defined, verify the corresponding post-startup DOM value (per lane for lane-based attributes) falls within its configured min/max range<br> c. From the `TRANSCEIVER_STATUS` table: for each available host lane, verify the `tx{lane}los_hostlane` flag is cleared<br> d. From the corresponding flag metadata tables:<br> i. For each available host lane: verify the flag change count increments for `tx{lane}los_hostlane`<br> ii. For each available host lane: verify the last clear time is updated to reflect the startup event<br>9. Validate that the remote side DOM reflects the link up condition:<br> a. From the `TRANSCEIVER_DOM_SENSOR` table: verify RX power returns to its operational range on the remote side for all lanes<br> b. If `rxLANE_NUMpower_deviation_range` is defined, verify the remote side post-startup `rx{lane}power` falls within the configured min/max range for each lane<br> c. From the `TRANSCEIVER_DOM_FLAG` table: verify the `rxLANE_NUMpowerLAlarm` and `rxLANE_NUMpowerLWarn` flags are cleared<br> d. From the corresponding flag metadata tables:<br> i. Verify the flag change count increments for the low alarm and warning flags<br> ii. Verify the last clear time is updated to reflect the link up event<br> | DOM values accurately reflect the interface operational state on both local and remote sides with proper timing correlation. The shutdown state shows the expected TX parameter changes locally (including the `tx{lane}los_hostlane` flag set with correct change count and timing), while the remote side shows a corresponding RX power drop below `shutdown_rx_power_threshold` with appropriate flag management. Startup restores all DOM parameters to operational ranges on both sides and clears the flags (local `tx{lane}los_hostlane` cleared with updated change count and clear time). When any post-test range attribute is configured, post-test values stay within the configured min/max range for all enabled DOM fields. Data freshness is confirmed at each state transition within expected timing windows. End-to-end link health is validated bidirectionally through DOM correlation, including full flag lifecycle tracking (set/clear state, change count, and timestamps). |
| 2 | DOM polling and data freshness validation | 1. Verify DOM polling is currently enabled.<br>2. Record the baseline interface operational state and link flap count.<br>3. Disable DOM polling: `config interface transceiver dom <port> disable`.<br>4. Immediately after disabling, record `last_update_time` from the `TRANSCEIVER_DOM_SENSOR` table to establish a baseline.<br>5. Wait for 2x `max_update_time_sec`.<br>6. Record `last_update_time` from the `TRANSCEIVER_DOM_SENSOR` table after the wait period.<br>7. Verify the interface remains operationally up and the link flap count is unchanged.<br>8. Verify that `last_update_time` was not updated during the disabled period (it matches the baseline value from step 4).<br>9. Validate that DOM sensor values remain static (no new readings) during the disabled period.<br>10. Enable DOM polling: `config interface transceiver dom <port> enable`.<br>11. Verify the interface remains operationally up and the link flap count is unchanged during the enable operation.<br>12. Wait for `max_update_time_sec` and verify `last_update_time` is updated and within `data_max_age_min` minutes of the current time.<br>13. Validate that all DOM sensor values are refreshed and within expected operational ranges.<br>14. If any post-test range attributes are defined, verify the refreshed DOM sensor values fall within their configured min/max ranges.<br>15. Perform a consistency check by reading DOM data `consistency_check_poll_count` times to ensure stable polling operation.<br>16. Verify continuous data freshness by monitoring `last_update_time` updates over multiple polling cycles.<br>17. Confirm the link flap count remains unchanged from baseline throughout the entire DOM polling control test sequence. | DOM polling enable/disable works correctly without causing interface instability. Disabled polling completely prevents data updates while maintaining data integrity and link stability. Enabled polling resumes data collection within expected intervals, with immediate data refresh and no link disruption. Data freshness is maintained through the `last_update_time` field with consistent update patterns. All sensor values return to expected ranges after re-enabling, with stable polling behavior. When any post-test range attribute is configured, post-test values remain within the configured min/max range for all enabled DOM fields. The interface remains operationally stable throughout the test and the link flap count stays constant, confirming that no flaps occurred during DOM polling state transitions. |
| 3 | Telemetry update interval profiling | 1. Verify DOM polling is enabled and the port is operationally up.<br>2. Record the initial `last_update_time` from the `TRANSCEIVER_DOM_SENSOR` table.<br>3. Poll `last_update_time` every `telemetry_profile_poll_interval_sec` seconds for `telemetry_profile_duration_min` minutes.<br>4. On each poll, record the current `last_update_time` value and calculate the delta from the previous distinct `last_update_time` (skip consecutive polls where the timestamp has not changed).<br>5. After the profiling period, compute the minimum, maximum, mean, and median of the collected update interval deltas.<br>6. Verify that at every poll, the age of `last_update_time` (current time minus `last_update_time`) was within `data_max_age_min` minutes.<br>7. Log the full statistics profile and a per-port summary for cross-release comparison. | All `last_update_time` values remain within `data_max_age_min` minutes throughout the profiling window, and no update interval exceeds `data_max_age_min`. The logged statistics (min, max, mean, median) provide a quantitative baseline for detecting polling regressions between image releases. |
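
The flag metadata expectations in TC 1 (steps 5c, 6c, 8d, and 9d) reduce to comparing a baseline snapshot with a post-event snapshot. The sketch below is illustrative only: `FlagMetadata` and its fields are hypothetical stand-ins for whatever the flag metadata tables actually store, timestamps are assumed to be parsed into `datetime` objects already, and `event_time` is assumed to be the time the shutdown or startup command was issued.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class FlagMetadata:
    """Hypothetical snapshot of one flag's metadata (field names are assumptions)."""
    change_count: int
    last_set_time: Optional[datetime]
    last_clear_time: Optional[datetime]


def check_flag_set(before: FlagMetadata, after: FlagMetadata, event_time: datetime) -> List[str]:
    """Expectations after a flag is set, e.g. tx{lane}los_hostlane on shutdown."""
    errors = []
    if after.change_count <= before.change_count:
        errors.append("change count did not increment")
    if after.last_set_time is None or after.last_set_time < event_time:
        errors.append("last set time does not reflect the event")
    if after.last_clear_time != before.last_clear_time:
        errors.append("last clear time changed")
    return errors


def check_flag_cleared(before: FlagMetadata, after: FlagMetadata, event_time: datetime) -> List[str]:
    """Expectations after a flag is cleared, e.g. on startup or remote link up."""
    errors = []
    if after.change_count <= before.change_count:
        errors.append("change count did not increment")
    if after.last_clear_time is None or after.last_clear_time < event_time:
        errors.append("last clear time does not reflect the event")
    return errors


if __name__ == "__main__":
    shutdown_at = datetime(2024, 1, 1, 12, 0, 0)
    baseline = FlagMetadata(2, datetime(2024, 1, 1, 11, 0, 0), datetime(2024, 1, 1, 11, 5, 0))
    after_shutdown = FlagMetadata(3, datetime(2024, 1, 1, 12, 0, 2), datetime(2024, 1, 1, 11, 5, 0))
    print(check_flag_set(baseline, after_shutdown, shutdown_at))  # []
```

The change count is checked for an increase rather than an exact `+1`, which is a judgment call: a strict `+1` check would also catch unexpected extra transitions but is more sensitive to timing.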
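
The freshness logic in TC 2 (steps 8, 9, and 12) can be expressed as a few small predicates. This is a hedged sketch: the function names are hypothetical, timestamps are assumed to be parsed into `datetime` objects already (the stored format of `last_update_time` is not specified here), and `now` is passed in explicitly so the checks are deterministic.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def is_fresh(last_update_time: datetime, now: datetime, data_max_age_min: int) -> bool:
    """True if last_update_time is no older than data_max_age_min minutes."""
    return now - last_update_time <= timedelta(minutes=data_max_age_min)


def check_polling_disabled(
    baseline_time: datetime,
    later_time: datetime,
    baseline_values: Dict[str, str],
    later_values: Dict[str, str],
) -> List[str]:
    """While DOM polling is disabled, neither the timestamp nor the sensor values may change."""
    errors = []
    if later_time != baseline_time:
        errors.append("last_update_time changed while polling was disabled")
    changed = sorted(k for k in baseline_values if later_values.get(k) != baseline_values[k])
    if changed:
        errors.append("sensor values changed: " + ", ".join(changed))
    return errors


def check_polling_resumed(
    baseline_time: datetime, new_time: datetime, now: datetime, data_max_age_min: int
) -> List[str]:
    """After re-enabling, last_update_time must advance past the baseline and be fresh."""
    errors = []
    if new_time <= baseline_time:
        errors.append("last_update_time did not advance after re-enabling")
    if not is_fresh(new_time, now, data_max_age_min):
        errors.append("last_update_time is older than data_max_age_min")
    return errors
```

Passing `now` as a parameter instead of calling `datetime.now()` inside the helpers keeps them free of wall-clock dependencies, so the same checks can be replayed against logged samples.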
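
The interval statistics in TC 3 (steps 4 to 6) can be computed from the collected samples with Python's standard `statistics` module. This is a minimal sketch under stated assumptions: `profile_update_intervals` is a hypothetical name, each sample is assumed to be a `(poll_time, last_update_time)` pair of `datetime` objects in poll order, and the first sample is assumed to be the initial reading from step 2.

```python
import statistics
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


def profile_update_intervals(
    observations: List[Tuple[datetime, datetime]], data_max_age_min: int
) -> Dict[str, object]:
    """Summarize DOM update intervals from (poll_time, last_update_time) samples.

    Deltas are taken only between distinct consecutive last_update_time values,
    so polls that saw an unchanged timestamp are skipped.
    """
    max_age = timedelta(minutes=data_max_age_min)
    deltas: List[float] = []
    stale_polls: List[datetime] = []
    previous = None
    for poll_time, last_update in observations:
        # Step 6: the timestamp seen at each poll must be within data_max_age_min.
        if poll_time - last_update > max_age:
            stale_polls.append(poll_time)
        # Step 4: record a delta only when the timestamp actually moved.
        if previous is not None and last_update != previous:
            deltas.append((last_update - previous).total_seconds())
        previous = last_update

    summary: Dict[str, object] = {"samples": len(deltas), "stale_polls": stale_polls}
    if deltas:
        summary.update(
            min=min(deltas),
            max=max(deltas),
            mean=statistics.mean(deltas),
            median=statistics.median(deltas),
            # Expected result: no update gap may exceed data_max_age_min.
            gap_exceeded=max(deltas) > max_age.total_seconds(),
        )
    return summary


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1)
    obs = [
        (t0 + timedelta(seconds=10), t0),
        (t0 + timedelta(seconds=20), t0),  # unchanged timestamp, no delta recorded
        (t0 + timedelta(seconds=70), t0 + timedelta(seconds=60)),
        (t0 + timedelta(seconds=130), t0 + timedelta(seconds=120)),
        (t0 + timedelta(seconds=190), t0 + timedelta(seconds=185)),
    ]
    print(profile_update_intervals(obs, data_max_age_min=5))
```

Deltas can only be resolved to the poll granularity set by `telemetry_profile_poll_interval_sec`, so a shorter poll interval gives a more precise profile at the cost of more DB reads.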

## Cleanup and Post-Test Verification

It would be good to clarify that the post-check is looking for relative changes based on the first (or average) measurement. Something like min <= post-test value - first-read value <= max

@bfoo-msft Addressed this now