diff --git a/docs/testplan/transceiver/dom_test_plan.md b/docs/testplan/transceiver/dom_test_plan.md new file mode 100644 index 00000000000..e1f3a700b02 --- /dev/null +++ b/docs/testplan/transceiver/dom_test_plan.md @@ -0,0 +1,200 @@ +# DOM Test Plan For Transceivers + +## Overview + +The DOM Test Plan for transceivers outlines a comprehensive testing strategy for the Digital Optical Monitoring (DOM) functionality within the transceiver module. This document will cover the objectives, scope, test cases, and resources required for effective testing. + +## Scope + +The scope of this test plan includes the following: + +- Validation of DOM data integrity and consistency for transceiver basic DOM content +- Testing of DOM access times and performance + +## Optics Scope + +All the optics covered in the parent [Transceiver Onboarding Test Infrastructure and Framework](test_plan.md#scope) + +## Testbed Topology + +Please refer to the [Testbed Topology](test_plan.md#testbed-topology) + +## Pre-requisites + +Before executing the DOM tests, ensure the following pre-requisites are met: + +### Setup Requirements + +- The testbed is set up according to the [Testbed Topology](test_plan.md#testbed-topology) +- All the pre-requisites mentioned in [Transceiver Onboarding Test Infrastructure and Framework](test_plan.md#test-prerequisites-and-configuration-files) must be met + +### Environment Validation + +Before starting tests, verify the following system conditions: + +1. **System Health Check** + - All critical services are running (xcvrd, pmon, swss, syncd) for at least 5 minutes + - No existing system errors in logs (specific error patterns can be added here) + +2. **Transceiver Baseline Verification** + - All expected transceivers are present and detected + - All links are in operational state + - No existing I2C communication errors + - LLDP neighbors are discovered (if LLDP is enabled) + +3. 
**Configuration Validation** + - `dom.json` configuration file is properly formatted and accessible + - All required attributes are defined for the transceivers under test + - Platform-specific settings are correctly configured + - DOM monitoring config is enabled for all relevant ports under test + +## Attributes + +A `dom.json` file is used to define the attributes for the DOM tests for the various types of transceivers the system supports. + +**Note on Operational vs. Threshold Ranges:** The DOM test framework uses dual-range validation to provide more nuanced testing. Realistic operational ranges represent the expected values during normal, healthy operation in typical data center environments. These ranges are tighter than the absolute EEPROM threshold ranges and help distinguish between normal operation and edge cases that, while within specification, may indicate environmental stress, aging components, or suboptimal conditions. This approach enables early detection of potential issues before they trigger formal alarms, providing better system health monitoring and preventive maintenance capabilities. + +The following table summarizes the key attributes used in DOM testing. 
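To make the dual-range idea concrete before the attribute table, here is a minimal sketch (hypothetical helper name and illustrative values; not part of the test framework) of how a single DOM reading could be classified against both the tight operational range and the EEPROM alarm/warning thresholds:

```python
def classify_reading(value, op_range, thresholds):
    """Classify a DOM reading under the dual-range scheme: a tight
    operational range nested inside the EEPROM warning/alarm thresholds."""
    if value <= thresholds["lowalarm"] or value >= thresholds["highalarm"]:
        return "alarm"
    if value <= thresholds["lowwarning"] or value >= thresholds["highwarning"]:
        return "warning"
    if op_range["min"] <= value <= op_range["max"]:
        return "normal"
    # In spec (inside warning thresholds) but outside the healthy envelope:
    # flags potential environmental stress or component aging early.
    return "operational-outlier"


# Module temperature example (illustrative values):
op = {"min": 20.0, "max": 70.0}
th = {"lowalarm": -40.0, "lowwarning": -5.0, "highwarning": 75.0, "highalarm": 85.0}
print(classify_reading(45.0, op, th))   # → normal
print(classify_reading(10.0, op, th))   # → operational-outlier
print(classify_reading(78.0, op, th))   # → warning
```

Readings classified as `operational-outlier` are exactly the early-warning cases described in the note above: within EEPROM specification, but outside the expected healthy envelope.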
This table serves as the authoritative reference for all attributes and must be updated whenever new attributes are introduced: + +**Legend:** M = Mandatory, O = Optional + +| Attribute Name | Type | Default Value | Mandatory | Override Levels | Description | +|----------------|------|---------------|-----------|-----------------|-------------| +| temperature_operational_range | dict | {"min": 20.0, "max": 70.0} | O | transceivers | Realistic operational temperature range in Celsius during normal operation (typical: room temp to moderate heat) | +| temperature_threshold_range | dict | (format) {"lowalarm": , "lowwarning": , "highwarning": , "highalarm": } | O | transceivers | Absolute threshold temperature range in Celsius (must define all four keys; no implicit defaults) | +| voltage_operational_range | dict | {"min": 3.20, "max": 3.40} | O | transceivers | Realistic operational voltage range in volts during normal operation (typical: 3.3V ±3%) | +| voltage_threshold_range | dict | (format) {"lowalarm": , "lowwarning": , "highwarning": , "highalarm": } | O | transceivers | Absolute threshold voltage range in volts (provide EEPROM alarm/warn limits; skip to disable voltage threshold validation) | +| laser_temperature_operational_range | dict | {"min": 20.0, "max": 70.0} | O | transceivers | Realistic operational laser temperature range in Celsius during normal operation | +| laser_temperature_threshold_range | dict | (format) {"lowalarm": , "lowwarning": , "highwarning": , "highalarm": } | O | transceivers | Absolute threshold laser temperature range in Celsius (specify all four; omit to skip laser temperature threshold checks) | +| txLANE_NUMbias_operational_range | dict | {"min": 50.0, "max": 180.0} | O | transceivers | Realistic operational TX bias current range in mA for lane LANE_NUM during normal operation | +| tx_bias_threshold_range | dict | (format) {"lowalarm": , "lowwarning": , "highwarning": , "highalarm": } | O | transceivers | Absolute threshold TX 
bias current range in mA (EEPROM limits; skip attribute to disable bias threshold validation) | +| txLANE_NUMpower_operational_range | dict | {"min": -3.0, "max": 3.0} | O | transceivers | Realistic operational TX power range in dBm for lane LANE_NUM during normal operation | +| tx_power_threshold_range | dict | (format) {"lowalarm": , "lowwarning": , "highwarning": , "highalarm": } | O | transceivers | Absolute threshold TX power range in dBm (define all four for TX power threshold validation) | +| rxLANE_NUMpower_operational_range | dict | {"min": -8.0, "max": 2.0} | O | transceivers | Realistic operational RX power range in dBm for lane LANE_NUM during normal operation | +| rx_power_threshold_range | dict | (format) {"lowalarm": , "lowwarning": , "highwarning": , "highalarm": } | O | transceivers | Absolute threshold RX power range in dBm (omit attribute to skip RX power threshold validation) | +| max_update_time_sec | integer | 60 | O | platform | Maximum expected time in seconds between DOM data updates for continuous monitoring validation | +| consistency_check_poll_count | integer | 3 | O | transceivers or platform | Number of polling cycles to perform when validating DOM data consistency and variation patterns | +| shutdown_tx_bias_threshold | float | 0 | O | transceivers | Maximum TX bias current in mA expected when interface is shutdown | +| shutdown_tx_power_threshold | float | -30.0 | O | transceivers | Maximum TX power in dBm expected when interface is shutdown | +| shutdown_rx_power_threshold | float | -30.0 | O | transceivers | Maximum RX power in dBm expected on remote side when interface is shutdown | +| data_max_age_min | integer | 5 | O | platform | Maximum age in minutes for DOM data to be considered fresh (last_update_time validation) | + +## Example `dom.json` File + +The following example demonstrates a complete `dom.json` file focusing on `temperature_threshold_range` for different transceiver types: + +```json +{ + "transceivers": { + 
"vendors": { + "finisar": { + "part_numbers": { + "FTLX8571D3BCL-10GSFP": { + "temperature_threshold_range": {"lowalarm": -40.0, "lowwarning": -5.0, "highwarning": 75.0, "highalarm": 85.0} + } + } + }, + "mellanox": { + "part_numbers": { + "MCP1600-C003-100G": { + "temperature_threshold_range": {"lowalarm": -40.0, "lowwarning": -10.0, "highwarning": 75.0, "highalarm": 85.0} + }, + "MMA1T00-VS-400G": { + "temperature_threshold_range": {"lowalarm": -30.0, "lowwarning": -10.0, "highwarning": 75.0, "highalarm": 85.0} + } + } + }, + "marvell": { + "part_numbers": { + "88X7120-800G": { + "temperature_threshold_range": {"lowalarm": -30.0, "lowwarning": -10.0, "highwarning": 75.0, "highalarm": 80.0} + } + } + } + } + } +} +``` + +## Dynamic Field Mapping Algorithm + +The DOM test framework uses an attribute-driven approach to dynamically determine which fields to validate based on the configuration present in `dom.json`. This eliminates the need for hardcoded field lists and provides flexible, maintainable test execution. + +### Algorithm Steps + +1. **Attribute Discovery**: Scan `dom.json` for all attributes ending with `_operational_range` or `_threshold_range` + +2. **Base Field Extraction**: Remove the suffix (`_operational_range` or `_threshold_range`) to get the base field name + +3. **Lane Expansion Logic**: + - If the attribute name contains `LANE_NUM` placeholder: Expand for all available lanes (1 to N) by replacing `LANE_NUM` with actual lane numbers + - If no `LANE_NUM` placeholder is present: Expect a single field with the base name + +4. **Special Field Mappings**: Apply any platform-specific field name mappings as needed + +5. 
**Field Validation**: Validate presence and values of all dynamically determined fields in STATE_DB + +### Example Mappings + +| Attribute Name | Base Field | Lane Expansion | Expected STATE_DB Fields | +|----------------|------------|----------------|-------------------------| +| `temperature_operational_range` | `temperature` | No | `temperature` | +| `txLANE_NUMbias_operational_range` | `txLANE_NUMbias` | Yes | `tx1bias`, `tx2bias`, `tx3bias`, `tx4bias` (for 4-lane) | +| `rxLANE_NUMpower_operational_range` | `rxLANE_NUMpower` | Yes | `rx1power`, `rx2power`, `rx3power`, `rx4power` (for 4-lane) | +| `voltage_threshold_range` | `voltage` | No | `vcchighalarm`, `vcclowalarm`, `vcchighwarning`, `vcclowwarning` | +| `tx_power_threshold_range` | `tx_power` | No | `txpowerhighalarm`, `txpowerlowalarm`, `txpowerhighwarning`, `txpowerlowwarning` | + +This algorithm ensures that test validation is automatically aligned with the configured attributes, providing comprehensive coverage while maintaining flexibility for different transceiver types and platform configurations. + +## CLI Commands Reference + +For detailed CLI commands used in the test cases below, please refer to the [CLI Commands section](test_plan.md#cli-commands) in the Transceiver Onboarding Test Infrastructure and Framework. 
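The dynamic field mapping algorithm described above can be sketched in Python (helper names are illustrative assumptions, not the actual sonic-mgmt code):

```python
# Sketch of the attribute-driven field mapping: strip the range suffix,
# then expand the LANE_NUM placeholder for lane-specific attributes.

OP_SUFFIX = "_operational_range"
TH_SUFFIX = "_threshold_range"
LANE = "LANE_NUM"


def base_field(attr_name: str) -> str:
    """Step 2: strip the range suffix to get the base field name."""
    for suffix in (OP_SUFFIX, TH_SUFFIX):
        if attr_name.endswith(suffix):
            return attr_name[: -len(suffix)]
    raise ValueError(f"not a range attribute: {attr_name}")


def expected_fields(attr_name: str, num_lanes: int) -> list[str]:
    """Steps 1-3: derive the STATE_DB fields implied by one attribute."""
    base = base_field(attr_name)
    if LANE in base:
        # Step 3: expand the LANE_NUM placeholder for lanes 1..N
        return [base.replace(LANE, str(lane)) for lane in range(1, num_lanes + 1)]
    # Single field; step 4 (special mappings, e.g. voltage -> vcc* threshold
    # field names) may still apply on top of this.
    return [base]


# Examples for a 4-lane module:
print(expected_fields("txLANE_NUMbias_operational_range", 4))
# → ['tx1bias', 'tx2bias', 'tx3bias', 'tx4bias']
print(expected_fields("temperature_operational_range", 4))
# → ['temperature']
```

This matches the Example Mappings table: lane-bearing attributes fan out to one field per lane, while module-level attributes map to a single base field before any special mappings are applied.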
This section provides comprehensive examples of all relevant commands. + +## Test Cases + +**Test Execution Prerequisites:** + +The following tests from the [Transceiver Onboarding Test Infrastructure and Framework](test_plan.md#test-cases-interim) will be run prior to executing the DOM tests: + +- Transceiver presence check +- Ensure active firmware is gold firmware (for non-DAC CMIS transceivers) +- Link up verification +- LLDP verification (if enabled) +- Ensure DOM monitoring is enabled for all relevant ports under test + +**Assumptions for the Below Tests:** + +- All the below tests will be executed for all the transceivers connected to the DUT (the port list is derived from the `port_attributes_dict`) unless specified otherwise. + +### Basic DOM Functionality Tests + +| TC No. | Test | Steps | Expected Results | +|------|------|------|------------------| +| 1 | DOM data availability verification | 1. Access DOM data from `TRANSCEIVER_DOM_SENSOR` table in STATE_DB for each port.
2. Verify `last_update_time` is within `data_max_age_min` minutes of current time to ensure data freshness.
3. Dynamically determine expected DOM fields based on attributes present in `dom.json` using the [Dynamic Field Mapping Algorithm](#dynamic-field-mapping-algorithm).
4. Validate presence of all dynamically determined expected fields in STATE_DB.
5. Skip validation for fields whose corresponding attributes are absent from `dom.json`. | All DOM fields corresponding to configured attributes are present and accessible from STATE_DB. DOM data is successfully retrieved without errors for all attribute-driven fields. Lane-specific fields are automatically expanded for all available lanes (1 to N) based on the `LANE_NUM` placeholder. Field expectations are dynamically derived using the mapping algorithm. Data freshness is confirmed with recent `last_update_time` timestamp. | +| 2 | DOM sensor operational range validation | 1. Retrieve DOM sensor data from STATE_DB.
2. Verify `last_update_time` is within `data_max_age_min` minutes of current time to ensure data freshness.
3. For each attribute ending with `_operational_range` present in `dom.json`, validate the corresponding field(s) in STATE_DB using the [Dynamic Field Mapping Algorithm](#dynamic-field-mapping-algorithm).
4. Check that sensor values fall within the configured operational range.
5. Fail the test case if any values fall outside their respective operational ranges.
6. Log detailed information about any out-of-range values including actual vs expected ranges.
7. Only validate fields derived from attributes present in `dom.json`. | All DOM sensor values fall within their respective operational ranges during normal operation (only for parameters with configured operational range attributes). Test case fails if any sensor values fall outside their configured operational ranges. Data freshness is confirmed before validation. Lane-specific validation automatically performed for all available lanes using the `LANE_NUM` placeholder expansion. Parameter validation is dynamically determined from attribute table. Detailed logging provided for any out-of-range conditions. | +| 3 | DOM threshold validation | 1. Retrieve threshold data from `TRANSCEIVER_DOM_THRESHOLD` table in STATE_DB.
2. Dynamically determine expected threshold fields based on attributes ending with `_threshold_range` present in `dom.json` using the [Dynamic Field Mapping Algorithm](#dynamic-field-mapping-algorithm).
3. For each determined threshold field, validate threshold data completeness by checking for corresponding alarm and warning thresholds (highalarm, lowalarm, highwarning, lowwarning).
4. Compare the threshold ranges configured via attributes with the threshold values from the DB and validate the logical hierarchy (lowalarm < lowwarning < highwarning < highalarm).
5. For parameters that have both operational and threshold range attributes, validate that operational ranges fall within warning thresholds (lowwarning < operational_min and operational_max < highwarning).
6. Only validate threshold fields derived from attributes present in `dom.json`. | All threshold values are present and follow logical hierarchy. EEPROM thresholds align with configured threshold ranges when present. Operational ranges are properly positioned within warning threshold boundaries to ensure appropriate alarm behavior. Threshold data integrity is maintained in STATE_DB. Threshold validation is performed at transceiver level (no lane-specific expansion). Threshold validation is dynamically determined from attribute table. | +| 4 | DOM data consistency verification | 1. Read DOM data `consistency_check_poll_count` times with `max_update_time_sec` intervals between readings.
2. Verify data consistency between readings.
3. Check that `last_update_time` field is being updated correctly with each polling cycle.
4. Validate that sensor readings show expected behavior (e.g., temperature variations within reasonable limits). | DOM data shows consistent and reasonable variations between polling intervals over `consistency_check_poll_count` polling cycles. The `last_update_time` field is properly updated with each polling cycle. No erratic or impossible sensor value changes are observed during the monitoring period. Variation patterns indicate stable DOM monitoring system operation. | + +### Advanced DOM Testing + +| TC No. | Test | Steps | Expected Results | +|------|------|------|------------------| +| 1 | DOM data during interface state changes | 1. Record baseline DOM values with interface in operational state and verify `last_update_time` is within `data_max_age_min` minutes of current time.
2. Identify remote side port from `sonic_{inv_name}_links.csv` for end-to-end validation.
3. Record remote side baseline DOM values including RX power for all lanes and alarm/warning flag states.
4. Issue `config interface shutdown <port>` and wait for shutdown completion.
5. Validate local DOM data changes for shutdown state:
a. From `TRANSCEIVER_DOM_SENSOR` table:
i. For each available media lane: `tx{lane}bias` should be below `shutdown_tx_bias_threshold`
ii. For each available media lane: `tx{lane}power` should be below `shutdown_tx_power_threshold`
iii. `temperature` and `voltage` should remain within normal ranges
b. From `TRANSCEIVER_STATUS` table:
i. For each available host lane: verify `tx{lane}los_hostlane` flag is set (indicating host lane loss of signal)
c. From corresponding flag metadata tables for `tx{lane}los_hostlane`:
i. For each available host lane: verify flag change count increments
ii. For each available host lane: verify last set time is updated to reflect shutdown event timing
iii. For each available host lane: verify last clear time remains unchanged from baseline
d. From `PORT_TABLE` of APPL_DB: verify `last_down_time` is updated, and that `last_update_time` in all relevant transceiver tables is newer than `last_down_time`
6. Validate remote side DOM reflects link down condition:
a. From `TRANSCEIVER_DOM_SENSOR` table: for each available lane verify `rx{lane}power` is below `shutdown_rx_power_threshold`
b. From `TRANSCEIVER_DOM_FLAG` table: for each available lane, verify the `rx{lane}powerLAlarm` and `rx{lane}powerLWarn` flags are set
c. From corresponding flag metadata tables:
i. Verify flag change count increments for low alarm and warning flags
ii. Verify last set time is updated to reflect link down event timing
7. Issue `config interface startup <port>` and wait for startup completion.
8. Validate local DOM data returns to operational ranges:
a. From `TRANSCEIVER_DOM_SENSOR` table: verify all sensor values return to operational ranges and `last_update_time` is fresh
b. From `TRANSCEIVER_STATUS` table: for each available host lane verify `tx{lane}los_hostlane` flag is cleared
c. From corresponding flag metadata tables:
i. For each available host lane: verify flag change count increments for `tx{lane}los_hostlane`
ii. For each available host lane: verify last clear time is updated to reflect startup event
9. Validate remote side DOM reflects link up condition:
a. From `TRANSCEIVER_DOM_SENSOR` table: verify RX power returns to operational range on remote side for all lanes
b. From `TRANSCEIVER_DOM_FLAG` table: for each available lane, verify the `rx{lane}powerLAlarm` and `rx{lane}powerLWarn` flags are cleared
c. From corresponding flag metadata tables:
i. Verify flag change count increments for low alarm and warning flags
ii. Verify last clear time is updated to reflect link up event
| DOM values accurately reflect interface operational state on both local and remote sides with proper timing correlation. Shutdown state shows expected TX parameter changes locally (including `tx{lane}los_hostlane` flag set with proper change count and timing) while remote side shows corresponding RX power drop below `shutdown_rx_power_threshold` with appropriate flag management. Startup properly restores all DOM parameters to operational ranges on both sides with flag clearing (local `tx{lane}los_hostlane` cleared with updated change count and clear time). Data freshness is confirmed at each state transition within expected timing windows. End-to-end link health is validated through comprehensive DOM correlation including flag lifecycle management with complete change tracking. Complete bidirectional validation ensures robust link health monitoring. | +| 2 | DOM polling and data freshness validation | 1. Verify DOM polling is currently enabled.
2. Record baseline interface operational state and link flap count.
3. Disable DOM polling: `config interface transceiver dom <port> disable`.
4. Record `last_update_time` from `TRANSCEIVER_DOM_SENSOR` table immediately after disabling to establish baseline.
5. Wait for 2x `max_update_time_sec`.
6. Record `last_update_time` from `TRANSCEIVER_DOM_SENSOR` table after the wait period.
7. Verify interface remains operationally up and link flap count unchanged.
8. Verify that `last_update_time` has not been updated during disabled period (matches baseline value from step 4).
9. Validate that DOM sensor values remain static (no new readings) during disabled period.
10. Enable DOM polling: `config interface transceiver dom <port> enable`.
11. Verify interface remains operationally up and link flap count unchanged during enable operation.
12. Wait for `max_update_time_sec` and verify `last_update_time` is updated and within `data_max_age_min` minutes of current time.
13. Validate that all DOM sensor values are refreshed and within expected operational ranges.
14. Perform consistency check by reading DOM data `consistency_check_poll_count` times to ensure stable polling operation.
15. Verify continuous data freshness by monitoring `last_update_time` updates over multiple polling cycles.
16. Confirm link flap count remains unchanged from baseline throughout the entire DOM polling control test sequence. | DOM polling control works correctly with precise enable/disable functionality without causing interface instability. Disabled polling completely prevents data updates while maintaining data integrity and link stability. Enabled polling resumes data collection within expected intervals with immediate data refresh and no link disruption. Data freshness is properly maintained through the `last_update_time` field with consistent update patterns. All sensor values return to expected ranges after re-enabling with stable polling behavior. Interface remains operationally stable throughout the test with link flap count remaining constant, confirming no flaps occurred during DOM polling state transitions. | + +## Cleanup and Post-Test Verification + +After test completion: + +### Immediate Cleanup + +1. **DOM State Verification**: Ensure DOM monitoring continues to function normally after testing +2. **System Health**: Check system logs for any DOM-related errors or warnings introduced during testing +3. **Service Status**: Verify xcvrd and pmon services are operating normally with DOM polling active + +### Post-Test Report Generation + +1. **Test Summary**: Generate comprehensive test results including pass/fail status for each DOM parameter +2. **Sensor Analysis**: Document any sensor values that approached range limits or showed unusual behavior +3. **Performance Metrics**: Report DOM access times and any performance variations observed +4. **Range Validation**: Summary of all DOM parameters with their actual vs. 
expected ranges diff --git a/docs/testplan/transceiver/eeprom_test_plan.md b/docs/testplan/transceiver/eeprom_test_plan.md new file mode 100644 index 00000000000..8015e877895 --- /dev/null +++ b/docs/testplan/transceiver/eeprom_test_plan.md @@ -0,0 +1,108 @@ +# Transceiver EEPROM Test Plan + +## Overview + +The Transceiver EEPROM Test Plan outlines the testing strategy for the EEPROM functionality within the transceiver module. This document will cover the objectives, scope, test cases, and resources required for effective testing. + +## Scope + +The scope of this test plan includes the following: + +- Verification of EEPROM read and write operations +- Validation of data integrity and consistency for transceiver basic EEPROM content +- Testing of EEPROM access times and performance + +## Optics Scope + +All the optics covered in the parent [Transceiver Onboarding Test Infrastructure and Framework](test_plan.md#scope) + +## Testbed Topology + +Please refer to the [Testbed Topology](test_plan.md#testbed-topology) + +## Pre-requisites + +Before executing the EEPROM tests, ensure the following pre-requisites are met: + +### Setup Requirements + +- The testbed is set up according to the [Testbed Topology](test_plan.md#testbed-topology) +- All the pre-requisites mentioned in [Transceiver Onboarding Test Infrastructure and Framework](test_plan.md#test-prerequisites-and-configuration-files) must be met + +### Environment Validation + +Before starting tests, verify the following system conditions: + +1. **System Health Check** + - All critical services are running (xcvrd, pmon, swss, syncd) for at least 5 minutes + - No existing system errors in logs + +2. 
**Configuration Validation** + - `eeprom.json` configuration file is properly formatted and accessible + - All required attributes are defined for the transceivers under test + +## Attributes + +An `eeprom.json` file is used to define the attributes for the EEPROM tests for the various types of transceivers the system supports. + +The following table summarizes the key attributes used in EEPROM testing. This table serves as the authoritative reference for all attributes and must be updated whenever new attributes are introduced: + +**Legend:** M = Mandatory, O = Optional + +| Attribute Name | Type | Default Value | Mandatory | Override Levels | Description | +|----------------|------|---------------|-----------|-----------------|-------------| +| dual_bank_supported | boolean | - | M | transceivers | Whether transceiver supports dual bank firmware | +| vdm_supported | boolean | False | O | transceivers | VDM capability support | +| pm_supported | boolean | False | O | transceivers | Performance Monitoring support | +| cdb_background_mode_supported | boolean | - | O | transceivers | CDB background mode support | +| gold_firmware_version | string | - | O | transceivers | Expected gold/reference firmware version for validation. This also represents the active firmware version. This attribute is applicable only for modules with CMIS CDB firmware.
| +| inactive_firmware_version | string | - | O | transceivers | Expected inactive bank firmware version for dual-bank CMIS CDB modules during validation | +| cmis_revision | string | - | O | transceivers | CMIS revision for CMIS based transceivers | +| sff8024_identifier | string | - | M | transceivers | SFF-8024 identifier for the transceiver | +| is_non_dac_and_cmis | boolean | False | O | transceivers | Whether the transceiver is a non-DAC CMIS transceiver | +| breakout_serial_number_pattern | string | - | O | transceivers | Regex pattern to validate serial number format for breakout leaf port transceivers (e.g., ".*-A$", ".*-B$", ".*-C$" for suffix validation, or any other pattern for different placements). Used to validate that leaf-side ports on breakout modules have correctly formatted serial numbers | +| breakout_stem_serial_number_pattern | string | - | O | transceivers | Regex pattern to validate serial number format for breakout stem (main) port transceivers. Typically validates that the serial number does NOT contain leaf suffixes (e.g., "^(?!.*-[A-Z]$).*$" to ensure no suffix like -A, -B, -C). Used to validate that stem-side ports on breakout modules have correctly formatted serial numbers without leaf identifiers | +| eeprom_dump_timeout_sec | integer | 5 | O | transceivers or platform | Default EEPROM dump timeout in seconds | + +## CLI Commands Reference + +For detailed CLI commands used in the test cases below, please refer to the [CLI Commands section](test_plan.md#cli-commands) in the Transceiver Onboarding Test Infrastructure and Framework. This section provides comprehensive examples of all relevant commands + +## Test Cases + +**Assumptions for the Below Tests:** + +- All the below tests will be executed for all the transceivers connected to the DUT (the port list is derived from the `port_attributes_dict`) unless specified otherwise. + +### Generic Test Cases + +| TC No. 
| Test | Steps | Expected Results | +|------|------|------|------------------| +| 1 | Transceiver presence verification (sfputil) | 1. Use the `sfputil show presence -p <port>` command to check for transceiver presence.
2. Verify the output for each connected transceiver. | All connected transceivers should be listed as "Present" in the output. | +| 2 | Transceiver presence verification (show CLI) | 1. Use the `show interfaces transceiver presence` CLI to check for transceiver presence.
2. Verify the output for each connected transceiver. | All connected transceivers should be listed as "Present" in the output. | +| 3 | Basic EEPROM content verification via sfputil | 1. Retrieve the BASE_ATTRIBUTES and EEPROM_ATTRIBUTES from `port_attributes_dict`.
2. Use `sfputil show eeprom -p <port>` to dump EEPROM data.
3. Compare key fields (vendor name, part number, serial number, CMIS revision, module hardware revision) with expected values. | 1. All key EEPROM fields match expected values from `port_attributes_dict`.
2. EEPROM dump completes within `eeprom_dump_timeout_sec`. | +| 4 | Basic EEPROM content verification via show CLI | 1. Retrieve the BASE_ATTRIBUTES and EEPROM_ATTRIBUTES from `port_attributes_dict`.
2. Use `show interfaces transceiver info <port>` CLI.
3. Verify key fields against expected values. | All key EEPROM fields from CLI output match expected values from `port_attributes_dict`. | +| 5 | Firmware version validation | 1. For transceivers with `gold_firmware_version` attribute, use `sfputil show fwversion <port>`.
2. If `dual_bank_supported` is true, verify both active and inactive firmware versions.
3. Compare with expected values from attributes. | Active and inactive firmware versions match corresponding values in attributes dictionary. | +| 6 | EEPROM hexdump CLI verification | 1. Use `sfputil show eeprom-hexdump -p <port> -n 0` to retrieve lower page hexdump.
2. Parse hexdump for vendor name and part number.
3. For non-DAC CMIS transceivers (`is_non_dac_and_cmis` = true), use `sfputil show eeprom-hexdump -p <port> -n 0x11` to dump page 0x11. | 1. Hexdump contains expected vendor name and part number.
2. Non-DAC CMIS transceivers show DPActivated state in page 0x11. | +| 7 | sfputil read-eeprom CLI verification | 1. Use `sfputil read-eeprom -p <port> -n 0 -o 0 -s 1` to retrieve the identifier byte from lower page offset 0 (or use `--wire-addr A0h` for SFF-8472 transceivers). | Retrieved data matches the value of `sff8024_identifier` from `port_attributes_dict`. | +| 8 | Error handling - Missing transceiver | 1. Attempt EEPROM operations on ports without transceivers.
2. Verify error messages.
3. Test both sfputil and show CLI commands. | Commands return appropriate messages indicating transceiver absence. | +| 9 | Serial number pattern validation for breakout ports | 1. Check if `breakout_serial_number_pattern` or `breakout_stem_serial_number_pattern` attribute is defined for the transceiver in `port_attributes_dict`.
2. If neither attribute is defined, skip this test for the port.
3. If `breakout_serial_number_pattern` (leaf port) or `breakout_stem_serial_number_pattern` (stem port) is defined:
a. Use `sfputil show eeprom -p <port>` to retrieve the serial number.
b. Log the retrieved serial number for debugging purposes.
c. Based on the leaf or stem side, validate that the serial number matches the regex pattern from `breakout_serial_number_pattern` or `breakout_stem_serial_number_pattern` attribute.
| 1. Test is executed only when `breakout_serial_number_pattern` or `breakout_stem_serial_number_pattern` attribute is present.
2. Serial number is successfully retrieved and logged.
3. For leaf ports: Serial number matches the expected regex pattern (e.g., `.*-A$` for leaf A, `.*-B$` for leaf B).
4. For stem ports: Serial number matches the stem pattern (typically validates absence of leaf suffixes like -A, -B, -C).
5. Test is skipped gracefully for ports without either attribute defined. | +| 10 | Port speed validation in CONFIG_DB | 1. Retrieve the `speed_gbps` attribute from BASE_ATTRIBUTES in `port_attributes_dict` for the port.
2. Query the PORT table in CONFIG_DB to retrieve the configured speed for the port.
3. Convert the CONFIG_DB speed value to Gbps (e.g., "100000" → 100 Gbps, "400000" → 400 Gbps).
4. Compare the converted speed value with the `speed_gbps` attribute.
| 1. CONFIG_DB PORT table contains speed configuration for the port.
2. Speed value from CONFIG_DB matches the `speed_gbps` attribute from BASE_ATTRIBUTES.
3. Any mismatches between configured and expected speed are identified and logged. | +| 11 | FEC configuration validation in CONFIG_DB | 1. Retrieve the `speed_gbps` attribute from BASE_ATTRIBUTES in `port_attributes_dict` for the port.
2. Query the PORT table in CONFIG_DB to retrieve the configured FEC mode for the port.
3. If port speed >= 200 Gbps, verify that FEC is set to `rs`.
| 1. For ports with speed >= 200 Gbps, FEC is configured as RS-FEC.
| + +### CMIS transceiver specific test cases + +| TC No. | Test | Steps | Expected Results | +|------|------|------|------------------| +| 1 | CDB background mode support test | 1. Verify prerequisites: Ensure `is_non_dac_and_cmis` = True and `cdb_background_mode_supported` attribute exists in configuration.
2. Read EEPROM page 1h, byte 163, bit 5 to determine hardware CDB background mode capability.
3. Validate bit value against expected configuration:
a. If `cdb_background_mode_supported` = True: EEPROM bit 5 should be 1
b. If `cdb_background_mode_supported` = False: EEPROM bit 5 should be 0 | CDB background mode support is accurately confirmed and matches configuration. Hardware capability aligns with configured expectations. Any mismatches between configuration and hardware are identified and logged for analysis. Module capabilities are properly documented. | +| 2 | CDB background mode stress test | 1. For transceivers with `cdb_background_mode_supported` = True, invoke the API to read the CMIS CDB firmware version in a loop of 10 iterations.
2. Concurrently, keep accessing EEPROM using API and ensure that the kernel has no error logs until the 10th iteration. | CDB background mode operations complete successfully for supported transceivers without I2C errors in kernel logs. | + +## Cleanup and Post-Test Verification + +After test completion: + +1. Verify all transceivers are in original operational state +2. Check system logs for any unexpected errors or kernel messages +3. Verify xcvrd daemon `pid` has not changed (no crashes/restarts) +4. Check for new core files that may indicate crashes +5. Document any failed tests with detailed error information and system state diff --git a/docs/testplan/transceiver/system_test_plan.md b/docs/testplan/transceiver/system_test_plan.md new file mode 100644 index 00000000000..48cab3480d7 --- /dev/null +++ b/docs/testplan/transceiver/system_test_plan.md @@ -0,0 +1,288 @@ +# System Test Plan For Transceivers + +## Overview + +The System Test Plan for transceivers outlines a comprehensive testing strategy for overall system functionality, including link behavior in various scenarios such as process and docker restarts, and advanced transceiver features. This document employs an attribute-driven approach to provide flexible, platform-specific testing that covers traditional transceiver operations as well as modern C-CMIS capabilities. 
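Several checks in the DOM cleanup list above and in the restart tests below reduce to two reusable validations: the `xcvrd` pid must be unchanged across a test, and no new core files may appear. A minimal sketch of such helpers follows; the function names and the use of `pgrep` are illustrative assumptions, not part of the plan:

```python
import glob
import subprocess


def get_xcvrd_pid():
    """Return the PID of the running xcvrd process, or None if it is not running."""
    try:
        out = subprocess.check_output(["pgrep", "-f", "xcvrd"], text=True)
        return int(out.split()[0])
    except subprocess.CalledProcessError:
        return None


def xcvrd_pid_unchanged(pid_before, pid_after):
    """A changed (or missing) PID indicates a crash or restart during the test."""
    return pid_before is not None and pid_before == pid_after


def list_core_files(core_dir="/var/core"):
    """Snapshot the core files present under the given directory."""
    return glob.glob(f"{core_dir}/*")


def new_core_files(cores_before, cores_after):
    """Core files that appeared while the test ran; any result means a crash."""
    return sorted(set(cores_after) - set(cores_before))
```

A test would call `get_xcvrd_pid()` and `list_core_files()` before and after the disruptive step, then assert `xcvrd_pid_unchanged(...)` and that `new_core_files(...)` is empty.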
+ +## Scope + +The scope of this test plan includes the following: + +- Verification of transceiver system-level functionality and performance across various transceiver types +- Validation of link behavior during system disruptions (process restarts, docker restarts, reboots) +- Testing of transceiver subsystem resilience and recovery mechanisms +- Validation of data consistency across transceiver-related components +- Advanced C-CMIS transceiver testing including frequency and tx power adjustment +- SI (Signal Integrity) settings validation for both optics and media configurations +- Stress testing and load validation under various system conditions +- Platform-specific behavior validation and attribute-driven test configuration + +## Optics Scope + +All the optics covered in the parent [Transceiver Onboarding Test Infrastructure and Framework](test_plan.md#scope) + +## Testbed Topology + +Please refer to the [Testbed Topology](test_plan.md#testbed-topology) + +## Pre-requisites + +Before executing the system tests, ensure the following pre-requisites are met: + +### Setup Requirements + +- The testbed is set up according to the [Testbed Topology](test_plan.md#testbed-topology) +- All the pre-requisites mentioned in [Transceiver Onboarding Test Infrastructure and Framework](test_plan.md#test-prerequisites-and-configuration-files) must be met + +### Environment Validation + +Before starting tests, verify the following system conditions: + +1. **System Health Check** + - All critical services are running (xcvrd, pmon, swss, syncd) + - No existing system errors in logs + +2. **Transceiver Baseline Verification** + - All expected transceivers are present and detected + - All links are in operational state + - No existing I2C communication errors + - LLDP neighbors are discovered (if LLDP is enabled) + +3. 
**Configuration Validation** + - `system.json` configuration file is properly formatted and accessible + - All required attributes are defined for the transceivers under test + - Platform-specific settings are correctly configured + +## Attributes + +A `system.json` file is used to define the attributes for the system tests for the various types of transceivers the system supports. + +The following table summarizes the key attributes used in system testing. This table serves as the authoritative reference for all attributes and must be updated whenever new attributes are introduced: + +**Legend:** M = Mandatory, O = Optional + +| Attribute Name | Type | Default Value | Mandatory | Override Levels | Description | +|----------------|------|---------------|-----------|-----------------|-------------| +| verify_lldp_on_link_up | boolean | True | O | dut | Whether to verify LLDP functionality when link comes up | +| port_shutdown_wait_sec | integer | 5 | O | transceivers or platform_hwsku_overrides | Wait time after port shutdown before verification | +| port_startup_wait_sec | integer | 60 | O | transceivers or platform_hwsku_overrides | Wait time after port startup before link verification | +| port_toggle_iterations | integer | 100 | O | transceivers or platform_hwsku_overrides | Number of iterations for port toggle stress test | +| port_toggle_delay_sec | integer | 2 | O | transceivers or platform_hwsku_overrides | Delay between port toggle cycles | +| port_range_toggle_iterations | integer | 50 | O | transceivers or platform_hwsku_overrides | Number of iterations for port range toggle stress test | +| port_range_test_ports | list | [] | O | dut | List of specific port names (e.g., 'Ethernet0', 'Ethernet4') to include in port range stress test. Empty list means use all available ports. 
| +| port_range_startup_wait_sec | integer | 60 | O | transceivers or platform_hwsku_overrides | Wait time after port range startup | +| xcvrd_restart_settle_sec | integer | 120 | O | hwsku | Time to wait after xcvrd restart before checking link status | +| pmon_restart_settle_sec | integer | 120 | O | hwsku | Time to wait after pmon restart before verification | +| swss_restart_settle_sec | integer | 180 | O | hwsku | Time to wait after swss restart before verification | +| syncd_restart_settle_sec | integer | 240 | O | hwsku | Time to wait after syncd restart before verification | +| expect_pmon_restart_with_swss_or_syncd | boolean | False | O | platform | Whether pmon restart is expected during swss/syncd restart | +| config_reload_settle_sec | integer | 300 | O | hwsku | Time to wait after config reload before link status check | +| cold_reboot_settle_sec | integer | 400 | O | hwsku | Time to wait after cold reboot before link status check | +| cold_reboot_iterations | integer | 5 | O | hwsku | Number of iterations for cold reboot stress test | +| warm_reboot_supported | boolean | False | O | platform or hwsku | Whether platform supports warm reboot functionality | +| warm_reboot_settle_sec | integer | 300 | O | hwsku | Time to wait after warm reboot before verification | +| warm_reboot_iterations | integer | 5 | O | hwsku | Number of iterations for warm reboot stress test | +| fast_reboot_supported | boolean | False | O | platform or hwsku | Whether platform supports fast reboot functionality | +| fast_reboot_settle_sec | integer | 300 | O | hwsku | Time to wait after fast reboot before verification | +| fast_reboot_iterations | integer | 5 | O | hwsku | Number of iterations for fast reboot stress test | +| power_cycle_supported | boolean | False | O | platform or hwsku | Whether automated power cycle testing is supported (requires controllable PDU) | +| power_cycle_settle_sec | integer | 600 | O | hwsku | Time to wait after full power restoration before 
starting verification (allows hardware, optics, and services to fully initialize) | +| power_cycle_iterations | integer | 3 | O | hwsku | Number of power cycle iterations for recovery/stress validation | +| transceiver_reset_supported | boolean | True | O | transceivers | Whether transceiver supports reset functionality | +| transceiver_reset_i2c_recover_sec | integer | 5 | O | transceivers | Time to wait for I2C recovery after transceiver state changes (reset, low power mode) before verification | +| low_power_mode_supported | boolean | False | O | transceivers | Whether transceiver supports low power mode | +| loopback_supported | boolean | False | O | transceivers | Whether transceiver supports loopback functionality | +| supported_loopback_modes | list | [] | O | transceivers | List of supported loopback modes. Possible values include, but are not limited to: ["host-side-input", "media-side-input", "host-side-output", "media-side-output"]. | +| loopback_settle_sec | integer | 15 | O | transceivers | Time to wait after loopback mode changes | +| low_pwr_request_hw_asserted | boolean | True | O | platform | Whether to check DataPath state and LowPwrRequestHW signal. When True, expects LowPwrRequestHW signal to be asserted (1); when False, skips these checks | +| cmis_bootup_low_power_test_supported | boolean | False | O | platform | Whether to test that CMIS transceivers boot up in low power mode when xcvrd is disabled during startup | +| tx_disable_test_supported | boolean | False | O | transceivers | Whether transceiver supports Tx disable testing and DataPath state verification | +| optics_si_settings | dict | {} | O | transceivers | Dictionary containing optics SI settings with nested structure for parameters like OutputAmplitudeTargetRx, OutputEqPreCursorTargetRx, OutputEqPostCursorTargetRx, etc. Each parameter contains per-lane values (e.g., OutputAmplitudeTargetRx1-8). Test runs if dictionary is non-empty. 
| +| media_si_settings | dict | {} | O | platform_hwsku_overrides | Dictionary containing media SI settings following media_settings.json structure for comparison with APPL_DB values. Test runs if dictionary is non-empty. | +| frequency_values | list | [] | O | transceivers | List of frequency values for C-CMIS transceivers. First value is the default frequency, followed by test frequencies (min/max supported). Test runs if list is non-empty. | +| tx_power_values | list | [] | O | transceivers | List of tx power values in dBm for C-CMIS transceivers. First value is the default tx power, followed by test power levels (min/max supported). Test runs if list is non-empty. | +| expected_application_code | integer | - | O | platform_hwsku_overrides | Expected application code value for the specific transceiver type, platform, and hwsku combination. When defined, the test will verify that the actual application code read from the transceiver matches this expected value. | +| link_stability_monitor_sec | integer | 300 | O | transceivers or platform_hwsku_overrides | Duration in seconds to monitor link stability without link flaps during steady state monitoring test | + +For information about attribute override hierarchy and precedence, please refer to the [Priority-Based Attribute Resolution](test_plan.md#priority-based-attribute-resolution) documentation. + +## CLI Commands Reference + +For detailed CLI commands used in the test cases below, please refer to the [CLI Commands section](test_plan.md#cli-commands) in the Transceiver Onboarding Test Infrastructure and Framework. 
This section provides comprehensive examples of all relevant commands + +## Test Cases + +**Test Execution Prerequisites:** + +The following tests from the [Transceiver Onboarding Test Infrastructure and Framework](test_plan.md#test-cases-interim) will be run prior to executing the system tests: + +- Transceiver presence check +- Ensure active firmware is gold firmware (for non-DAC CMIS transceivers) +- Link up verification +- LLDP verification (if enabled) + +**Assumptions for the Below Tests:** + +- All the below tests will be executed for all the transceivers connected to the DUT (the port list is derived from the `port_attributes_dict`) unless specified otherwise. + +## Test Execution Flow + +### Recommended Test Order + +The following execution order is recommended to minimize system disruption and ensure reliable test results: + +1. **Link Behavior Test Cases** - Basic port operations that establish baseline functionality +2. **Diagnostic Test Cases** - Non-disruptive validation of transceiver capabilities and SI settings +3. **Configuration Validation Test Cases** - C-CMIS tuning and configuration parameter verification +4. **Transceiver Event Handling Test Cases** - Physical state change validation (requires careful state management) +5. **Process and Service Restart Test Cases** - Medium system disruption tests +6. **System Recovery Test Cases** - High system disruption tests (reboots) +7. 
**Stress and Load Test Cases** - Extended duration tests (run last to avoid impact on other tests) + +## Test Execution Guidelines + +### Attribute Usage in Tests + +- **Settle Time Attributes**: Used as maximum wait times before declaring test failure +- **Iteration Attributes**: Define the number of test cycles for stress testing +- **Boolean Attributes**: Control conditional test behavior and expectations + +### Test State Management + +- **State Preservation**: Before each test that modifies transceiver settings (e.g., loopback modes, low power mode, Tx disable), the original state should be captured +- **State Reversion**: After each test completion (pass or fail), the transceiver should be reverted to its original operational state +- **Cleanup on Failure**: If a test fails during execution, cleanup procedures should still attempt to restore the original state to prevent impact on subsequent tests +- **Link Recovery**: After state reversion, tests should verify that links return to their expected operational state before proceeding to the next test + +## Common Verification Procedures + +The following procedures are referenced throughout the test cases to ensure consistent validation: + +### Standard Port Recovery and Verification Procedure + +This procedure is used after any test that modifies transceiver state or after system disruptions: + +1. **Link Status Verification** + - Verify port is operationally up + - Wait for configured timeout period before declaring failure + +2. **LLDP Verification** (if `verify_lldp_on_link_up` is True) + - Verify port appears in LLDP neighbor table + - Confirm LLDP neighbor information is correctly populated (remote device ID, port ID, etc. if applicable) + +3. **CMIS State Verification** (for non-DAC CMIS transceivers (can be checked via `is_non_dac_and_cmis` attribute)) + - Verify DataPathState is `DPActivated` for operational ports + - Verify ConfigState is `ConfigSuccess` + +4. 
**SI Settings Verification** (if applicable) + - **Optics SI Settings**: If `optics_si_settings` is defined, verify current EEPROM values match configured attributes + - **Media SI Settings**: If `media_si_settings` is defined, verify PORT_TABLE APPL_DB values match configured attributes. Also, ensure `NPU_SI_SETTINGS_SYNC_STATUS_KEY` is set to `NPU_SI_SETTINGS_DONE` in `PORT_TABLE` of `APPL_DB` + - Log any discrepancies for analysis + +5. **Application Code Verification** (if `expected_application_code` is defined and not null) + - Read current application code from transceiver EEPROM + - Verify the actual application code matches the `expected_application_code` value + - Log any discrepancies for analysis + +6. **Docker and Process Health Check** + - Verify all critical services (`xcvrd, pmon, swss, syncd`) are running for at least 3 minutes + - Ensure no core files are present in `/var/core` + - Log any service failures for analysis + +### State Preservation and Restoration + +This procedure ensures tests don't interfere with each other: + +1. **State Capture** (before test execution) + - Record current port operational states + +2. **State Restoration** (after test completion, regardless of pass/fail) + - Restore all modified transceiver settings to original values + - Verify all ports return to their original operational states + - Execute **Standard Port Recovery and Verification Procedure** for affected ports + +### Link Behavior Test Cases + +The following tests aim to validate the link status and stability of transceivers under various conditions. + +| TC No. | Test | Steps | Expected Results | +|------|------|------|------------------| +| 1 | Port shutdown validation | 1. For each transceiver port individually:
a. Issue `config interface shutdown <port>`.
b. Wait for `port_shutdown_wait_sec`.
c. Verify port is operationally down.
2. Validate link status using CLI configuration. | Ensure that the link goes down within the configured timeout period for each port. | +| 2 | Port startup validation | 1. For each transceiver port individually:
a. Issue `config interface startup <port>`.
b. Wait for `port_startup_wait_sec`.
2. Execute **Standard Port Recovery and Verification Procedure**. | Ensure that the port passes all verification checks including link status, LLDP, CMIS states, SI settings, and application code validation. | + +### Process and Service Restart Test Cases + +| TC No. | Test | Steps | Expected Results | +|------|------|------|------------------| +| 1 | xcvrd daemon restart impact | 1. Verify that links are up for all transceivers and record the link up time.
2. Restart xcvrd daemon.
3. Wait for `xcvrd_restart_settle_sec` before verification.
4. Execute **Standard Port Recovery and Verification Procedure** for all ports. | Confirm `xcvrd` restarts successfully without causing link flaps for the corresponding ports, and all verification checks pass. Also ensure that xcvrd is up for at least `xcvrd_restart_settle_sec` seconds. | +| 2 | xcvrd restart with I2C errors | 1. Verify that links are up for all transceivers and record the link up time.
2. Induce I2C errors in the system.
3. Restart xcvrd daemon.
4. Monitor link behavior and system stability.
5. Wait for `xcvrd_restart_settle_sec` before verification.
6. Execute **Standard Port Recovery and Verification Procedure** for all ports. | Confirm `xcvrd` restarts successfully without causing link flaps for the corresponding ports, and all verification checks pass even with I2C errors present. | +| 3 | xcvrd crash recovery test | 1. Verify that links are up for all transceivers and record the link up time.
2. Modify xcvrd.py to raise an Exception and induce a crash.
3. Monitor automatic restart behavior.
4. Wait for `xcvrd_restart_settle_sec` before verification.
5. Execute **Standard Port Recovery and Verification Procedure** for all ports. | Confirm `xcvrd` restarts successfully without causing link flaps for the corresponding ports, and all verification checks pass. Also ensure that xcvrd is up for at least `xcvrd_restart_settle_sec` seconds. | +| 4 | pmon docker restart impact | 1. Verify that links are up for all transceivers and record the link up time.
2. Restart pmon container.
3. Monitor transceiver monitoring and link behavior.
4. Wait for `pmon_restart_settle_sec` before verification.
5. Execute **Standard Port Recovery and Verification Procedure** for all ports. | Confirm `xcvrd` restarts successfully without causing link flaps for the corresponding ports, and all verification checks pass. | +| 5 | swss docker restart impact | 1. Verify that links are up for all transceivers.
2. Restart swss container.
3. Monitor link state transitions and recovery.
4. Wait for `swss_restart_settle_sec` before verification.
5. Check if `expect_pmon_restart_with_swss_or_syncd` is True and verify pmon restart accordingly.
6. Execute **Standard Port Recovery and Verification Procedure** for all ports. | Verify that `pmon`/`xcvrd` restart behavior matches `expect_pmon_restart_with_swss_or_syncd`, and all ports pass verification checks. | +| 6 | syncd process restart impact | 1. Verify that links are up for all transceivers.
2. Restart syncd.
3. Monitor system recovery and link restoration.
4. Wait for `syncd_restart_settle_sec` before verification.
5. Check if `expect_pmon_restart_with_swss_or_syncd` is True and verify pmon restart accordingly.
6. Execute **Standard Port Recovery and Verification Procedure** for all ports. | Verify that `pmon`/`xcvrd` restart behavior matches `expect_pmon_restart_with_swss_or_syncd`, and all ports pass verification checks. | + +### System Recovery Test Cases + +| TC No. | Test | Steps | Expected Results | +|------|------|------|------------------| +| 1 | Config reload impact | 1. Verify that links are up for all transceivers.
2. Execute `sudo config reload -y`.
3. Wait for `config_reload_settle_sec` and verify transceiver link restoration.
4. Execute **Standard Port Recovery and Verification Procedure** for all ports. | Ensure `xcvrd` restarts and all ports pass comprehensive verification checks. | +| 2 | Cold reboot link recovery | 1. Verify that links are up for all transceivers.
2. Execute a cold reboot.
3. Wait for `cold_reboot_settle_sec` and monitor link recovery after reboot.
4. Execute **Standard Port Recovery and Verification Procedure** for all ports. | Confirm all ports link up again post-reboot and pass comprehensive verification checks. | +| 3 | Warm reboot link recovery | 1. Skip test if `warm_reboot_supported` is False.
2. Verify that links are up for all transceivers.
3. Perform warm reboot.
4. Wait for `warm_reboot_settle_sec` and monitor link recovery after reboot.
5. Execute **Standard Port Recovery and Verification Procedure** for all ports. | Ensure `xcvrd` restarts and maintains link stability for all ports, with comprehensive verification checks passing. | +| 4 | Fast reboot link recovery | 1. Skip test if `fast_reboot_supported` is False.
2. Verify that links are up for all transceivers.
3. Perform fast reboot.
4. Wait for `fast_reboot_settle_sec` and monitor link establishment timing.
5. Execute **Standard Port Recovery and Verification Procedure** for all ports. | Confirm all ports link up again post-reboot and pass comprehensive verification checks. | +| 5 | Power cycle link recovery | 1. Skip test if `power_cycle_supported` is False.
2. Verify current link states are up for all transceivers.
3. Perform a controlled chassis power cycle.
4. Wait for `power_cycle_settle_sec` and monitor link recovery after full boot.
5. Execute **Standard Port Recovery and Verification Procedure** for all ports. | Confirm all ports link up again post-power cycle and pass comprehensive verification checks (link status, LLDP, CMIS states, SI settings, application code if defined, docker and process stability). | + +### Transceiver Event Handling Test Cases + +| TC No. | Test | Steps | Expected Results | +|------|------|------|------------------| +| 1 | Transceiver reset validation | 1. Skip test if `transceiver_reset_supported` is False.
2. Execute **State Preservation and Restoration** (capture phase).
3. Reset the transceiver using appropriate CLI command.
4. Wait for `transceiver_reset_i2c_recover_sec` to allow I2C recovery.
5. Verify port is linked down after reset and transceiver is in low power mode (if `low_power_mode_supported` is True).
6. If `low_pwr_request_hw_asserted` is True:
a. Check DataPath is in DPDeactivated state.
b. Verify LowPwrAllowRequestHW (page 0h, byte 26.6) is set to 1.
7. Issue `config interface shutdown <port>` and wait for `port_shutdown_wait_sec`.
8. Issue `config interface startup <port>` and wait for `port_startup_wait_sec`.
9. Execute **Standard Port Recovery and Verification Procedure**.
10. Execute **State Preservation and Restoration** (restoration phase). | Ensure that the port is linked down after reset and is in low power mode (if transceiver supports it). If `low_pwr_request_hw_asserted` is True, verify DataPath is in DPDeactivated state and LowPwrAllowRequestHW signal is asserted (set to 1). The shutdown and startup commands should re-initialize the port and bring the link up with all verification checks passing. | +| 2 | Transceiver low power mode validation | 1. Skip test if `low_power_mode_supported` is False.
2. Execute **State Preservation and Restoration** (capture phase).
3. Ensure transceiver is in high power mode initially.
4. Put the transceiver in low power mode using CLI command.
5. Wait for `transceiver_reset_i2c_recover_sec`.
6. Verify port is linked down and DataPath is in DPDeactivated state.
7. Verify transceiver is in low power mode through CLI.
8. Disable low power mode (restore to high power mode).
9. Wait for `transceiver_reset_i2c_recover_sec`.
10. Execute **Standard Port Recovery and Verification Procedure**.
11. Execute **State Preservation and Restoration** (restoration phase). | Ensure transceiver transitions correctly between high and low power modes. Port should be down in low power mode and up in high power mode with all verification checks passing. | +| 3 | CMIS transceiver boot-up low power mode test | 1. Skip test if `cmis_bootup_low_power_test_supported` is False.
2. Add `"skip_xcvrd": true,` to the `pmon_daemon_control.json` file.
3. Reboot the device using cold reboot.
4. Wait for `cold_reboot_settle_sec` and verify system is operational.
5. Verify CMIS transceiver is in low power mode after boot-up.
6. Revert the `pmon_daemon_control.json` file to original state.
7. Restart pmon service: `sudo systemctl restart pmon`.
8. Wait for `pmon_restart_settle_sec` and verify normal operation restored.
9. Execute **Standard Port Recovery and Verification Procedure** for all ports. | Ensure CMIS transceiver boots up in low power mode when xcvrd is disabled. System should restore normal operation after reverting configuration and restarting pmon with all verification checks passing. | +| 4 | Transceiver Tx disable DataPath validation | 1. Skip test if `tx_disable_test_supported` is False.
2. Execute **State Preservation and Restoration** (capture phase).
3. Verify transceiver is in operational state with DataPath in DPActivated state.
4. Read MaxDurationDPTxTurnOff value from EEPROM (page 1h, byte 168.7:4) using appropriate API.
5. Disable Tx by writing to EEPROM or calling `tx_disable` API.
6. Monitor DataPath state transition from DPActivated within the MaxDurationDPTxTurnOff time read from EEPROM.
7. Verify DataPath state changes from DPActivated to a different state within the specified time.
8. Issue `config interface shutdown <port>` and wait for `port_shutdown_wait_sec`.
9. Issue `config interface startup <port>` and wait for `port_startup_wait_sec`.
10. Execute **Standard Port Recovery and Verification Procedure**.
11. Execute **State Preservation and Restoration** (restoration phase). | Ensure DataPath state transitions correctly within MaxDurationDPTxTurnOff time (read from EEPROM) when Tx is disabled. Port should recover after shutdown/startup cycle with all verification checks passing. This test can be run as a stress test with multiple iterations. | + +### Diagnostic Test Cases + +| TC No. | Test | Steps | Expected Results | +|------|------|------|------------------| +| 1 | Transceiver loopback validation | 1. Skip test if `loopback_supported` is False or `supported_loopback_modes` is empty.
2. Execute **State Preservation and Restoration** (capture phase).
3. For each loopback mode in `supported_loopback_modes`:
a. Enable the loopback mode using CLI command.
b. Wait for `loopback_settle_sec`.
c. Verify loopback is active through CLI.
d. Test data path functionality (use LLDP neighbor verification for host-side input loopback if applicable).
e. Disable loopback mode.
f. Wait for `loopback_settle_sec`.
g. Verify normal operation is restored.
4. Execute **Standard Port Recovery and Verification Procedure**.
5. Execute **State Preservation and Restoration** (restoration phase). | Ensure that the various supported types of loopback work on the transceiver. The LLDP neighbor can also be used to verify the data path after enabling loopback (such as host-side input loopback). All comprehensive verification checks should pass. | +| 2 | CMIS optics SI settings validation | 1. Skip test if `optics_si_settings` is empty or not defined.
2. Ensure the port is linked up.
3. Read optics SI settings from transceiver-level attribute `optics_si_settings` (following optics_si_settings.json structure).
4. Read corresponding SI settings from EEPROM using appropriate API calls.
5. Compare each SI setting parameter between attribute and EEPROM values.
6. Verify all optics SI settings match.
7. Log any discrepancies found between attribute and EEPROM values.
8. Execute **Standard Port Recovery and Verification Procedure** (SI settings verification will be included). | Ensure optics SI settings defined in transceiver attributes match the corresponding values read from EEPROM and all comprehensive verification checks pass. | +| 3 | Media SI settings validation | 1. Skip test if `media_si_settings` is empty or not defined.
2. Ensure the port is linked up and `NPU_SI_SETTINGS_SYNC_STATUS_KEY` is set to `NPU_SI_SETTINGS_DONE` in `PORT_TABLE` of `APPL_DB`.
3. Read media SI settings from `media_si_settings` attribute (following media_settings.json structure).
4. Query PORT_TABLE in APPL_DB to retrieve corresponding media SI setting values for the port.
5. Compare each media SI setting parameter between attribute and APPL_DB values.
6. Verify all media SI settings match.
7. Log any discrepancies found between attribute and APPL_DB values.
8. Execute **Standard Port Recovery and Verification Procedure** (media SI settings verification will be included). | Ensure media SI settings defined in platform/hwsku attributes match the corresponding values in PORT_TABLE of APPL_DB and all comprehensive verification checks pass. This validates media configuration consistency for all optics with media SI settings. | + +### Configuration Validation Test Cases + +| TC No. | Test | Steps | Expected Results | +|------|------|------|------------------| +| 1 | C-CMIS frequency adjustment validation | 1. Skip test if `frequency_values` is empty or not defined.
2. Execute **State Preservation and Restoration** (capture phase).
3. Capture current frequency configuration from CONFIG_DB and STATE_DB.
4. For each frequency value in `frequency_values` (starting from index 1, skipping default):
a. Apply frequency using `config interface transceiver frequency <port> <frequency>`.
b. Wait for `port_startup_wait_sec`.
c. Verify frequency is set correctly in CONFIG_DB and STATE_DB.
d. Execute **Standard Port Recovery and Verification Procedure**.
5. Restore original frequency (first value in `frequency_values`).
6. Wait for `port_startup_wait_sec` and verify restoration.
7. Execute **Standard Port Recovery and Verification Procedure**.
8. Execute **State Preservation and Restoration** (restoration phase). | Ensure C-CMIS transceiver frequency can be adjusted to supported values and restored to original frequency. Port should remain stable throughout frequency changes with all verification checks passing. | +| 2 | C-CMIS tx power adjustment validation | 1. Skip test if `tx_power_values` is empty or not defined.
2. Execute **State Preservation and Restoration** (capture phase).
3. Capture current tx power configuration from CONFIG_DB and STATE_DB.
4. For each tx power value in `tx_power_values` (starting from index 1, skipping default):
a. Apply tx power using `config interface transceiver tx-power <port> <tx_power>`.
b. Wait for `port_startup_wait_sec`.
c. Verify tx power is set correctly in CONFIG_DB and STATE_DB.
d. Execute **Standard Port Recovery and Verification Procedure**.
5. Restore original tx power (first value in `tx_power_values`).
6. Wait for `port_startup_wait_sec` and verify restoration.
7. Execute **Standard Port Recovery and Verification Procedure**.
8. Execute **State Preservation and Restoration** (restoration phase). | Ensure C-CMIS transceiver tx power can be adjusted to supported values and restored to original tx power. Port should remain stable throughout power changes with all verification checks passing. | + +### Stress and Load Test Cases + +| TC No. | Test | Steps | Expected Results | +|------|------|------|------------------| +| 1 | Port startup/shutdown stress test | 1. Execute **State Preservation and Restoration** (capture phase).
2. For 1 randomly selected port, loop for `port_toggle_iterations` iterations (default 100):<br>
a. Issue `config interface shutdown <interface_name>` and wait for `port_shutdown_wait_sec`.<br>
b. Issue `config interface startup <interface_name>` and wait for `port_startup_wait_sec`.<br>
c. Use `port_toggle_delay_sec` delay between cycles.
d. Monitor system stability and validate link status.<br>
3. Execute **Standard Port Recovery and Verification Procedure**.
4. Execute **State Preservation and Restoration** (restoration phase). | Ensure link status toggles to up/down appropriately with each startup/shutdown command. System should remain stable throughout stress testing and all comprehensive verification checks should pass. | +| 2 | Port range stress test | 1. Use ports from `port_range_test_ports` if specified, otherwise use all available transceiver ports.
2. Execute **State Preservation and Restoration** (capture phase).
3. Perform range shut and no-shut operations on the selected ports for `port_range_toggle_iterations` iterations.
4. Wait `port_range_startup_wait_sec` after each startup cycle.
5. Execute **Standard Port Recovery and Verification Procedure** for all tested ports.
6. Execute **State Preservation and Restoration** (restoration phase). | System should handle concurrent port operations without instability and all comprehensive verification checks should pass for all tested ports. | +| 3 | Cold reboot stress test | 1. Execute **State Preservation and Restoration** (capture phase).
2. Execute cold reboot `cold_reboot_iterations` consecutive times (default 5; can be configured to 100).<br>
3. Wait `cold_reboot_settle_sec` after each reboot.
4. After each reboot iteration, execute **Standard Port Recovery and Verification Procedure**.
5. Execute **State Preservation and Restoration** (restoration phase). | Confirm the expected ports link up again post-reboot, with all comprehensive verification checks passing for all iterations. System should remain stable throughout multiple reboots. | +| 4 | Warm reboot stress test | 1. Skip test if `warm_reboot_supported` is False.
2. Execute **State Preservation and Restoration** (capture phase).
3. Execute warm reboot `warm_reboot_iterations` times.<br>
4. Wait `warm_reboot_settle_sec` after each reboot.
5. After each reboot iteration, execute **Standard Port Recovery and Verification Procedure**.
6. Execute **State Preservation and Restoration** (restoration phase). | Ensure all ports link up again post-reboot with all comprehensive verification checks passing for all iterations. System should remain stable throughout multiple reboots. | +| 5 | Fast reboot stress test | 1. Skip test if `fast_reboot_supported` is False.
2. Execute **State Preservation and Restoration** (capture phase).
3. Execute fast reboot `fast_reboot_iterations` times.<br>
4. Wait `fast_reboot_settle_sec` after each reboot.
5. After each reboot iteration, execute **Standard Port Recovery and Verification Procedure**.
6. Execute **State Preservation and Restoration** (restoration phase). | Ensure all ports link up again post-reboot with all comprehensive verification checks passing for all iterations. System should remain stable throughout multiple reboots. | +| 6 | Link stability monitoring test | 1. Verify all transceivers are in operational state with links up.
2. Record initial `last_up_time` and `flap_count` for each port from interface status.
3. Start monitoring for `link_stability_monitor_sec` duration:
a. Poll link status every 10 seconds.<br>
b. Log any link state changes (up to down or down to up).
4. After monitoring period completion, verify that `last_up_time` and `flap_count` remain unchanged for all ports.
5. Execute **Standard Port Recovery and Verification Procedure** for all ports. | All transceivers maintain stable link status throughout the entire monitoring period with no unexpected link flaps. The `last_up_time` and `flap_count` values must remain unchanged, confirming no link instability occurred. This test validates long-term stability under steady-state conditions. | +| 7 | Power cycle stress test | 1. Skip test if `power_cycle_supported` is False.
2. Execute **State Preservation and Restoration** (capture phase).
3. For each iteration (1..`power_cycle_iterations`):
a. Perform controlled power cycle of DUT.
b. Wait for `power_cycle_settle_sec`.
c. Execute **Standard Port Recovery and Verification Procedure** for all ports.
4. Execute **State Preservation and Restoration** (restoration phase). | Confirm the expected ports link up again post-reboot, with all comprehensive verification checks passing for all iterations. System should remain stable throughout multiple reboots. | + +## Cleanup and Post-Test Verification + +After test completion: + +### Immediate Cleanup + +1. **State Restoration**: Verify all transceivers are restored to their original operational state +2. **Link Status**: Verify all transceivers are in operational state with links up +3. **Configuration Reset**: Ensure any temporary configuration changes (e.g., pmon_daemon_control.json modifications) are reverted +4. **System Health**: Check system logs for any unexpected errors or warnings introduced during testing +5. **Service Status**: Verify all services and daemons are running normally +6. **Database Consistency**: Verify state databases contain expected transceiver information and are consistent + +### Post-Test Report Generation + +1. **Test Summary**: Generate comprehensive test results including pass/fail status for each test case +2. **Performance Metrics**: Document settle times, iteration counts, and any performance deviations +3. **Error Analysis**: Compile any errors or warnings encountered during testing with recommended remediation +4. **System State**: Document final system state and any persistent configuration changes diff --git a/docs/testplan/transceiver/test_plan.md b/docs/testplan/transceiver/test_plan.md index fc6329b96ab..6cd8828a498 100644 --- a/docs/testplan/transceiver/test_plan.md +++ b/docs/testplan/transceiver/test_plan.md @@ -1,8 +1,12 @@ -# Transceiver Onboarding Test Plan +# Transceiver Onboarding Test Infrastructure and Framework ## Scope -This test plan outlines a comprehensive framework for ensuring feature parity for new transceivers being onboarded to SONiC. 
The goal is to automate all tests listed in this document, covering the following areas: +This document defines the attribute-driven test infrastructure and framework for transceiver onboarding validation in SONiC. It specifies configuration file formats, hierarchical attribute resolution, normalization rules, and shared test utilities that underpin all transceiver test categories. + +> **Note:** Scenario-based test cases (shut/noshut sequences, reboot flows, failure injection, etc.) will be documented in a dedicated **Scenario Test Plan** in the future. The [Test Cases (Interim)](#test-cases-interim) section in this document contains test cases that will migrate to that plan. + +The overall test coverage spans the following areas: - **Link Behavior**: Test link behavior using shut/no shut commands and under process crash and device reboot scenarios. - **Transceiver Information Fields**: Verify transceiver specific fields (Vendor name, part number, serial number) via CLI commands, ensuring values match expectations. @@ -85,18 +89,9 @@ A total of 2 ports of a device with the onboarding transceiver should be connect +-----------------+ ``` -## Test Cases - -### 1. Tests not involving traffic - -These tests do not require traffic and are standalone, designed to run on a Device Under Test (DUT) with the transceiver plugged into 2 ports, connected by a cable. - -**Breakout Cable Assumptions for the Below Tests:** - -- All sides of the breakout cable should be connected to the DUT, and each port should be tested individually starting from subport 1 to subport N. The test should be run in reverse order as well i.e. starting from subport N to subport 1. -- For link toggling tests on a subport, it's crucial to ensure that the link status of remaining subports of the breakout port group remains unaffected. +> **Note:** Certain test categories require specific topologies. Each child test plan specifies its required topology in its own pre-requisites section if applicable. 
-### Test Prerequisites and Configuration Files +## Test Prerequisites and Configuration Files The following configuration files must be present to enable comprehensive transceiver testing. @@ -147,7 +142,7 @@ Prerequisite tests provide early readiness validation before a category's main t - Verify link operational status (link-up state) - Ensure critical system processes (`xcvrd`, `pmon`, `syncd`, `orchagent`) are running -#### 1. DUT Info Files +### DUT Info Files > 📊 **Visual Guide**: See the [File Organization Diagram](diagrams/file_organization.md) for a visual overview of the file structure and relationships. @@ -190,7 +185,7 @@ Example of `dut_info/sonic-device-01.json`: } ``` -##### Normalization Mappings File +#### Normalization Mappings File **File location:** `ansible/files/transceiver/inventory/normalization_mappings.json` @@ -198,8 +193,8 @@ Example of `dut_info/sonic-device-01.json`: **Structure:** -- `vendor_names`: Dictionary mapping raw vendor names to their normalized forms using the normalization rules described in the [CMIS CDB Firmware Binary Management](#141-cmis-cdb-firmware-binary-management) section. -- `part_numbers`: Dictionary mapping raw part numbers to their normalized forms using the normalization rules described in the [CMIS CDB Firmware Binary Management](#141-cmis-cdb-firmware-binary-management) section. +- `vendor_names`: Dictionary mapping raw vendor names to their normalized forms using the normalization rules described in the [CMIS CDB Firmware Binary Management](#121-cmis-cdb-firmware-binary-management) section. +- `part_numbers`: Dictionary mapping raw part numbers to their normalized forms using the normalization rules described in the [CMIS CDB Firmware Binary Management](#121-cmis-cdb-firmware-binary-management) section. 
Example of `normalization_mappings.json`: @@ -224,7 +219,7 @@ Example of `normalization_mappings.json`: } ``` -##### Per-DUT File Structure +#### Per-DUT File Structure **File location:** `ansible/files/transceiver/inventory/dut_info/.json` @@ -232,7 +227,7 @@ Example of `normalization_mappings.json`: **Discovery:** The framework automatically discovers and loads the appropriate DUT file based on the current testbed's DUT hostname. -##### Per-Port Fields +#### Per-Port Fields **Mandatory Fields:** @@ -248,7 +243,7 @@ Example of `normalization_mappings.json`: - `vendor_rev`: The vendor revision number. - `hardware_rev`: The hardware revision number. -##### Field Handling Rules +#### Field Handling Rules - **Normalized values are derived automatically**: The framework will look up `vendor_name` and `vendor_pn` in the `normalization_mappings.json` file to get the corresponding normalized values. - **Default normalization**: If no mapping is found in `normalization_mappings.json`, the normalized value defaults to the original value. @@ -304,7 +299,7 @@ Example of `normalization_mappings.json`: - Deployment pattern grouping for shared attribute configurations - Clear identification of lane usage and deployment topology -##### Port Specification Formats +#### Port Specification Formats The framework supports multiple flexible port specification formats to reduce configuration overhead: @@ -320,7 +315,7 @@ The framework supports multiple flexible port specification formats to reduce co - Range format follows Python slice convention (start:stop where stop is exclusive) - Step size must be > 0 for range with step format -##### Framework Implementation Requirements +#### Framework Implementation Requirements The test framework must implement the following core components to process per-DUT files and create a comprehensive `port_attributes_dict` dictionary. 
All parsed data is stored in `port_attributes_dict["EthernetXX"]["BASE_ATTRIBUTES"]` as the foundation for test operations. More details on the `port_attributes_dict` structure and usage are provided in the Test Category Attribute Files section. @@ -335,7 +330,7 @@ More details on the `port_attributes_dict` structure and usage are provided in t - DUT file for current hostname is not found - JSON parsing fails -###### 1. Port Expansion Processing +##### 1. Port Expansion Processing **Purpose**: Handle various port specification formats and expand them into individual port names. @@ -347,7 +342,7 @@ More details on the `port_attributes_dict` structure and usage are provided in t 4. **Deferred Validation**: Validate mandatory fields after all applicable port specifications have been merged 5. **Generate Final Dictionary**: Create the standard per-port attribute dictionary -###### 2. Transceiver Configuration String Parsing +##### 2. Transceiver Configuration String Parsing **Purpose**: Extract all components from the `transceiver_configuration` string during base attributes initialization phase. @@ -392,7 +387,7 @@ More details on the `port_attributes_dict` structure and usage are provided in t } ``` -###### 3. Dictionary Management +##### 3. Dictionary Management **Purpose**: Create and maintain the comprehensive port attributes dictionary that serves as the source of truth for test cases. @@ -541,14 +536,14 @@ Example of a dictionary created by parsing the above file: } ``` -#### 2. Test Category Attribute Files +### Test Category Attribute Files > 🔄 **Process Flow**: See the [Data Flow Architecture Diagram](diagrams/data_flow.md) for a comprehensive view of how these files are processed and merged. Multiple JSON files based on test category define the metadata and test-specific attributes required for each type of transceiver. **Note:** If a test category attribute file is absent, the corresponding test case will be skipped. 
This allows for selective test execution and gradual framework adoption. -##### File Organization +#### File Organization **Recommended JSON files:** @@ -563,7 +558,7 @@ Multiple JSON files based on test category define the metadata and test-specific **Location:** `ansible/files/transceiver/inventory/attributes/` directory -##### JSON Schema Structure +#### JSON Schema Structure All files follow a consistent schema with these main sections: @@ -616,7 +611,7 @@ All files follow a consistent schema with these main sections: } ``` -##### Schema Components +#### Schema Components **Main Sections:** @@ -640,16 +635,15 @@ All files follow a consistent schema with these main sections: - **No Overlap**: A field should **never** appear in both `mandatory` and `defaults` sections. This creates logical inconsistency because a field cannot simultaneously require explicit specification (mandatory) and have a fallback value (default). The framework would be unable to determine whether to enforce validation or apply defaults when the field is missing. - **Validation First**: The framework should first validate that all mandatory fields can be resolved through the priority hierarchy, then apply defaults for any missing optional fields. -- **Category Isolation**: Each file contains only relevant test domain attributes -- **Deployment Grouping**: Similar deployment patterns share common attributes via `deployment_configurations` - **Category Isolation**: Each category file should only contain attributes relevant to its specific test domain to maintain clear separation of concerns. +- **Deployment Grouping**: Similar deployment patterns share common attributes via `deployment_configurations` - **Backward Compatibility**: Missing optional sections (platform, hwsku, etc.) are silently ignored to support gradual adoption and legacy configurations. 
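The "No Overlap" and "Validation First" rules above can be sketched as follows — a minimal illustration only, with hypothetical helper and attribute names that are not part of the framework's API:

```python
def resolve_category_fields(resolved: dict, mandatory: list, defaults: dict) -> dict:
    """Illustrative sketch: reject mandatory/defaults overlap, validate that
    mandatory fields resolved through the priority hierarchy, then apply
    defaults only for missing optional fields."""
    overlap = set(mandatory) & set(defaults)
    if overlap:
        # A field cannot both require explicit specification and carry a fallback.
        raise ValueError(f"fields in both mandatory and defaults: {sorted(overlap)}")
    missing = [f for f in mandatory if f not in resolved]
    if missing:
        raise ValueError(f"unresolved mandatory fields: {missing}")
    # Defaults fill gaps only; explicitly resolved values always win.
    return {**defaults, **resolved}
```

For example, `resolve_category_fields({"dom_supported": True}, ["dom_supported"], {"poll_interval_sec": 10})` yields both keys, while placing the same field in `mandatory` and `defaults` raises immediately instead of silently guessing intent.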
-##### Deployment Configurations +#### Deployment Configurations The `deployment_configurations` feature eliminates attribute duplication by defining common attributes once per deployment type instead of repeating across vendors. The framework automatically extracts the DEPLOYMENT component from the `BASE_ATTRIBUTES` field in `port_attributes_dict` to determine which deployment configuration to apply. -##### Priority-Based Attribute Resolution +#### Priority-Based Attribute Resolution Attributes are resolved using this hierarchy (highest to lowest priority): @@ -664,7 +658,7 @@ Attributes are resolved using this hierarchy (highest to lowest priority): > **Note:** For platform+HWSKU combinations in `platform_hwsku_overrides`, the key format is `"+"` where the platform name and HWSKU name are concatenated with a literal `+` symbol. -##### Example Category File +#### Example Category File Example `eeprom.json` file: @@ -703,7 +697,7 @@ Example `eeprom.json` file: } ``` -##### Framework Implementation +#### Framework Implementation The test framework loads and merges attributes from all relevant category files for each transceiver, using hierarchical override rules. This enables category-specific test logic to access only needed attributes while supporting platform, HWSKU, and vendor overrides. 
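As a minimal illustration of this hierarchical merge — the layer names and attribute values below are examples only, and the authoritative precedence is the hierarchy listed above:

```python
def merge_attribute_layers(layers):
    """Merge attribute layers ordered lowest to highest priority;
    each later layer overrides earlier ones key by key."""
    merged = {}
    for layer in layers:
        merged.update(layer or {})
    return merged

# Illustrative layers, lowest priority first (names/values are hypothetical).
dom_attrs = merge_attribute_layers([
    {"poll_interval_sec": 10, "tx_power_low_warn": -5.0},  # category-level defaults
    {"poll_interval_sec": 5},                              # deployment_configurations
    {"tx_power_low_warn": -4.0},                           # "<platform>+<hwsku>" override
])
```

Here the deployment layer overrides the default `poll_interval_sec`, and the platform+HWSKU layer overrides `tx_power_low_warn`, leaving every untouched key at its lower-priority value.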
@@ -757,7 +751,7 @@ The framework builds a `port_attributes_dict` keyed by logical port name, contai } ``` -##### Attribute Merging Process +#### Attribute Merging Process The framework builds `port_attributes_dict` using this systematic process: @@ -774,7 +768,7 @@ The framework builds `port_attributes_dict` using this systematic process: - Graceful error handling for missing files and invalid JSON - The entire `port_attributes_dict` is captured in the log for debugging -##### Usage +#### Usage Tests access attributes using: `port_attributes_dict[port_name][category_key][attribute_name]` The `port_attributes_dict` is provided directly as a session-scoped fixture and is also initialized early for logging. @@ -797,13 +791,13 @@ def test_example(port_attributes_dict): **Benefits:** Modular design, independent updates per category, conflict prevention, flexible overrides, and performance optimization. -#### 3. Attribute Completeness Validation +### Attribute Completeness Validation > **Process Flow**: See the [Validation Flow Diagram](diagrams/validation_flow.md) for a visual overview of the validation process and pytest integration. Optional post-processing validation ensures comprehensive attribute coverage for transceiver qualification by comparing the populated `port_attributes_dict` against deployment-specific templates. 
-##### Template Structure +#### Template Structure **Location:** `ansible/files/transceiver/inventory/templates/deployment_templates.json` @@ -827,7 +821,7 @@ Optional post-processing validation ensures comprehensive attribute coverage for } ``` -##### Template Components +#### Template Components - `deployment_templates`: Root object containing all deployment templates - ``: Individual deployment template (e.g., `2x100G_200G_SIDE`) @@ -835,14 +829,14 @@ Optional post-processing validation ensures comprehensive attribute coverage for - `optional_attributes`: Lists of attributes that should be present if available - **Note:** Each category (e.g., `BASE_ATTRIBUTES`, `EEPROM_ATTRIBUTES`, `DOM_ATTRIBUTES`) can have its own set of required and optional attributes. -##### Validation Process +#### Validation Process 1. **Template Selection**: Uses `deployment` field from `BASE_ATTRIBUTES` to select appropriate template 2. **Attribute Comparison**: Compares actual vs required attributes per category 3. **Gap Analysis**: Identifies missing required/optional attributes 4. **Pytest Integration**: Reports results with standard log levels (INFO/WARNING/ERROR/DEBUG) -##### Configuration Control +#### Configuration Control The validation feature can also be controlled via passing a test parameters: @@ -853,7 +847,7 @@ The validation feature can also be controlled via passing a test parameters: **Note:** Even when validation is skipped, all attributes from category files are still loaded and available for test execution. This parameter only affects the post-processing template validation step. 
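The gap-analysis step can be sketched as a per-port comparison of populated categories against the selected template — the structures and function name below are assumptions for illustration, not the framework's API:

```python
def find_missing_attributes(port_attrs: dict, template: dict) -> dict:
    """Return {category: [missing required attribute names]} for one port,
    comparing actual attributes against the deployment template."""
    gaps = {}
    for category, spec in template.items():
        present = set(port_attrs.get(category, {}))
        missing = [a for a in spec.get("required_attributes", []) if a not in present]
        if missing:
            gaps[category] = missing
    return gaps
```

A port whose `DOM_ATTRIBUTES` lacks a required `alarm_flags` entry would be reported as `{"DOM_ATTRIBUTES": ["alarm_flags"]}`, which maps directly onto the ERROR line shown in the console output below; an empty result corresponds to FULLY_COMPLIANT.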
-##### Console Output +#### Console Output ```python INFO PASS: Ethernet0 (2x100G_200G_SIDE) - FULLY_COMPLIANT (19/20 attributes) @@ -862,7 +856,7 @@ ERROR FAIL: Ethernet8 - Missing required: DOM_ATTRIBUTES.alarm_flags INFO Overall Compliance: 87.5% (21/24 ports fully compliant) ``` -##### Execution Control +#### Execution Control The validation results determine test execution flow: @@ -872,10 +866,9 @@ The validation results determine test execution flow: **Skipping Validation:** -- Use `--skip_transceiver_template_validation` pytest parameter to completely bypass this validation step -- See the "Configuration Control" section above for detailed usage information +See the [Configuration Control](#configuration-control) section above for pytest parameter details. -#### 4. Transceiver Firmware Info File +### Transceiver Firmware Info File A `transceiver_firmware_info.csv` file (located in `ansible/files/transceiver/inventory` directory) should exist if a transceiver being tested supports CMIS CDB firmware upgrade. This file will capture the firmware binary metadata for the transceiver. Each transceiver should have at least 2 firmware binaries (in addition to the gold firmware binary) so that firmware upgrade can be tested. Following should be the format of the file @@ -889,13 +882,13 @@ normalized_vendor_name,normalized_vendor_pn,fw_version,fw_binary_name,md5sum For each firmware binary, the following metadata should be included: -- `normalized_vendor_name`: The normalized vendor name, created by applying the normalization rules described in the [CMIS CDB Firmware Binary Management](#141-cmis-cdb-firmware-binary-management) section. -- `normalized_vendor_pn`: The normalized vendor part number, created by applying the normalization rules described in the [CMIS CDB Firmware Binary Management](#141-cmis-cdb-firmware-binary-management) section. 
+- `normalized_vendor_name`: The normalized vendor name, created by applying the normalization rules described in the [CMIS CDB Firmware Binary Management](#121-cmis-cdb-firmware-binary-management) section. +- `normalized_vendor_pn`: The normalized vendor part number, created by applying the normalization rules described in the [CMIS CDB Firmware Binary Management](#121-cmis-cdb-firmware-binary-management) section. - `fw_version`: The version of the firmware. - `fw_binary_name`: The filename of the firmware binary. - `md5sum`: The MD5 checksum of the firmware binary. -#### 5. CMIS CDB Firmware Base URL File +### CMIS CDB Firmware Base URL File A `cmis_cdb_firmware_base_url.csv` file (located in `ansible/files/transceiver/inventory` directory) should be present to define the base URL for downloading CMIS CDB firmware binaries. The file should follow this format: @@ -914,62 +907,49 @@ inv_name,fw_base_url lab,http://1.2.3.4/cmis_cdb_firmware/ ``` -#### 1.1 Link related tests +## Detailed Test Plans -The following tests aim to validate the link status and stability of transceivers under various conditions. +The following child test plans provide comprehensive, attribute-driven test cases for specific test categories. Each plan defines its own attributes, test cases, and validation procedures: -| Step | Goal | Expected Results | -|------|------|------------------| -| Issue CLI command to shutdown a port | Validate link status using CLI configuration | Ensure that the link goes down | -| Issue CLI command to startup a port | Validate link status using CLI configuration | Ensure that the link is up and the port appears in the LLDP table. | -| In a loop, issue startup/shutdown command for a port 100 times | Stress test for link status validation | Ensure link status toggles to up/down appropriately with each startup/shutdown command. 
Verify ports appear in the LLDP table when the link is up | -| In a loop, issue startup/shutdown command for all ports 100 times | Stress test for link status validation | Ensure link status toggles to up/down appropriately for all relevant ports with each startup/shutdown command. Verify ports appear in the LLDP table when the link is up | -| Restart `xcvrd` | Test link and xcvrd stability | Confirm `xcvrd` restarts successfully without causing link flaps for the corresponding ports, and verify their presence in the LLDP table. Also ensure that xcvrd is up for at least 2 mins | -| Induce I2C errors and restart `xcvrd` | Test link stability in case of `xcvrd` restart + I2C errors | Confirm `xcvrd` restarts successfully without causing link flaps for the corresponding ports, and verify their presence in the LLDP table | -| Modify xcvrd.py to raise an Exception and induce a crash | Test link and xcvrd stability | Confirm `xcvrd` restarts successfully without causing link flaps for the corresponding ports, and verify their presence in the LLDP table. 
Also ensure that xcvrd is up for at least 2 mins | -| Restart `pmon` | Test link stability | Confirm `xcvrd` restarts successfully without causing link flaps for the corresponding ports, and verify their presence in the LLDP table | -| Restart `swss` | Validate transceiver re-initialization and link status post container restart | Ensure `xcvrd` restarts (for Mellanox platform, ensure pmon restarts) and the expected ports link up again, with port details visible in the LLDP table | -| Restart `syncd` | Validate transceiver re-initialization and link status post container restart | Ensure `xcvrd` restarts (for Mellanox platform, ensure pmon restarts) and the expected ports link up again, with port details visible in the LLDP table | -| Perform a config reload | Test transceiver re-initialization and link status | Ensure `xcvrd` restarts and the expected ports link up again, with port details visible in the LLDP table | -| Execute a cold reboot | Validate transceiver re-initialization and link status post-device reboot | Confirm the expected ports link up again post-reboot, with port details visible in the LLDP table | -| In a loop, execute cold reboot 100 times | Stress test to validate transceiver re-initialization and link status with cold reboot | Confirm the expected ports link up again post-reboot, with port details visible in the LLDP table | -| Execute a warm reboot (if platform supports it) | Test link stability through warm reboot | Ensure `xcvrd` restarts and maintains link stability for the interested ports, with their presence confirmed in the LLDP table | -| Execute a fast reboot (if platform supports it) | Validate transceiver re-initialization and link status post-device reboot | Confirm the expected ports link up again post-reboot, with port details visible in the LLDP table | - -#### 1.2 `sfputil` Command Tests - -The following tests aim to validate various functionalities of the transceiver (transceiver) using the `sfputil` command. 
+| Test Plan | Description | +|-----------|-------------| +| [EEPROM Test Plan](eeprom_test_plan.md) | EEPROM field validation, firmware version checks, hexdump verification, breakout serial number patterns, port speed and FEC configuration validation | +| [DOM Test Plan](dom_test_plan.md) | Digital Optical Monitoring sensor validation, operational and threshold range checks, data consistency, polling control, and interface state change impact on DOM data | +| [System Test Plan](system_test_plan.md) | System-level transceiver testing including link behavior, process/service restarts, reboot recovery, transceiver event handling (reset, low power mode, loopback), SI settings, C-CMIS tuning, and stress tests | -| Step | Goal | Expected Results | -|------|------|------------------| -| Verify if transceiver presence works with CLI | Transceiver presence validation | Ensure transceiver presence is detected | -| Reset the transceiver followed by issuing shutdown and then startup command | Transceiver reset validation | Ensure that the port is linked down after reset and is in low power mode (if transceiver supports it). Also, ensure that the DataPath is in DPDeactivated state and LowPwrAllowRequestHW (page 0h, byte 26.6) is set to 1. The shutdown and startup commands are later issued to re-initialize the port and bring the link up | -| Put transceiver in low power mode (if transceiver supports it) followed by restoring to high power mode | Transceiver low power mode validation | Ensure transceiver is in high power mode initially. Then put the transceiver in low power mode and ensure that the port is linked down and the DataPath is in DPDeactivated state. Ensure that the port is in low power mode through CLI. 
Disable low power mode and ensure that the link is up now and transceiver is in high power mode now | -| Verify EEPROM of the transceiver using CLI | Transceiver specific fields validation from EEPROM | Ensure transceiver specific fields are matching with the values retrieved from the transceiver dictionary created using the csv files | -| Verify DOM information of the transceiver using CLI when interface is in shutdown and no shutdown state (if transceiver supports DOM) | Basic DOM validation | Ensure the fields are in line with the expectation based on interface shutdown/no shutdown state | -| Verify EEPROM hexdump of the transceiver using CLI | Transceiver EEPROM hexdump validation | Ensure the output shows Lower Page (0h) and Upper Page (0h) for all 128 bytes on each page. Information from the transceiver dictionary created using the csv files can be used to validate contents of page 0h. Also, ensure that page 11h shows the Data Path state correctly | -| Verify firmware version of the transceiver using CLI (requires disabling DOM config) | Firmware version validation | Ensure the active and inactive firmware version is in line with the expectation from the transceiver dictionary created using the csv files | -| Verify different types of loopback | Transceiver loopback validation | Ensure that the various supported types of loopback work on the transceiver. The LLDP neighbor can also be used to verify the data path after enabling loopback (such as host-side input loopback) | +### Document Relationships + +This repository uses three types of test documentation: -#### 1.3 `sfpshow` Command Tests +- **This document (Infrastructure & Framework)**: Defines configuration file formats, attribute resolution, normalization rules, validation templates, and shared CLI reference. All other test plans reference this document for infrastructure details. 
+- **Child attribute plans** (EEPROM, DOM, System, VDM, PM, CMIS Firmware upgrade, Transceiver OIR): Define *what* to validate per test category — specific attributes, expected values, and category-scoped test cases. +- **Scenario test plan** (future): Will define *how* to exercise the system — end-to-end test sequences (shut/noshut, reboot, failure injection) that compose and orchestrate tests from the per-category child plans above for scenario-driven validation. + +### 1. Tests not involving traffic + +These tests do not require traffic and are standalone, designed to run on a Device Under Test (DUT) with the transceiver plugged into 2 ports, connected by a cable. + +**Breakout Cable Assumptions for the Below Tests:** + +- All sides of the breakout cable should be connected to the DUT, and each port should be tested individually starting from subport 1 to subport N. The test should be run in reverse order as well i.e. starting from subport N to subport 1. +- For link toggling tests on a subport, it's crucial to ensure that the link status of remaining subports of the breakout port group remains unaffected. + +#### 1.1 `sfpshow` Command Tests The following tests aim to validate various functionalities of the transceiver using the `sfpshow` command. 
| Step | Goal | Expected Results | |------|------|------------------| -| Verify transceiver specific information through CLI | Validate CLI relying on redis-db | Ensure transceiver specific fields match the values retrieved from transceiver dictionary created using the csv files | -| Verify DOM data is read correctly and is within an acceptable range (if transceiver supports DOM) | Validate CLI relying on redis-db | Ensure DOM data is read correctly and falls within the acceptable range | -| Verify transceiver status when the interface is in shutdown and no shutdown state | Validate CLI relying on redis-db | Ensure the fields align with expectations based on the interface being in shutdown or no shutdown state | | Verify PM information (for C-CMIS transceivers) | Validate CLI relying on redis-db | Ensure that the PM related fields are populated | | Verify VDM information for CMIS cables | Validate CLI relying on redis-db | Ensure that all the Pre-FEC and FERC media and host related VDM related fields are populated. The acceptable values for Pre-FEC fields are from 0 through 1e-4 and the FERC values should be <= 0| | Verify transceiver error-status | Validate CLI relying on redis-db | Ensure the relevant port is in an "OK" state | | Verify transceiver error-status with hardware verification | Validate CLI relying on transceiver hardware | Ensure the relevant port is in an "OK" state | -#### 1.4 CMIS CDB Firmware Upgrade Testing +#### 1.2 CMIS CDB Firmware Upgrade Testing -##### 1.4.1 CMIS CDB Firmware Binary Management +##### 1.2.1 CMIS CDB Firmware Binary Management -###### 1.4.1.1 Firmware Binary Naming Guidelines +###### 1.2.1.1 Firmware Binary Naming Guidelines CMIS CDB firmware binaries must follow strict naming conventions to ensure compatibility across different filesystems and automation tools. @@ -984,7 +964,7 @@ CMIS CDB firmware binaries must follow strict naming conventions to ensure compa 2. 
**File Extension:** - Use `.bin` extension -###### 1.4.1.2 Normalization Rules for Vendor Name and Part Number +###### 1.2.1.2 Normalization Rules for Vendor Name and Part Number To ensure compatibility and uniqueness across filesystems and automation tools, the following normalization rules should be applied to vendor names and part numbers: @@ -1054,7 +1034,7 @@ def normalize_vendor_field(field: str) -> str: return field.upper() ``` -###### 1.4.1.3 Firmware Binary Storage on SONiC Device +###### 1.2.1.3 Firmware Binary Storage on SONiC Device The CMIS CDB firmware binaries are stored under `/tmp/cmis_cdb_firmware/` on the SONiC device, organized by normalized vendor name and part number. @@ -1071,7 +1051,7 @@ The CMIS CDB firmware binaries are stored under `/tmp/cmis_cdb_firmware/` on the **Requirements:** -- All directory and file names **must be uppercase** and follow the normalization rules defined in section 1.4.1.2 +- All directory and file names **must be uppercase** and follow the [Normalization Rules for Vendor Name and Part Number](#1212-normalization-rules-for-vendor-name-and-part-number) - Use the `GENERIC_N_END` placeholder for cable lengths as described in the normalization rules **Example Directory Structure:** @@ -1088,7 +1068,7 @@ The CMIS CDB firmware binaries are stored under `/tmp/cmis_cdb_firmware/` on the └── ... 
``` -###### 1.4.1.4 Firmware Binary Storage on Remote Server +###### 1.2.1.4 Firmware Binary Storage on Remote Server The CMIS CDB firmware binaries must be stored on a remote server with the following requirements: @@ -1118,7 +1098,7 @@ Firmware binaries are accessed using the following URL pattern: http://firmware-server.example.com/cmis_cdb_firmware/ACMECORP/QSFP-100G-AOC-GENERIC_2_ENDM/ACMECORP_QSFP-100G-AOC-GENERIC_2_ENDM_1.2.4.bin ``` -##### 1.4.2 CMIS CDB Firmware Copy to DUT via sonic-mgmt infrastructure +##### 1.2.2 CMIS CDB Firmware Copy to DUT via sonic-mgmt infrastructure This section describes the automated process for copying firmware binaries to the DUT, ensuring only the required firmware versions are present for testing. @@ -1144,7 +1124,7 @@ To ensure only the necessary firmware binaries are present for each transceiver: - The firmware binary folder on the DUT (`/tmp/cmis_cdb_firmware/`) will be deleted after the test module run is complete to ensure a clean state for subsequent tests - Cleanup includes removing both the directory structure and any temporary files created during the process -##### 1.4.3 CMIS CDB Firmware Upgrade Tests +##### 1.2.3 CMIS CDB Firmware Upgrade Tests **Prerequisites:** @@ -1167,7 +1147,7 @@ To ensure only the necessary firmware binaries are present for each transceiver: |6 | Firmware download validation post reset | 1. Perform steps in TC #1
2. Execute `sfputil reset PORT` and wait for it to finish | All the expectations of test case #1 must be met | |7 | Ensure static fields of EEPROM remain unchanged | 1. Perform steps in TC #1
2. Perform steps in TC #2 | 1. All the expectations of TC #1 and #2 must be met
2. Ensure after each step 1 and 2 that the static fields of EEPROM (e.g., vendor name, part number, serial number, vendor date code, OUI, and hardware revision) remain unchanged | -#### 1.5 Remote Reseat related tests +#### 1.3 Remote Reseat related tests The following tests aim to validate the functionality of remote reseating of the transceiver module. All the below steps should be executed in a sequential manner. @@ -1182,25 +1162,16 @@ All the below steps should be executed in a sequential manner. |6 | Issue CLI command to startup the port | Remote reseat validation | Ensure that the port is linked up and is seen in the LLDP table | |7 | Issue CLI command to enable DOM monitoring for the port | Remote reseat validation | Ensure that the DOM monitoring is enabled for the port | -#### 1.6 Transceiver Specific Capabilities +#### 1.4 Transceiver Specific Capabilities -##### 1.6.1 General Tests +##### 1.4.1 General Tests | Step | Goal | Expected Results | |------|------|------------------| -| Add `"skip_xcvrd": true,` to the `pmon_daemon_control.json` file and reboot the device | Ensure CMIS transceiver is in low power mode upon boot-up | Ensure the transceiver is in low power mode after device reboot. Revert back the file to original after verification | -| Disable the Tx by directly writing to the EEPROM/or by calling `tx_disable` API | Ensure Tx is disabled within the advertised time for CMIS transceivers | Ensure that the DataPath state changes from DPActivated to a different state within the MaxDurationDPTxTurnOff time (page 1h, byte 168.7:4). Issue shut/no shutdown command to restore the link. 
This can be a stress test | | Adjust FEC mode | Validate FEC mode adjustment for transceivers supporting FEC | Ensure that the FEC mode can be adjusted to different modes and revert to original FEC mode after testing | | Validate FEC stats counters | Validate FEC stats counters | Ensure that FEC correctable, uncorrectable and symbol errors have integer values | -##### 1.6.2 C-CMIS specific tests - -| Step | Goal | Expected Results | -|------|------|------------------| -| Adjust frequency | Validate frequency adjustment for C-CMIS transceivers | Ensure that the frequency can be adjusted to minimum and maximum supported frequency and revert to original frequency after testing | -| Adjust tx power | Validate tx power adjustment for C-CMIS transceivers | Ensure that the tx power can be adjusted to minimum and maximum supported power and revert to original tx power after testing | - -##### 1.6.3 VDM specific tests +##### 1.4.2 VDM specific tests **Prerequisites:** @@ -1214,7 +1185,7 @@ All the below steps should be executed in a sequential manner. |3 | VDM freeze and unfreeze when 1 or more lanes have Tx disabled | 1. Shutdown the first lane of the physical port
2. Repeat the steps of TC #1
3. Repeat the steps of TC #2
4. Increase the number of lanes shutdown by 1 until all 8 lanes are disabled | 1. For step 2, follow the expectations of TC #1
2. For step 3, follow the expectations of TC #2 | |4| VDM freeze and unfreeze with non sequential lanes Tx disabled | 1. Shutdown all the odd-numbered lanes of the physical port
2. Repeat the steps of TC #1
3. Repeat the steps of TC #2
4. Startup all the odd-numbered lanes and shutdown all the even-numbered lanes of the physical port and repeat step #2 and #3 | 1. For step 2, follow the expectations of TC #1
2. For step 3, follow the expectations of TC #2 | -#### CLI commands +## CLI Commands **Note** @@ -1362,6 +1333,33 @@ Dump EEPROM of the transceiver sudo sfputil show eeprom -p ``` +Read/write a specific offset from/to the transceiver EEPROM + +``` +sudo sfputil read-eeprom --help + -p, --port Logical port name [required] + -n, --page EEPROM page number in hex [required] + -o, --offset EEPROM offset within the page [required] + -s, --size Size of byte to be read [required] + --no-format Display non formatted data + --wire-addr TEXT Wire address of sff8472 + --help Show this message and exit. + +sudo sfputil write-eeprom --help +Usage: sfputil write-eeprom [OPTIONS] + + Write SFP EEPROM data + +Options: + -p, --port Logical port name [required] + -n, --page EEPROM page number in hex [required] + -o, --offset EEPROM offset within the page [required] + -d, --data Hex string EEPROM data [required] + --wire-addr TEXT Wire address of sff8472 + --verify Verify the data by reading back + --help Show this message and exit. +``` + Dump EEPROM DOM information of the transceiver and verify fields based on the below information ```