
[action] [PR:632] Catch the xcvrd exception returned by get_transceiver_info #636

Merged
mssonicbld merged 1 commit into sonic-net:202505 from mssonicbld:cherry/202505/632
Jul 1, 2025


@mssonicbld
Collaborator

Description

Catch the exception raised by `get_transceiver_info` so that it no longer crashes xcvrd.

Motivation and Context

xcvrd crashes repeatedly when `get_transceiver_info` raises an exception because of an EEPROM read failure.
The fix catches the exception and returns `None`, so the port is shown as Not Ready instead of taking down the daemon.

How Has This Been Tested?

Injected an exception inside xcvrd for a specific port and checked the xcvrd process status: it no longer crashed.

```python
# Test-only fault injection, placed in the _wrapper_get_transceiver_info function of xcvrd
try:
    with open("/root/test", "r") as fd:
        port = fd.readlines()[0].strip()
        helper_logger.log_error(f"port: {port}")
        helper_logger.log_error(f"pport: {physical_port}")
        if int(port) == physical_port:
            helper_logger.log_error(f"exception pport: {physical_port}")
            raise(Exception(f"pport {physical_port} not available."))
    return platform_chassis.get_sfp(physical_port).get_transceiver_info()
# ... the except clause added by this PR catches the exception and returns None
```
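The injection above can be factored into a small helper so the trigger condition is testable on its own. This is a hypothetical sketch, not part of the PR: `should_inject_fault`, `get_info_with_fault_injection`, `InjectedFault`, and the `trigger_path` parameter are illustrative names. Unlike the original snippet, it treats a missing, empty, or non-numeric trigger file as "inject nothing" rather than letting that error escape.

```python
import os
import tempfile


class InjectedFault(Exception):
    """Raised on purpose to simulate an EEPROM read failure (test-only)."""


def should_inject_fault(trigger_path: str, physical_port: int) -> bool:
    """Return True if the first line of trigger_path names physical_port."""
    try:
        with open(trigger_path, "r") as fd:
            first_line = fd.readline().strip()
    except OSError:
        return False  # no trigger file: behave normally
    try:
        return int(first_line) == physical_port
    except ValueError:
        return False  # empty or non-numeric content: behave normally


def get_info_with_fault_injection(get_info, physical_port: int, trigger_path: str):
    # get_info stands in for platform_chassis.get_sfp(port).get_transceiver_info
    if should_inject_fault(trigger_path, physical_port):
        raise InjectedFault(f"pport {physical_port} not available.")
    return get_info()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        trigger = os.path.join(tmp, "test")
        with open(trigger, "w") as fd:
            fd.write("11\n")
        print(should_inject_fault(trigger, 11), should_inject_fault(trigger, 10))  # True False
```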

Here's the resulting traceback in the log when the `/root/test` file contains `11`:

```
2025 Jun 27 17:21:57.148944 str4-sn5600-2 ERR pmon#xcvrd[137982]: port: 11
2025 Jun 27 17:21:57.148944 str4-sn5600-2 ERR pmon#xcvrd[137982]: pport: 11
2025 Jun 27 17:21:57.148973 str4-sn5600-2 ERR pmon#xcvrd[137982]: exception pport: 11
2025 Jun 27 17:21:57.149007 str4-sn5600-2 ERR pmon#xcvrd[137982]: Failed to get transceiver info for physical port 11. Exception: pport 11 not available.
2025 Jun 27 17:21:57.149212 str4-sn5600-2 ERR pmon#xcvrd[137982]: Traceback (most recent call last):
2025 Jun 27 17:21:57.149219 str4-sn5600-2 ERR pmon#xcvrd[137982]:   File "/usr/local/lib/python3.11/dist-packages/xcvrd/xcvrd.py", line 257, in _wrapper_get_transceiver_info
2025 Jun 27 17:21:57.149252 str4-sn5600-2 ERR pmon#xcvrd[137982]:     raise(Exception(f"pport {physical_port} not available."))
2025 Jun 27 17:21:57.149272 str4-sn5600-2 ERR pmon#xcvrd[137982]: Exception: pport 11 not available.
```

With the fix in place, `xcvrd` does not crash as a result of this exception.
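The log above shows the error message followed by the traceback, one syslog entry per line. One way to produce that shape from an `except` block is sketched below, assuming a logging function that emits one record per call; here a list's `append` stands in for something like `helper_logger.log_error`. The helper `log_exception_per_line` is hypothetical and is not taken from xcvrd.

```python
import traceback
from typing import Callable, List


def log_exception_per_line(log_error: Callable[[str], None], message: str,
                           exc: BaseException) -> None:
    """Log message, then each line of exc's traceback as its own record."""
    log_error(message)
    for chunk in traceback.format_exception(type(exc), exc, exc.__traceback__):
        # Each chunk may hold several newline-separated lines; split them so
        # every line becomes a separate log record.
        for line in chunk.rstrip("\n").splitlines():
            log_error(line)


if __name__ == "__main__":
    records: List[str] = []
    try:
        raise Exception("pport 11 not available.")
    except Exception as exc:
        log_exception_per_line(
            records.append,
            f"Failed to get transceiver info for physical port 11. Exception: {exc}",
            exc,
        )
    print("\n".join(records))
```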

Additional Information (Optional)

@mssonicbld
Collaborator Author

Original PR: #632

@mssonicbld
Collaborator Author

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

mssonicbld merged commit 421f3c1 into sonic-net:202505 on Jul 1, 2025
5 checks passed