Catch the xcvrd exception returned by get_transceiver_info#632

Merged
prgeor merged 3 commits into sonic-net:master from az-pz:ariz/catch-xcvrd-exception-when-eeprom-read-is-unsuccessful
Jun 30, 2025

Conversation

@az-pz
Contributor

@az-pz az-pz commented Jun 26, 2025

Description

Catch the xcvrd exception returned by get_transceiver_info

Motivation and Context

xcvrd crashes repeatedly when get_transceiver_info raises an exception on an EEPROM read failure.
The fix is to catch the exception and return None so that the port is shown as Not Ready.
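
A minimal sketch of the catch-and-return-None approach described above. It is not the actual xcvrd code: `FakeSfp`, `FakeChassis`, `log_error`, and `wrapper_get_transceiver_info` are hypothetical names so the snippet runs off-device, the returned dict is placeholder data, and the handling after `NotImplementedError` is simplified to returning None. Only the `get_sfp(...).get_transceiver_info()` call mirrors what this PR shows.

```python
import traceback

logged = []  # hypothetical stand-in for syslog so the sketch runs anywhere


def log_error(msg):
    """Hypothetical replacement for helper_logger.log_error."""
    logged.append(msg)


class FakeSfp:
    """Hypothetical SFP whose EEPROM read fails when `fail` is set."""

    def __init__(self, port, fail):
        self.port = port
        self.fail = fail

    def get_transceiver_info(self):
        if self.fail:
            raise Exception(f"pport {self.port} not available.")
        return {"type": "placeholder"}  # placeholder data, not a real EEPROM dump


class FakeChassis:
    """Hypothetical chassis that hands out FakeSfp objects."""

    def __init__(self, bad_ports):
        self.bad_ports = set(bad_ports)

    def get_sfp(self, port):
        return FakeSfp(port, port in self.bad_ports)


def wrapper_get_transceiver_info(chassis, physical_port):
    """Return transceiver info, or None if the read raises."""
    try:
        return chassis.get_sfp(physical_port).get_transceiver_info()
    except NotImplementedError:
        pass  # simplified: the real wrapper may do more here
    except Exception as e:
        # Log the failure and the traceback instead of letting the daemon die.
        log_error(f"Failed to get transceiver info for physical port {physical_port}. Exception: {e}")
        log_error(traceback.format_exc())
    return None
```

With this shape, a caller that receives None can treat the port as Not Ready and carry on with the remaining ports.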

How Has This Been Tested?

Raised an exception inside xcvrd for a particular port and checked the xcvrd process status; it no longer crashed.

# In the _wrapper_get_transceiver_info function of xcvrd
        try:
            with open("/root/test", "r") as fd:
                port = fd.readlines()[0].strip()
                helper_logger.log_error(f"port: {port}")
                helper_logger.log_error(f"pport: {physical_port}")
                if int(port) == physical_port:
                    helper_logger.log_error(f"exception pport: {physical_port}")
                    raise(Exception(f"pport {physical_port} not available."))
            return platform_chassis.get_sfp(physical_port).get_transceiver_info()

Here's the resulting traceback in the log when the /root/test file contains 11:

2025 Jun 27 17:21:57.148944 str4-sn5600-2 ERR pmon#xcvrd[137982]: port: 11
2025 Jun 27 17:21:57.148944 str4-sn5600-2 ERR pmon#xcvrd[137982]: pport: 11
2025 Jun 27 17:21:57.148973 str4-sn5600-2 ERR pmon#xcvrd[137982]: exception pport: 11
2025 Jun 27 17:21:57.149007 str4-sn5600-2 ERR pmon#xcvrd[137982]: Failed to get transceiver info for physical port 11. Exception: pport 11 not available.
2025 Jun 27 17:21:57.149212 str4-sn5600-2 ERR pmon#xcvrd[137982]: Traceback (most recent call last):
2025 Jun 27 17:21:57.149219 str4-sn5600-2 ERR pmon#xcvrd[137982]:   File "/usr/local/lib/python3.11/dist-packages/xcvrd/xcvrd.py", line 257, in _wrapper_get_transceiver_info
2025 Jun 27 17:21:57.149252 str4-sn5600-2 ERR pmon#xcvrd[137982]:     raise(Exception(f"pport {physical_port} not available."))
2025 Jun 27 17:21:57.149272 str4-sn5600-2 ERR pmon#xcvrd[137982]: Exception: pport 11 not available.

xcvrd doesn't crash as a result of this exception.
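
The log above shows one syslog entry per traceback line. One way to get that shape is sketched below, assuming the logger takes a single string per call; `log_exception_lines` is a hypothetical helper, and whether xcvrd emits the traceback exactly this way is an assumption.

```python
import traceback


def log_exception_lines(log_error, exc):
    """Send each line of exc's traceback to log_error as its own entry."""
    # format_exception returns chunks that may each contain several lines.
    chunks = traceback.format_exception(type(exc), exc, exc.__traceback__)
    for chunk in chunks:
        for line in chunk.rstrip("\n").split("\n"):
            log_error(line)


# Usage: collect the entries in a list instead of writing to syslog.
collected = []
try:
    raise Exception("pport 11 not available.")
except Exception as e:
    log_exception_lines(collected.append, e)
```

Splitting by line keeps each entry short and greppable, at the cost of interleaving with other daemons' messages in a busy log.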

Additional Information (Optional)

@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@mihirpat1 mihirpat1 requested a review from Copilot June 26, 2025 19:38
Contributor

Copilot AI left a comment

Pull Request Overview

This PR updates the transceiver info retrieval logic to catch exceptions from get_transceiver_info in order to prevent the xcvrd process from crashing on EEPROM read failures.

  • Added an exception handler to log errors and traceback before returning None
  • Updated tests to simulate an exception scenario during transceiver info retrieval
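
A test of the kind described above could look roughly like the following sketch. It is not the test added in this PR: `get_info_or_none` is a simplified, hypothetical stand-in for the wrapper, and the mock chassis replaces the real platform object.

```python
from unittest.mock import MagicMock


def get_info_or_none(chassis, physical_port):
    """Simplified stand-in for the wrapper under test (not the real xcvrd function)."""
    try:
        return chassis.get_sfp(physical_port).get_transceiver_info()
    except NotImplementedError:
        return None
    except Exception:
        return None


def test_returns_none_on_eeprom_failure():
    # Make the mocked SFP raise when its transceiver info is requested.
    chassis = MagicMock()
    chassis.get_sfp.return_value.get_transceiver_info.side_effect = Exception("EEPROM read failed")

    assert get_info_or_none(chassis, 11) is None
    chassis.get_sfp.assert_called_once_with(11)
```

Setting `side_effect` to an exception instance makes the mock raise it on every call, which simulates the EEPROM failure without touching hardware.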

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| sonic-xcvrd/xcvrd/xcvrd.py | Added exception handling and logging in get_transceiver_info wrapper |
| sonic-xcvrd/tests/test_xcvrd.py | Updated test cases to simulate exception scenarios |

except NotImplementedError:
    pass
except Exception:
    helper_logger.log_error("Failed to get transceiver info for physical port {}".format(physical_port))
Contributor

@az-pz Can you print out the exception in the error message?

Contributor

@mihirpat1 mihirpat1 left a comment

Please attach the traceback as observed in the logs during the test.

@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@az-pz
Contributor Author

az-pz commented Jun 27, 2025

@mihirpat1 , Addressed your comments.

@mssonicbld
Collaborator

Cherry-pick PR to 202505: #636

@cyw233

cyw233 commented Jul 1, 2025

Hey @az-pz, looks like there's a conflict with the msft-202405 branch, could you please manually resolve it? Thanks!

@mssonicbld
Collaborator

Cherry-pick PR to msft-202412: Azure/sonic-platform-daemons.msft#25

@az-pz
Contributor Author

az-pz commented Jul 2, 2025

@cyw233 , here's the manual cherry pick for 202405: Azure/sonic-platform-daemons.msft#26
