Commit 548f962
Orchagent SAI error handling improvements (sonic-net#3587)
What I did
This change aims to reduce self-induced orchagent exits when a SAI API call fails (i.e., returns anything other than SAI_STATUS_SUCCESS). It is the first set of changes and does the following:
handleSaiCreateStatus() / handleSaiSetStatus() changes
- Return 'task_success' for SAI_STATUS_ITEM_ALREADY_EXISTS, SAI_STATUS_ITEM_NOT_FOUND, SAI_STATUS_ADDR_NOT_FOUND and SAI_STATUS_OBJECT_IN_USE, irrespective of the object type.
- Return 'task_need_retry' for SAI_STATUS_INSUFFICIENT_RESOURCES, SAI_STATUS_TABLE_FULL, SAI_STATUS_NO_MEMORY and SAI_STATUS_NV_STORAGE_FULL.
- Call handleSaiFailure() and return 'task_failed' for other SAI errors. This logs a structured syslog via eventd and also takes a SAI dump.
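The create/set policy above is essentially a switch from SAI status to task outcome. The sketch below is illustrative only: `SaiStatus`, `TaskStatus`, `onSaiFailure()` and `handleCreateOrSetStatus()` are hypothetical stand-ins (the real code uses `sai_status_t` macros and the orchagent task status values), and the failure hook only counts calls instead of logging via eventd and taking a SAI dump.

```cpp
#include <cassert>

// Hypothetical stand-ins for the SAI status codes named above. The real
// values are negative int32_t macros in SAI's saistatus.h; only names matter here.
enum class SaiStatus {
    Success,
    ItemAlreadyExists,
    ItemNotFound,
    AddrNotFound,
    ObjectInUse,
    InsufficientResources,
    TableFull,
    NoMemory,
    NvStorageFull,
    Failure,  // any other error
};

// Mirrors the task_success / task_need_retry / task_failed outcomes.
enum class TaskStatus { Success, NeedRetry, Failed };

// Hypothetical hook standing in for handleSaiFailure(); here it only counts calls.
static int g_failureCount = 0;
static void onSaiFailure() { ++g_failureCount; }

// Sketch of the create/set policy: the same mapping for every object type.
TaskStatus handleCreateOrSetStatus(SaiStatus status)
{
    switch (status)
    {
    case SaiStatus::Success:
    case SaiStatus::ItemAlreadyExists:
    case SaiStatus::ItemNotFound:
    case SaiStatus::AddrNotFound:
    case SaiStatus::ObjectInUse:
        return TaskStatus::Success;    // treated as benign, no retry
    case SaiStatus::InsufficientResources:
    case SaiStatus::TableFull:
    case SaiStatus::NoMemory:
    case SaiStatus::NvStorageFull:
        return TaskStatus::NeedRetry;  // resources may free up later
    default:
        onSaiFailure();                // unexpected error: report, don't exit
        return TaskStatus::Failed;
    }
}
```

The key design point is that no branch terminates the process: even the catch-all case reports the failure and hands a status back to the caller.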
handleSaiRemoveStatus() changes
- Return 'task_success' for SAI_STATUS_ITEM_ALREADY_EXISTS, SAI_STATUS_ITEM_NOT_FOUND, SAI_STATUS_ADDR_NOT_FOUND, SAI_STATUS_INSUFFICIENT_RESOURCES, SAI_STATUS_TABLE_FULL, SAI_STATUS_NO_MEMORY and SAI_STATUS_NV_STORAGE_FULL.
- Return 'task_need_retry' for SAI_STATUS_OBJECT_IN_USE.
- Call handleSaiFailure() and return 'task_failed' for other SAI errors. This logs a structured syslog via eventd and also takes a SAI dump.
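For removal the policy inverts around SAI_STATUS_OBJECT_IN_USE: an object that is still referenced is worth retrying, while resource-exhaustion codes are treated as success. A minimal sketch under the same assumptions as a create/set mapping would use (`SaiStatus`, `TaskStatus`, `onSaiFailure()` and `handleRemoveStatus()` are hypothetical names, not the orchagent API):

```cpp
#include <cassert>

// Hypothetical stand-ins for the SAI status codes named above.
enum class SaiStatus {
    Success,
    ItemAlreadyExists,
    ItemNotFound,
    AddrNotFound,
    ObjectInUse,
    InsufficientResources,
    TableFull,
    NoMemory,
    NvStorageFull,
    Failure,  // any other error
};

enum class TaskStatus { Success, NeedRetry, Failed };

// Hypothetical hook standing in for handleSaiFailure(); here it only counts calls.
static int g_failureCount = 0;
static void onSaiFailure() { ++g_failureCount; }

// Sketch of the remove policy described above.
TaskStatus handleRemoveStatus(SaiStatus status)
{
    switch (status)
    {
    case SaiStatus::ObjectInUse:
        // Still referenced by another object: retry after dependents go away.
        return TaskStatus::NeedRetry;
    case SaiStatus::Success:
    case SaiStatus::ItemAlreadyExists:
    case SaiStatus::ItemNotFound:
    case SaiStatus::AddrNotFound:
    case SaiStatus::InsufficientResources:
    case SaiStatus::TableFull:
    case SaiStatus::NoMemory:
    case SaiStatus::NvStorageFull:
        return TaskStatus::Success;
    default:
        onSaiFailure();  // unexpected error: report, don't exit
        return TaskStatus::Failed;
    }
}
```

Retrying an in-use object on remove gives dependent objects a chance to be deleted first, which is why it is the only retry case here.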
handleSaiGetStatus() changes
- Log a NOTICE message and return 'task_failed'. This matches the current handling of GET calls.
handleSaiFailure() changes
- Update handleSaiFailure() to take three arguments: the SAI API, an operation-type string and the SAI API return status. These are used to craft a structured syslog error message when a failure occurs. All callers of this function are updated accordingly.
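The three new arguments give the failure handler enough context to say which API failed, during which operation, and with what status. The sketch below only illustrates building such a message; `SaiApi`, `apiName()`, `formatSaiFailure()` and the `key=value` layout are assumptions for illustration, not the actual handleSaiFailure() signature or the eventd event schema.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical subset of SAI API identifiers (the real type is sai_api_t).
enum class SaiApi { Port, Route, NextHop };

static const char* apiName(SaiApi api)
{
    switch (api)
    {
    case SaiApi::Port:    return "PORT";
    case SaiApi::Route:   return "ROUTE";
    case SaiApi::NextHop: return "NEXT_HOP";
    }
    return "UNKNOWN";
}

// Sketch: combine the API, operation type and SAI return status into one
// structured message. Field names and format are assumptions.
std::string formatSaiFailure(SaiApi api, const std::string& oper, int32_t status)
{
    std::ostringstream msg;
    msg << "sai_api=" << apiName(api)
        << " operation=" << oper
        << " status=" << status;
    return msg.str();
}
```

For example, `formatSaiFailure(SaiApi::Route, "create", -1)` yields `sai_api=ROUTE operation=create status=-1`, a line that a log collector can parse by key.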
Mock test changes
- Added new tests for coverage and updated existing tests that use "ASSERT_DEATH" assertions when SAI API calls fail.
- Fixed errors in existing portsorch_ut test cases.
What is not done
All orchs still need changes to handle scenarios in which orchagent no longer crashes when SAI API calls fail. Some orchs also explicitly throw exceptions on SAI errors. These, along with the remaining items in sonic-net/SONiC#1698, will be handled in phase 2.
Why I did it
Crashing orchagent on every SAI error is overkill. Instead, this change follows the approach called out in the HLD referenced above to handle these errors more gracefully.
Parent commit: 6fb64cd
8 files changed
Lines changed: 551 additions & 270 deletions
File tree
- orchagent
- tests/mock_tests