Error handling framework initial draft #391

Merged
xinliu-seattle merged 3 commits into sonic-net:master from sivamukka:error-handling
Dec 17, 2019

@sivamukka

This document describes high level design details for Error Handling framework in SONiC.

Signed-off-by: Siva Mukka [email protected]

@msftclas

msftclas commented May 23, 2019

CLA assistant check
All CLA requirements met.

The requirements for error handling framework are:

1.1.1 Provide registration/de-registration mechanism for applications to enable/disable error notifications on a specific table. More than one application can register for notifications on a given table.

Can the framework support applications registering for notifications at the attribute level?

Notifications can only be enabled per table. The failure status code is reported per object. For example, the port table has multiple attributes such as MTU and admin state. Notifications can be enabled at the port table level, but not for MTU failures specifically.
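
To make the per-table registration model concrete, here is a minimal sketch. It is not the framework's API: `ErrorRegistry`, the callback signature, and the failure-code strings are all hypothetical names chosen for illustration. It only shows that several applications can register on one table and that there is no finer (per-attribute) granularity.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical callback signature: (table, key, failure_code).
ErrorCallback = Callable[[str, str, str], None]


class ErrorRegistry:
    """Illustrative per-table registry; not the actual SONiC interface."""

    def __init__(self) -> None:
        # One list of callbacks per table name.
        self._listeners: Dict[str, List[ErrorCallback]] = defaultdict(list)

    def register(self, table: str, callback: ErrorCallback) -> None:
        # More than one application may register on the same table.
        if callback not in self._listeners[table]:
            self._listeners[table].append(callback)

    def deregister(self, table: str, callback: ErrorCallback) -> None:
        if callback in self._listeners.get(table, []):
            self._listeners[table].remove(callback)

    def notify(self, table: str, key: str, failure_code: str) -> int:
        # Every listener on the table is called, whichever attribute failed.
        callbacks = list(self._listeners.get(table, []))
        for callback in callbacks:
            callback(table, key, failure_code)
        return len(callbacks)
```

With this shape, an application interested only in MTU failures would still register on the whole port table and discard the notifications it does not care about.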

- Extensible to all types of errors in the system, not restricted to APP_DB definitions.
- Efficient, as notifications are limited to failures in the DB.
- Notification for delete failures can be supported even when corresponding objects are deleted from APP_DB.

I understand the new DB approach, but here is a thought: why can't we have a single DB (APP_DB) separated by namespaces? For example, configured vs. applied/error state could live in the same table, so that there is only one table to maintain.

Can you please provide more details here? Are you suggesting that ERROR tables can be stored in APP_DB, and there is no need to create ERROR_DB? We want to avoid modifying the existing ROUTE_TABLE schema in APP_DB - error handling can optionally be disabled and retain the current behavior.

- Translates it from SAI data types to ERROR_DB data types
- Adds an entry to the error database. If the entry already exists, the corresponding failure code is updated.
- Publishes the notifications to respective error listeners.
3. Error listener waits for the incoming notifications, filters them and invokes the application callback.

Can you describe how the error listener filters notifications? What criteria are supported? Please add a use case.

Sure, we will add more details on this.
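
Until those details are added, the flow quoted in this thread (translate, add or update the entry, publish, filter, invoke the callback) might look roughly like the sketch below. The filter criteria are not specified in the draft, so filtering by table name and operation type is purely an assumption here. `ErrorDb`, `ErrorListener`, and the ERROR_DB code strings on the right-hand side of the mapping are likewise hypothetical.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

# Hypothetical translation from SAI status names to ERROR_DB failure codes.
SAI_TO_ERROR_CODE = {
    "SAI_STATUS_TABLE_FULL": "TABLE_FULL",
    "SAI_STATUS_ITEM_NOT_FOUND": "NOT_FOUND",
}


class ErrorListener:
    """Filters incoming notifications, then invokes the application callback."""

    def __init__(self, table: str,
                 callback: Callable[[str, str, str], None],
                 operations: Optional[Set[str]] = None) -> None:
        self.table = table
        self.callback = callback
        self.operations = operations  # None means "all operations"

    def on_notification(self, table: str, key: str,
                        operation: str, code: str) -> bool:
        # Assumed criteria: table name and, optionally, operation type.
        if table != self.table:
            return False
        if self.operations is not None and operation not in self.operations:
            return False
        self.callback(key, operation, code)
        return True


class ErrorDb:
    """In-memory stand-in for the error database."""

    def __init__(self) -> None:
        self.entries: Dict[Tuple[str, str], str] = {}
        self.listeners: List[ErrorListener] = []

    def report_failure(self, table: str, key: str,
                       operation: str, sai_status: str) -> str:
        # Translate from SAI data types to ERROR_DB data types.
        code = SAI_TO_ERROR_CODE.get(sai_status, "UNKNOWN")
        # Add the entry, or update the failure code if it already exists.
        self.entries[(table, key)] = code
        # Publish to the listeners; each one applies its own filter.
        for listener in self.listeners:
            listener.on_notification(table, key, operation, code)
        return code
```

A possible use case under these assumptions: a routing application registers a listener on `ROUTE_TABLE` with `operations={"create"}` and is called back only for route-create failures, while delete failures still update the stored entry.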

| Create failure | Delete failure | Remove the entry from database and notify the registered applications |
| Create failure | Update success | Remove the entry from the database and notify the registered applications |
| Create success | Delete failure | Add the entry to the database and notify the registered applications |
| Delete failure | Create success | Remove the entry from the database and notify the registered applications |

Can applications get out-of-order notifications from the feedback loop? If so, how should they handle them? For example, if a user does create/delete/create, do you expect the error feedback to arrive in order?

The order of notifications will be preserved, because changes to APP_DB and ASIC_DB maintain the sequence. If the same object fails multiple times, we need a unique transaction ID to associate each operation with its failure. To address this, we are looking at adding a unique ID to each APP_DB operation and reporting that ID back as part of the failure notification.
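
As a rough illustration of the transaction ID idea, the sketch below tags each operation with an increasing ID and uses the ID carried in a failure notification to find the operation that failed. `OperationTracker` and its methods are hypothetical; the draft does not define how the ID is generated or stored.

```python
import itertools
from typing import Dict, Tuple


class OperationTracker:
    """Associates each submitted operation with a unique transaction ID."""

    def __init__(self) -> None:
        self._next_id = itertools.count(1)
        # txn_id -> (operation, key) for operations not yet reported as failed.
        self._pending: Dict[int, Tuple[str, str]] = {}

    def submit(self, operation: str, key: str) -> int:
        # The returned ID would travel with the APP_DB operation.
        txn_id = next(self._next_id)
        self._pending[txn_id] = (operation, key)
        return txn_id

    def on_failure(self, txn_id: int, failure_code: str) -> Tuple[str, str, str]:
        # The failure notification carries the ID back, so the application
        # can tell which of several operations on the same key failed.
        operation, key = self._pending.pop(txn_id)
        return (operation, key, failure_code)
```

In the create/delete/create case raised above, the two create operations on the same key get different IDs, so a failure reported for the second create cannot be mistaken for the first.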
