Update Auto_ts doc to include orchagent abort case #1128

Closed
vivekrnv wants to merge 12 commits into sonic-net:master from vivekrnv:auto_ts_orch_abort

vivekrnv (Contributor) commented Nov 24, 2022

Update the Auto-TS HLD to include special handling for the case where orchagent aborts due to a SAI programming failure.

| Repo | PR title |
| --- | --- |
| sonic-buildimage | Update syncd stop script to collect saisdkdump during orch abort |
| sonic-swss | [orchagent] Set ABRT signal in STATE_DB during a SAI failure |
| sonic-utilities | Updated TS and auto-TS to collect orch abrt saisdkdump files |


A relevant message will be logged to syslog when the invocation fails because of LOCKFAIL exit code.

### 7.9 Orchagent abort consideration
Collaborator:

@vivekrnv It is OK to have the implementation for Nvidia/Mellanox syncd only. The question is whether the flow can be invoked on any ASIC vendor if they add support for it. If so, I think it should be considered generic, subject to code availability, like any other feature in SAI. What do you think?

Contributor Author:

Code availability will be present for all members. Every SAI vendor is expected to implement the sai_dbg_generate_dump call, which is used in saisdkdump. But it's not possible to determine whether the dump is important for a particular vendor. As we already know from techsupport, only Nvidia is using saisdkdump.

So, I think we should keep it specific to Nvidia for now. If and when other vendors decide it's important, they can enable this for their platform.


During a SAI programming failure, orchagent sets the ORCH_ABRT_STATUS flag in STATE_DB. The syncd.sh script checks whether the ORCH_ABRT_STATUS flag is set in STATE_DB before stopping the syncd container. If it is set, the script collects saisdkdump to `/var/log/orch_abrt_saisdkdump/` on the host and also creates a file named `saidump_collection_notify_flag` under /tmp. This file is used to synchronize between auto-techsupport and syncd.
Collaborator:

Assuming it will be generic and some ASIC vendors will not refer to the new STATE_DB additions, what will the system behaviour be?

vivekrnv (Contributor Author), Nov 29, 2022:

Orchagent will write to STATE_DB irrespective of the vendor. The syncd.sh script will look like this:

    if [[ x"$(${SONIC_DB_CLI} STATE_DB GET ORCH_ABRT_STATUS)" == x"1" ]]; then
        # Collecting saisdkdump before restarting syncd
        # Runs when orchagent is aborted because of SAI failure.
        # Only enabled for mellanox platform
        if [[ x$sonic_asic_platform == x"mellanox" ]]; then
            collect_saisdkdump
        fi
        # This is used to notify auto-techsupport process
        touch /tmp/saidump_collection_notify_flag
    fi

So auto-techsupport will function the same for all vendors. The only difference is that the dump is not collected on other platforms.
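
For the consuming side of this handshake, here is a minimal sketch of how a script could wait for the notify flag before picking up the dump. The function name `wait_for_saidump_flag`, the default timeout, and the one-second polling interval are assumptions made for illustration. They are not taken from the actual sonic-utilities change.

```shell
#!/bin/bash
# Hypothetical helper (not from the real auto-techsupport code):
# poll until the notify flag created by syncd.sh appears.
# Returns 0 once the flag file exists, 1 if the timeout expires first.
wait_for_saidump_flag() {
    local flag_file="${1:-/tmp/saidump_collection_notify_flag}"
    local timeout_sec="${2:-60}"   # assumed default, not specified in the HLD
    local waited=0

    while [[ ! -e "$flag_file" ]]; do
        if (( waited >= timeout_sec )); then
            return 1               # gave up: syncd.sh never signalled
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

A caller could invoke `wait_for_saidump_flag` with no arguments and, on a zero return, go on to look for files under `/var/log/orch_abrt_saisdkdump/`. Because the flag is touched on every platform, the wait should end the same way for all vendors, whether or not a dump was actually collected.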

@vivekrnv vivekrnv closed this Jan 5, 2023
@zhangyanzhao (Collaborator):

@liat-grozovik is this closed intentionally?

@dgsudharsan (Contributor):

> @liat-grozovik is this closed intentionally?

@zhangyanzhao We have a new design to cover this #1212
