Skip to content

Corefile uploader service (#3887)#3914

Closed
renukamanavalan wants to merge 1 commit intosonic-net:201811from
renukamanavalan:core_uploader_201811
Closed

Corefile uploader service (#3887)#3914
renukamanavalan wants to merge 1 commit intosonic-net:201811from
renukamanavalan:core_uploader_201811

Conversation

@renukamanavalan
Copy link
Contributor

  • Corefile uploader service
  1. A service is added to watch /var/core and upload to Azure storage
  2. The service is disabled on boot. One may enable explicitly.
  3. The .rc file to be updated with acct credentials and http proxy to use.
  4. If service is enabled with no credentials, it would sleep, with periodic log messages
  5. For any update in .rc, the service has to be restarted to take effect.
  • Remove rw permission for .rc file for group & others.

  • Changes per review comments.
    Re-ordered .rc file per JSON.dump order.
    Added a script to enable partial update of .rc, which HWProxy would use to add acct key.

  • Azure storage upload requires python module futures, hence added it to install list.

  • Removed trailing spaces.

  • A mistake in name corrected.
    Copy the .rc updater script to /usr/bin.

- What I did

- How I did it

- How to verify it

- Description for the changelog

- A picture of a cute animal (not mandatory but encouraged)

* Corefile uploader service

1) A service is added to watch /var/core and upload to Azure storage
2) The service is disabled on boot. One may enable explicitly.
3) The .rc file to be updated with acct credentials and http proxy to use.
4) If service is enabled with no credentials, it would sleep, with periodic log messages
5) For any update in .rc, the service has to be restarted to take effect.

* Remove rw permission for .rc file for group & others.

* Changes per review comments.
Re-ordered .rc file per JSON.dump order.
Added a script to enable partial update of .rc, which HWProxy would use to add acct key.

* Azure storage upload requires python module futures, hence added it to install list.

* Removed trailing spaces.

* A mistake in name corrected.
Copy the .rc updater script to /usr/bin.
"core_upload": "/tmp/core_upload/"
},
"azure_sonic_core_storage": {
"account_name": "corefilecollection",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you use template for both name and share_name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use /usr/bin/update_json.py to update any of the entries.

sudo cp $IMAGE_CONFIGS/hostcfgd/*.j2 $FILESYSTEM_ROOT_USR_SHARE_SONIC_TEMPLATES/

# copy core file uploader files
sudo cp $IMAGE_CONFIGS/corefile_uploader/core_uploader.service $FILESYSTEM_ROOT/etc/systemd/system/
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be better to put all these into sonic-utilities. In the build image repo, we just need to install the core_uploader service.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I leave core_uploader.py & its rc file here, as they are part of .service.
Moved update_json.py to sonic-utilities.



@staticmethod
def wait_for_file_write_complete(path):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a better way is to check the update time of the file and compare with current time, if it is more than 10 minutes, then it likely means the file is complete.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ctime is not create but modified time.
So I just changed it to be more safe.
Theoretically, there can't be one robust safe way, as it all depends on how the write happens.

tar.close()
log_debug("Tar file for upload created: " + tarf_name)

Handler.upload_file(tarf_name, tarf_name)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the core file removed after upload?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No.
Reasons:

  1. I am not attaching core file to ICM now, so as not to increase ICM DB size,. Hence we need it here.

  2. With core files remaining, upon service restart, it will re upload,
    a) Re-upload is not a problem, as the analyzer would understand that it is duplicate and will not file any ticket
    b) If the file is still in azure storage, it would only get re-updated and no duplicates.
    In a normal system, the service restart should be pretty seldom (only upon reboot)

  3. When s process dumps cores repeatedly, only one ticket is created. Even if we attach core file., we would attach only one. But often, we might like to see multiple dumps during core file analysis to help see a pattern.

On a side note: FYI:
5) We have an upcoming core-mgmt service from Broadcom that would limit the count of core files for a process

log_debug("Remote file created: name{} path{}".format(fname, fpath))
break

except Exception as e:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what kind of failures do we see, if it is access failure due to credential, then retry would not help.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Being a http based service, there can be many different failures. So we retry 3 times with a pause in between. If we still could not, get back to main loop and waiting for next core.

Copy link
Collaborator

@lguohan lguohan Dec 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you try invalid user name and password, and see what error message we will have. at least if it is account issue, we should print out some message for alert. you want differentiate the network temp error v.s. systematic error, for example no storage space, account error, so that system managers can treat those message differently.

@renukamanavalan renukamanavalan deleted the core_uploader_201811 branch April 17, 2022 22:08
mssonicbld added a commit that referenced this pull request Jun 15, 2025
…atically (#22925)

#### Why I did it
src/sonic-utilities
```
* 6bd88046 - (HEAD -> 202505, origin/202505) Support reboot cause: Kernel Panic - Out of memory (#3921) (2 days ago) [mssonicbld]
* 28ae9cef - [db_migrator] Add version_202411_02 (#3912) (5 days ago) [mssonicbld]
* 4586e973 - [intfstat] Align output format between cached/non-cached scenarios (#3913) (5 days ago) [mssonicbld]
* 11b6fc0c - [Mellanox] Collect sai.xml to sysdump (#3914) (5 days ago) [mssonicbld]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants