Skip to content

[action] [PR:17536] fix: handle UnpicklingError in cache read#152

Merged
cyw233 merged 1 commit intoAzure:202405from
mssonicbld:cherry/msft-202405/17536
Mar 18, 2025
Merged

[action] [PR:17536] fix: handle UnpicklingError in cache read#152
cyw233 merged 1 commit intoAzure:202405from
mssonicbld:cherry/msft-202405/17536

Conversation

@mssonicbld
Copy link
Collaborator

Description of PR

Handle UnpicklingError edge case in FactsCache::read() when parallel run is enabled.

Summary:
Fixes # (issue) Microsoft ADO 31839137

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • New Test case
    • Skipped for non-supported platforms
  • Test case improvement

Back port request

  • 202012
  • 202205
  • 202305
  • 202311
  • 202405
  • 202411

Approach

What is the motivation for this PR?

When parallel run is enabled, multiple processes may try to read/write the same cache file, so there will be a chance that a file is being read by multiple processes at the same time, causing UnpicklingError in some of the processes. Therefore, we decided to retry to read the file after a short random sleep. If we still get the same error after retrying, we will return NOTEXIST to overwrite the file.

How did you do it?

How did you verify/test it?

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

<!--
Please make sure you've read and understood our contributing guidelines;
https://github.com/sonic-net/SONiC/blob/gh-pages/CONTRIBUTING.md

Please provide following information to help code review process a bit easier:
-->
### Description of PR
<!--
- Please include a summary of the change and which issue is fixed.
- Please also include relevant motivation and context. Where should reviewer start? background context?
- List any dependencies that are required for this change.
-->
Handle `UnpicklingError` edge case in `FactsCache::read()` when parallel run is enabled.

Summary:
Fixes # (issue) Microsoft ADO 31839137

### Type of change

<!--
- Fill x for your type of change.
- e.g.
- [x] Bug fix
-->

- [x] Bug fix
- [ ] Testbed and Framework(new/improvement)
- [ ] New Test case
    - [ ] Skipped for non-supported platforms
- [ ] Test case improvement

### Back port request
- [ ] 202012
- [ ] 202205
- [ ] 202305
- [ ] 202311
- [ ] 202405
- [x] 202411

### Approach
#### What is the motivation for this PR?
When parallel run is enabled, multiple processes may try to read/write the same cache file, so there will be a chance that a file is being read by multiple processes at the same time, causing UnpicklingError in some of the processes. Therefore, we decided to retry to read the file after a short random sleep. If we still get the same error after retrying, we will return NOTEXIST to overwrite the file.

#### How did you do it?

#### How did you verify/test it?

#### Any platform specific information?

#### Supported testbed topology if it's a new test case?

### Documentation
<!--
(If it's a new feature, new test case)
Did you update documentation/Wiki relevant to your implementation?
Link to the wiki page?
-->
@mssonicbld
Copy link
Collaborator Author

Original PR: sonic-net/sonic-mgmt#17536

@mssonicbld
Copy link
Collaborator Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines could not run because the pipeline triggers exclude this branch/path.

@cyw233 cyw233 merged commit 8cc541f into Azure:202405 Mar 18, 2025
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants