fix: handle UnpicklingError in cache read #17536

Merged
yejianquan merged 1 commit into sonic-net:master from cyw233:handle-unpickling-error-in-cache-read
Mar 18, 2025
@cyw233
Contributor

@cyw233 cyw233 commented Mar 17, 2025

Description of PR

Handle UnpicklingError edge case in FactsCache::read() when parallel run is enabled.

Summary:
Fixes # (issue) Microsoft ADO 31839137

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • New Test case
    • Skipped for non-supported platforms
  • Test case improvement

Back port request

  • 202012
  • 202205
  • 202305
  • 202311
  • 202405
  • 202411

Approach

What is the motivation for this PR?

When parallel run is enabled, multiple processes may read or write the same cache file, so the same file can be read by several processes at once, which causes UnpicklingError in some of them. To handle this, FactsCache::read() retries the read after a short random sleep. If the retry fails with the same error, it returns NOTEXIST so that the file gets overwritten.
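
For illustration, the retry described above could look roughly like the sketch below. This is not the code merged in this PR: `read_cache_file`, its `retries` and `max_sleep` parameters, and the `NOTEXIST` stand-in object are hypothetical names chosen for the example. It also catches `EOFError`, which `pickle.load` raises for an empty file; that is an assumption added here, not something the PR mentions.

```python
import pickle
import random
import time

# Stand-in sentinel for this sketch only; the real FactsCache defines its own NOTEXIST.
NOTEXIST = object()


def read_cache_file(path, retries=1, max_sleep=0.5):
    """Load a pickled cache file, retrying after a short random sleep on failure.

    Returns NOTEXIST when the file is missing or still unreadable after the
    last retry, so the caller can regenerate and overwrite it.
    """
    for attempt in range(retries + 1):
        try:
            with open(path, "rb") as f:
                return pickle.load(f)
        except FileNotFoundError:
            # Nothing cached yet.
            return NOTEXIST
        except (pickle.UnpicklingError, EOFError):
            # Possibly a concurrent access from another process.
            if attempt < retries:
                # Random delay so parallel readers do not retry in lockstep.
                time.sleep(random.uniform(0, max_sleep))
    return NOTEXIST
```

A caller would treat `NOTEXIST` the same way as a cache miss: gather the facts again and write them back, replacing the unreadable file.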

How did you do it?

How did you verify/test it?

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@cyw233 cyw233 force-pushed the handle-unpickling-error-in-cache-read branch from dbc9468 to db1bcc3 on March 17, 2025 00:52
@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@cyw233 cyw233 force-pushed the handle-unpickling-error-in-cache-read branch from db1bcc3 to 9115c7f on March 17, 2025 03:35
@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@cyw233 cyw233 force-pushed the handle-unpickling-error-in-cache-read branch from 9115c7f to f3e836c on March 17, 2025 08:31
@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@cyw233 cyw233 marked this pull request as ready for review March 18, 2025 05:20
Collaborator

@yejianquan yejianquan left a comment

LGTM

@yejianquan yejianquan merged commit 0a6c881 into sonic-net:master Mar 18, 2025
18 checks passed
mssonicbld pushed a commit to mssonicbld/sonic-mgmt that referenced this pull request Mar 18, 2025
Description of PR
Handle UnpicklingError edge case in FactsCache::read() when parallel run is enabled.

Summary:
Fixes # (issue) Microsoft ADO 31839137

Approach
What is the motivation for this PR?
When parallel run is enabled, multiple processes may try to read/write the same cache file, so there will be a chance that a file is being read by multiple processes at the same time, causing UnpicklingError in some of the processes. Therefore, we decided to retry to read the file after a short random sleep. If we still get the same error after retrying, we will return NOTEXIST to overwrite the file.

co-authorized by: jianquanye@microsoft.com
@mssonicbld
Collaborator

Cherry-pick PR to 202411: #17584

@mssonicbld
Collaborator

Cherry-pick PR to msft-202405: Azure/sonic-mgmt.msft#152

mssonicbld pushed a commit that referenced this pull request Mar 20, 2025
amulyan7 pushed a commit to amulyan7/sonic-mgmt that referenced this pull request Mar 31, 2025
OriTrabelsi pushed a commit to OriTrabelsi/sonic-mgmt that referenced this pull request Apr 1, 2025
gshemesh2 pushed a commit to gshemesh2/sonic-mgmt that referenced this pull request Dec 21, 2025
Signed-off-by: Guy Shemesh <gshemesh@nvidia.com>
gshemesh2 pushed a commit to gshemesh2/sonic-mgmt that referenced this pull request Jan 26, 2026
Signed-off-by: Guy Shemesh <gshemesh@nvidia.com>
3 participants