[resolv-config] Improve container resolv.conf update mechanism #22439
qiluo-msft merged 1 commit into sonic-net:master
/azp run Azure.sonic-buildimage
Azure Pipelines successfully started running 1 pipeline(s).
ac4f210 to d4fd379
How does "waits up to 5 seconds for containers to start" help solve a possible race condition? Does it just reduce the likelihood of the race without completely resolving it? Can we add a test case to prevent future regressions?
Is it possible to add a hook to run the update_container script after Docker starts?
There is a small time window in which the race condition can occur. It happens only during a config reload in which the DNS configuration changes.
Because this window is so small, the issue has a very low reproducibility rate. In this scenario, waiting 5 seconds gives the Docker container enough time to move into the running state.
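The bounded wait discussed here can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the function name `wait_for_running` and its arguments are hypothetical, and it assumes the container state is read with `docker inspect -f '{{.State.Running}}'`.

```shell
# Hypothetical helper: poll until the container reports State.Running=true,
# or give up once the timeout (in seconds, default 5) has elapsed.
wait_for_running() {
    local name="$1" timeout="${2:-5}" elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # `docker inspect -f` prints "true" once the container is running.
        if [ "$(docker inspect -f '{{.State.Running}}' "$name" 2>/dev/null)" = "true" ]; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    # Still not running after the timeout: the caller must handle this case.
    return 1
}
```

A call such as `wait_for_running swss 5` returns non-zero if the container is still not running after 5 seconds, which is why a wait of this kind narrows the race window rather than closing it.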
- Add support for single container updates with container name argument
- Implement parallel updates for bulk operations
- Add comprehensive logging with syslog integration
- Add container state handling (start/wait for stopped containers)
- Add proper error handling and status reporting
- Remove temporary file usage during resolv.conf updates
- Add networking service check for bulk updates only

Integration changes:
- Add automatic DNS update call in docker_image_ctl post-start action
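To make the commit message more concrete, here is a rough sketch of how a single-container update, a parallel bulk update, and syslog logging could fit together. All function names, the `RESOLV_CONF` variable, and the `resolv-config` log tag are assumptions made for illustration; the actual script in the PR may be structured differently.

```shell
# Hypothetical names throughout; this is not the PR's actual script.
RESOLV_CONF="${RESOLV_CONF:-/etc/resolv.conf}"

log() {
    # logger(1) writes to syslog; -t sets the tag, -p the facility.level.
    logger -t resolv-config -p "user.$1" "$2"
}

update_one() {
    # Stream the host file into the container over stdin, so that no
    # temporary file is needed on either side.
    if docker exec -i "$1" sh -c 'cat > /etc/resolv.conf' < "$RESOLV_CONF"; then
        log info "updated resolv.conf in $1"
    else
        log err "failed to update resolv.conf in $1"
        return 1
    fi
}

update_all() {
    # Start one background job per running container, then wait for each
    # job and report failure if any of them failed.
    local pids="" pid rc=0 name
    for name in $(docker ps --format '{{.Names}}'); do
        update_one "$name" &
        pids="$pids $!"
    done
    for pid in $pids; do
        wait "$pid" || rc=1
    done
    return "$rc"
}

# With a container name, update only that container; otherwise update all.
update_containers() {
    if [ -n "${1:-}" ]; then update_one "$1"; else update_all; fi
}
```

Waiting on each job individually keeps the overall exit status meaningful: one failed container makes the bulk run fail without stopping the updates of the others.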
d4fd379 to 82aad6d
@qiluo-msft, @ganglyu I updated the implementation to remove the 5-second wait period. Please review.
Did you ever try this easier solution https://unix.stackexchange.com/a/348406/161486?
        exit $?
    fi

    # Check if networking service is active (only for bulk updates)
I'm not sure I fully understand this comment
The solution described in https://unix.stackexchange.com/a/348406/161486 is redundant: it is already Docker's default behavior. When a container starts, dockerd copies the "/etc/resolv.conf" file from the host OS. The issue with this behavior is described in my previous comment.
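Because the copy is made when the container starts, a container that was started before the host file changed can be left with outdated contents. One way to observe this is to compare the two files. The sketch below is illustrative only; the function name `resolv_conf_in_sync` is hypothetical and it assumes `cat` and `cmp` are available.

```shell
# Hypothetical check: succeed only if the container's /etc/resolv.conf
# is byte-identical to the host file (default: /etc/resolv.conf).
resolv_conf_in_sync() {
    local name="$1" host_file="${2:-/etc/resolv.conf}"
    # cmp reads the container's copy from stdin ("-") and compares it
    # silently (-s) with the host file; the exit status is cmp's.
    docker exec "$name" cat /etc/resolv.conf 2>/dev/null | cmp -s - "$host_file"
}
```

Note that if `docker exec` fails, the comparison sees empty input and the check reports a mismatch unless the host file is also empty.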
Cherry-pick PR to 202411: #22462
Why I did it
Fix a possible race condition during the config reload caused by the concurrent restart of the Docker containers and the resolv-config service. The update of the DNS configuration inside the container is triggered by the container's post-start action.
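The idea of triggering the update from the post-start action can be sketched as below. This is a hedged illustration, not the code added to docker_image_ctl: the function name `post_start_dns_update` and the `UPDATE_SCRIPT` variable (with its default command name) are hypothetical.

```shell
# Hypothetical command name for the update script; override as needed.
UPDATE_SCRIPT="${UPDATE_SCRIPT:-update_container}"

post_start_dns_update() {
    local name="$1"
    # Refresh only the container that has just started. A failure is
    # reported on stderr but does not fail the container start itself.
    if ! "$UPDATE_SCRIPT" "$name"; then
        echo "warning: DNS update failed for $name" >&2
    fi
    return 0
}
```

Running the update right after each container starts means the container receives the current DNS configuration regardless of whether the resolv-config service ran before or after the restart.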
Work item tracking
How I did it
Integration changes:
How to verify it
Run sonic-mgmt DNS tests