
fix(ci): Add retry logic for pulling registry:2 in prepare-docker-buildx #8236

Merged
yurishkuro merged 2 commits into main from copilot/debug-spurious-failure on Mar 23, 2026

Conversation

Contributor

Copilot AI commented Mar 23, 2026

The opensearch 3.x e2e CI job fails spuriously when docker run attempts to pull registry:2 from Docker Hub and the connection times out (Client.Timeout exceeded while awaiting headers), causing prepare-docker-buildx to exit with code 125.

Changes

  • scripts/makefiles/Docker.mk, prepare-docker-buildx target:
    • Separate the pull from the run: attempt docker pull registry:2 up to 3 times with a 15s pause between failures (no unnecessary sleep on the final attempt)
    • After the loop, explicitly check with docker image inspect and abort with a clear error message if all pull attempts failed
    • Redirect stderr from docker inspect registry to suppress expected "No such object" noise
docker inspect registry > /dev/null 2>&1 || \
    { for i in 1 2 3; do \
        docker pull registry:2 && break; \
        echo "Attempt $$i/3 to pull registry:2 failed"; \
        [ "$$i" -lt 3 ] && sleep 15; \
      done; \
      docker image inspect registry:2 > /dev/null 2>&1 \
        || { echo "ERROR: Failed to pull registry:2 after 3 attempts"; exit 1; }; \
      docker run --rm -d -p 5000:5000 --name registry registry:2; }
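
The retry pattern in the recipe above can also be factored into a small reusable shell function. The following is a minimal sketch, not the code merged in this PR: `retry` is a hypothetical helper name, and the attempt count and delay are passed as arguments instead of being hard-coded.

```shell
#!/bin/sh
# retry ATTEMPTS DELAY CMD...   (hypothetical helper, not part of this PR)
# Runs CMD up to ATTEMPTS times and returns 0 as soon as it succeeds.
# Sleeps DELAY seconds after each failed attempt except the last one.
retry() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        "$@" && return 0
        echo "Attempt $i/$attempts failed: $*" >&2
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    # Every attempt failed; let the caller decide how to report it.
    return 1
}
```

Under these assumptions, `retry 3 15 docker pull registry:2 || exit 1` would mirror the 3 attempts and 15s pause used in the recipe.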


The opensearch e2e CI job was failing spuriously because
`docker run --rm -d -p 5000:5000 --name registry registry:2`
timed out when pulling the registry:2 image from Docker Hub due
to intermittent network issues.

The fix adds retry logic (up to 3 attempts with 15s sleep between
each) for pulling registry:2 before starting the container. It also:
- Redirects stderr from `docker inspect registry` to suppress noise
- Skips the sleep on the last retry attempt
- Adds explicit failure detection if all pull attempts fail

Fixes: https://github.com/jaegertracing/jaeger/actions/runs/23439845443/job/68188774639

Co-authored-by: yurishkuro <[email protected]>
Agent-Logs-Url: https://github.com/jaegertracing/jaeger/sessions/b43c3538-1fea-453a-9768-60927028af7b
Copilot AI changed the title from "[WIP] Debug spurious failure in integration step" to "fix(ci): add retry logic for pulling registry:2 in prepare-docker-buildx" Mar 23, 2026
Copilot AI requested a review from yurishkuro March 23, 2026 14:35
@yurishkuro yurishkuro changed the title from "fix(ci): add retry logic for pulling registry:2 in prepare-docker-buildx" to "fix(ci): Add retry logic for pulling registry:2 in prepare-docker-buildx" Mar 23, 2026
@yurishkuro yurishkuro added the changelog:ci (Change related to continuous integration / testing) label Mar 23, 2026

codecov Bot commented Mar 23, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.63%. Comparing base (766c82d) to head (c156e51).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #8236   +/-   ##
=======================================
  Coverage   95.63%   95.63%           
=======================================
  Files         319      319           
  Lines       16795    16795           
=======================================
  Hits        16062    16062           
  Misses        579      579           
  Partials      154      154           
| Flag | Coverage Δ |
|---|---|
| badger_direct | 9.06% <ø> (ø) |
| badger_e2e | 1.04% <ø> (ø) |
| cassandra-4.x-direct-manual | 13.25% <ø> (ø) |
| cassandra-4.x-e2e-auto | 1.03% <ø> (ø) |
| cassandra-4.x-e2e-manual | 1.03% <ø> (ø) |
| cassandra-5.x-direct-manual | 13.25% <ø> (ø) |
| cassandra-5.x-e2e-auto | 1.03% <ø> (ø) |
| cassandra-5.x-e2e-manual | 1.03% <ø> (ø) |
| clickhouse | 1.16% <ø> (ø) |
| elasticsearch-6.x-direct | 16.84% <ø> (ø) |
| elasticsearch-7.x-direct | 16.87% <ø> (ø) |
| elasticsearch-8.x-direct | 17.02% <ø> (ø) |
| elasticsearch-8.x-e2e | 1.04% <ø> (ø) |
| elasticsearch-9.x-e2e | 1.04% <ø> (ø) |
| grpc_direct | 7.79% <ø> (ø) |
| grpc_e2e | 1.04% <ø> (ø) |
| kafka-3.x-v2 | 1.04% <ø> (ø) |
| memory_v2 | 1.04% <ø> (ø) |
| opensearch-1.x-direct | 16.91% <ø> (ø) |
| opensearch-2.x-direct | 16.91% <ø> (ø) |
| opensearch-2.x-e2e | 1.04% <ø> (ø) |
| opensearch-3.x-e2e | 1.04% <ø> (ø) |
| query | 1.04% <ø> (ø) |
| tailsampling-processor | 0.52% <ø> (ø) |
| unittests | 94.32% <ø> (ø) |

Flags with carried forward coverage won't be shown. Click here to find out more.


@yurishkuro yurishkuro marked this pull request as ready for review March 23, 2026 14:58
@yurishkuro yurishkuro requested a review from a team as a code owner March 23, 2026 14:58
Copilot AI review requested due to automatic review settings March 23, 2026 14:58
Contributor

Copilot AI left a comment

Pull request overview

Improves CI reliability for jobs that rely on the local Docker registry by adding retry logic when pulling registry:2 during prepare-docker-buildx, reducing spurious failures due to transient Docker Hub timeouts.

Changes:

  • Adds up-to-3-attempt retry loop for docker pull registry:2 with a 15s backoff between failed attempts.
  • Verifies the registry:2 image is present via docker image inspect and fails fast with a clearer error if not.
  • Suppresses expected docker inspect stderr noise when the registry container does not exist.


  @echo "::group:: prepare-docker-buildx"
  docker buildx inspect jaeger-build > /dev/null || docker buildx create --use --name=jaeger-build --buildkitd-flags="--allow-insecure-entitlement security.insecure --allow-insecure-entitlement network.host" --driver-opt="network=host"
- docker inspect registry > /dev/null || docker run --rm -d -p 5000:5000 --name registry registry:2
+ docker inspect registry > /dev/null 2>&1 || \

Copilot AI Mar 23, 2026

docker inspect registry succeeds for both containers and images, and it also returns success for a stopped container. That can cause prepare-docker-buildx to skip starting the local registry even though no registry container is running. Consider switching to docker container inspect (or docker ps / docker inspect -f '{{.State.Running}}') and ensuring the container is running (start/recreate if stopped).
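
One way to act on this suggestion is sketched below, under stated assumptions. `container_running` and `ensure_registry` are hypothetical helper names, and force-removing a stopped leftover container before re-creating it is an assumed policy. None of this is part of the merged change.

```shell
#!/bin/sh
# container_running NAME   (hypothetical helper)
# Succeeds only if a container called NAME exists and is currently running.
# `docker container inspect` ignores images, unlike plain `docker inspect`.
container_running() {
    state=$(docker container inspect -f '{{.State.Running}}' "$1" 2>/dev/null) || return 1
    [ "$state" = "true" ]
}

# ensure_registry   (hypothetical helper)
# Starts the local registry unless a running one is already present.
ensure_registry() {
    if container_running registry; then
        return 0
    fi
    # Assumed policy: drop a stopped leftover so the name can be reused.
    docker rm -f registry > /dev/null 2>&1 || true
    docker run --rm -d -p 5000:5000 --name registry registry:2
}
```

The sketch omits the pull retry from this PR. A combined version would pull registry:2 with retries before the final `docker run`.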

@yurishkuro yurishkuro merged commit 87abfdf into main Mar 23, 2026
73 of 74 checks passed
@yurishkuro yurishkuro deleted the copilot/debug-spurious-failure branch March 23, 2026 15:04

Labels

changelog:ci Change related to continuous integration / testing


3 participants