
[feat] push in waves for cross-repository layer dedup #521

Open
peakschris wants to merge 2 commits into bazel-contrib:main from peakschris:cb_waves


@peakschris peakschris commented May 16, 2026

multi_deploy: deduplicate shared layers via wave-based push scheduling

When multiple push targets in a multi_deploy share layers — for example, when several service images are built from the same intermediate layer — the deploy tool currently re-uploads those layers to every target repository independently.

Problem

The OCI cross-mount mechanism (POST /v2/{repo}/blobs/uploads/?mount={digest}&from={source}) lets a registry copy a blob server-side from another repository on the same registry, transferring zero bytes from the client. go-containerregistry supports this via remote.MountableLayer, and the VFS already wraps base-image layers with mount hints derived from the pull source.
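As a rough illustration of the request shape described above, the sketch below builds the mount URL using only the Go standard library. The function names, the `https` scheme, and the example registry host are assumptions made for illustration and are not taken from this PR or from go-containerregistry. The status check reflects the OCI Distribution Specification, where `201 Created` indicates a successful mount and `202 Accepted` indicates the registry opened a regular upload session instead.

```go
package main

import (
	"net/http"
	"net/url"
)

// mountURL builds the cross-repository mount request URL.
// All parameter names are illustrative; the scheme is assumed to be https.
func mountURL(registry, targetRepo, digest, sourceRepo string) string {
	q := url.Values{}
	q.Set("mount", digest)
	q.Set("from", sourceRepo)
	u := url.URL{
		Scheme:   "https",
		Host:     registry,
		Path:     "/v2/" + targetRepo + "/blobs/uploads/",
		RawQuery: q.Encode(), // keys are sorted, values are percent-encoded
	}
	return u.String()
}

// mountSucceeded reports whether the registry mounted the blob server-side.
// Any other status means the client still has to upload the bytes itself.
func mountSucceeded(status int) bool {
	return status == http.StatusCreated
}
```

A client would send a `POST` to the returned URL and fall back to a normal upload whenever `mountSucceeded` is false.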

However, cross-mounting only works once the source blob is fully present in the registry. remote.MultiWrite processes push targets concurrently: when pushing to reg/svc-a and reg/svc-b at the same time, both issue HEAD /v2/{repo}/blobs/{digest} for the shared layer, both receive 404, and both fall back to a full upload before either has finished. The API has no way to express "upload this layer once and cross-mount it everywhere else" within a single call because the mount hint is advisory — if the source is not yet present the registry rejects it and the client must upload anyway.

Solution

Introduce a wave-based push scheduler (img_tool/cmd/deploy/waves.go) that enforces the required ordering outside the library:

  1. Partition push operations into sequential waves using a greedy single-pass algorithm. An operation is assigned to wave W+1 if any same-registry operation already in wave ≤W has claimed one of its shared layers.
  2. Wave 1 uploads each shared layer exactly once (the first claimant in manifest order). remote.MultiWrite still runs concurrently within the wave.
  3. Wave 2+ operations start only after wave 1 completes. By then the shared layers are present in the registry, so the MountableLayer hints succeed and the client transfers zero bytes.
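The greedy partitioning described in the steps above can be sketched roughly as follows. This is not the code in waves.go: `pushOp`, `layerKey`, `claim`, and `planWaves` are hypothetical names, and the sketch assumes that the first operation in input order claims a layer and that any later same-registry operation targeting a different repository lands one wave after the claimant.

```go
package main

// pushOp is a hypothetical stand-in for one push operation.
type pushOp struct {
	Registry   string
	Repository string
	Layers     []string // layer digests
}

// layerKey identifies a layer within one registry.
type layerKey struct {
	registry string
	digest   string
}

// claim records which repository first claimed a layer, and in which wave.
type claim struct {
	repository string
	wave       int
}

// planWaves returns the 1-based wave number for each operation, in input order.
func planWaves(ops []pushOp) []int {
	claims := make(map[layerKey]claim)
	waves := make([]int, len(ops))
	for i, op := range ops {
		wave := 1
		// A layer claimed by another repository on the same registry
		// pushes this operation to the wave after the claimant.
		for _, digest := range op.Layers {
			c, claimed := claims[layerKey{op.Registry, digest}]
			if claimed && c.repository != op.Repository && c.wave+1 > wave {
				wave = c.wave + 1
			}
		}
		// Claim every layer nobody has claimed yet on this registry.
		for _, digest := range op.Layers {
			key := layerKey{op.Registry, digest}
			if _, claimed := claims[key]; !claimed {
				claims[key] = claim{op.Repository, wave}
			}
		}
		waves[i] = wave
	}
	return waves
}
```

Under these assumptions, two services on the same registry that share a base layer end up in waves 1 and 2, while a second push to the claimant's own repository, or a push to a different registry, stays in wave 1.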

Same-registry, same-repository operations need no wave dependency (the registry deduplicates within a repository via HEAD checks). Operations on different registries form no dependency either, since the OCI Distribution Specification defines blob mounting only between repositories on the same registry.
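The dependency rule in the paragraph above reduces to a small predicate. The sketch below is only an illustration: `splitRef` and `needsWaveDependency` are hypothetical helpers, and references are assumed to have the simplified form `registry/repository` with no tag, digest, or implicit default registry.

```go
package main

import "strings"

// splitRef splits a simplified "registry/repository" reference at the
// first slash. Real image references are more involved than this.
func splitRef(ref string) (registry, repository string) {
	registry, repository, _ = strings.Cut(ref, "/")
	return registry, repository
}

// needsWaveDependency reports whether two operations that share a layer
// must be ordered: only when they target different repositories on the
// same registry.
func needsWaveDependency(a, b string) bool {
	regA, repoA := splitRef(a)
	regB, repoB := splitRef(b)
	return regA == regB && repoA != repoB
}
```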

Changes

  • img_tool/cmd/deploy/waves.go: waveGroup type and planPushWaves function
  • img_tool/cmd/deploy/deploy.go: wave-based push loop; planning runs before vfsBuilder.Build() so hints are passed in via the builder rather than as a post-build mutation
  • img_tool/pkg/deployvfs/deployvfs.go: vfsBuilder.WithCrossMountHints() method; hints are merged in Build() with base-image hints retaining priority
  • img_tool/cmd/deploy/waves_test.go: 13 cases covering the wave assignment algorithm
  • img_tool/pkg/deployvfs/deployvfs_test.go: 3 cases for WithCrossMountHints exercised through a real Build() call
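To illustrate what "base-image hints retaining priority" could mean in the list above, here is a minimal sketch of such a merge. It is not the code in deployvfs.go: `mergeHints` is a hypothetical function, and hints are assumed to be a plain map from layer digest to source repository.

```go
package main

// mergeHints combines two hint maps (layer digest -> source repository).
// Base-image hints are written last, so they win whenever both maps
// contain the same digest.
func mergeHints(baseHints, crossMountHints map[string]string) map[string]string {
	merged := make(map[string]string, len(baseHints)+len(crossMountHints))
	for digest, source := range crossMountHints {
		merged[digest] = source
	}
	for digest, source := range baseHints {
		merged[digest] = source
	}
	return merged
}
```

Returning a fresh map leaves both inputs unmodified, which is in keeping with passing hints in through the builder rather than mutating state after the build.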

No Bazel rule changes required.

Collaborator

malt3 commented May 16, 2026

Thanks for this optimization! I feel like it would be even better if we could improve the upstream go-containerregistry instead, but if this meaningfully improves performance then let's do it.
I'll need a bit of time to think about the changes.

@malt3 malt3 added this pull request to the merge queue May 16, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 16, 2026
Collaborator

@malt3 malt3 left a comment


There's an issue here:

nogo: nogo: error running analyzers: 32 analyzers skipped due to type-checking error: external/rules_img_tool+/cmd/deploy/deploy.go:165:9: undefined: waveGroup

Author

peakschris commented May 16, 2026

> Thanks for this optimization! I feel like it would be even better if we could improve the upstream go-containerregistry instead, but if this meaningfully improves performance then let's do it.

Thanks!

> There's an issue here:

Fixed

@peakschris peakschris requested a review from malt3 May 16, 2026 12:08
@malt3 malt3 enabled auto-merge May 16, 2026 12:19
@malt3 malt3 added this pull request to the merge queue May 16, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 16, 2026