
rocm: reduce image size by using a multi-stage build#2246

Merged
olliewalsh merged 1 commit into main from rocm-multi-stage on Jan 12, 2026

Conversation

@mikebonnet mikebonnet (Collaborator) commented Dec 15, 2025

Only copy required binaries and libraries from the installation directory into the final image, and install only necessary runtime dependencies. The final image size is reduced by over 2 GB.

Summary by Sourcery

Use a multi-stage build for the ROCm container image and adjust build scripts to support installation into a temporary prefix for ROCm builds.

Enhancements:

  • Refactor the ROCm Containerfile to use a builder stage and produce a slimmer runtime image by copying only required binaries and shared libraries.
  • Update the build_llama_and_whisper.sh script to treat ROCm like other GPU-specific images for install prefix handling and to use a unified GPU targets flag instead of AMDGPU-specific targets.

@sourcery-ai sourcery-ai bot (Contributor) commented Dec 15, 2025

Reviewer's Guide

Converts the ROCm container image to a multi-stage build that installs llama/whisper binaries into a slim runtime image under /tmp/install, adjusts the build script to use a temporary install prefix for ROCm and to use GPU_TARGETS instead of AMDGPU_TARGETS, and removes some redundant install logic for whisper and llama binaries.

File-Level Changes

Change Details Files
Convert ROCm Containerfile to a multi-stage build and slim runtime image.
  • Change base stage to a named builder stage using Fedora 43 and run the existing build_llama_and_whisper.sh rocm script there.
  • Introduce a second Fedora 43 runtime stage that copies only selected llama and whisper binaries from /tmp/install/bin into /usr/bin.
  • Copy only shared libraries from /tmp/install/lib64 into /usr/lib64 in the final image.
  • Install only required ROCm runtime packages (hipblas, rocblas, rocm-hip, rocm-runtime, rocsolver) with weak dependencies disabled, then clean dnf caches.
container-images/rocm/Containerfile
Align build script install prefix and flags for ROCm multi-stage layout and simplify install steps.
  • Extend set_install_prefix to use /tmp/install for the rocm containerfile, matching other GPU-specific images.
  • Change ROCm CMake flag from AMDGPU_TARGETS to GPU_TARGETS, preserving the default list of architectures.
  • Remove explicit mkdir of the install_prefix/bin directory from the Whisper build function (now handled by cmake install or prior steps).
  • Remove redundant local install_prefix variable and its use in llama.cpp clone_and_build_llama_cpp, relying instead on cmake install behavior or centralized prefix logic.
container-images/scripts/build_llama_and_whisper.sh
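
Taken together, the file-level changes above imply a Containerfile of roughly the following shape. This is a hedged sketch only — the stage layout and paths follow the guide, but the exact binary list and build-script invocation are assumptions, not the merged file:

```dockerfile
# --- Build stage: compile llama.cpp/whisper.cpp against ROCm ---
FROM fedora:43 AS builder
COPY . /src
# build_llama_and_whisper.sh installs into /tmp/install for GPU images
RUN /src/container-images/scripts/build_llama_and_whisper.sh rocm

# --- Runtime stage: slim image with only runtime artifacts ---
FROM fedora:43
# Copy only the needed executables (exact list is an assumption)
COPY --from=builder /tmp/install/bin/llama-server \
                    /tmp/install/bin/llama-quantize \
                    /tmp/install/bin/whisper-server \
                    /usr/bin/
# Copy shared libraries; note that a plain *.so glob drops the
# versioned symlink chain (libggml.so -> libggml.so.0 -> ...),
# a fragility the review discussion later works around with cp -a.
COPY --from=builder /tmp/install/lib64/*.so /usr/lib64/
# Minimal ROCm runtime, with weak dependencies disabled
RUN dnf -y --setopt=install_weak_deps=false install \
        hipblas rocblas rocm-hip rocm-runtime rocsolver && \
    dnf clean all
```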

Possibly linked issues

  • #rocm-ubi container build fails around dnf remove gcc-c++: PR’s multi-stage rocm build avoids the problematic dnf remove step that caused the original build failure.
  • #ramalama new rocm image is broken due to missing libhipblas.so.2: They both address the broken ROCm image; PR adds hipblas/ROCm runtimes via multi-stage build, fixing missing libhipblas.so.2.


@gemini-code-assist gemini-code-assist bot (Contributor) commented

Summary of Changes

Hello @mikebonnet, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a multi-stage build strategy for the ROCm container image. The core purpose is to dramatically decrease the final image's footprint by isolating the build artifacts and only transferring the essential components to the production-ready image. This optimization ensures that the container remains lean while still providing full functionality for ROCm-accelerated applications.

Highlights

  • Multi-stage build implementation: The ROCm Containerfile has been refactored to utilize a multi-stage build process, separating the build environment from the final runtime image.
  • Significant image size reduction: By selectively copying only the necessary binaries and libraries from the build stage and installing minimal runtime dependencies, the final image size is reduced by over 2GB.
  • Optimized dependency management: The final image now installs only essential ROCm runtime dependencies using dnf with install_weak_deps=false.
  • Build script adjustments: The build_llama_and_whisper.sh script was updated to correctly set the installation prefix to /tmp/install for ROCm builds, align with the multi-stage approach, and include a variable name change from AMDGPU_TARGETS to GPU_TARGETS.

@sourcery-ai sourcery-ai bot (Contributor) left a comment

Hey there - I've reviewed your changes and found some issues that need to be addressed.

  • In clone_and_build_llama_cpp, install_prefix is no longer defined after removing the local assignment, but it’s still used in the final install command, which will cause the script to fail or use an unintended value—either reintroduce a local assignment from set_install_prefix or adjust the install path accordingly.
  • The HIP build flags changed from -DAMDGPU_TARGETS=... to -DGPU_TARGETS=...; please confirm this matches the current llama.cpp/whisper.cpp CMake option names, as this rename may be ignored or break target selection if the upstream variable is still AMDGPU_TARGETS.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `clone_and_build_llama_cpp`, `install_prefix` is no longer defined after removing the local assignment, but it’s still used in the final `install` command, which will cause the script to fail or use an unintended value—either reintroduce a local assignment from `set_install_prefix` or adjust the install path accordingly.
- The HIP build flags changed from `-DAMDGPU_TARGETS=...` to `-DGPU_TARGETS=...`; please confirm this matches the current llama.cpp/whisper.cpp CMake option names, as this rename may be ignored or break target selection if the upstream variable is still `AMDGPU_TARGETS`.

## Individual Comments

### Comment 1
<location> `container-images/scripts/build_llama_and_whisper.sh:246-245` </location>
<code_context>

 clone_and_build_llama_cpp() {
   local llama_cpp_commit="${LLAMA_CPP_PULL_REF:-$DEFAULT_LLAMA_CPP_COMMIT}"
-  local install_prefix
-  install_prefix=$(set_install_prefix)
   git_clone_specific_commit "${LLAMA_CPP_REPO:-https://github.com/ggml-org/llama.cpp}" "$llama_cpp_commit"
   cmake_steps "${common_flags[@]}"
   install -m 755 build/bin/rpc-server "$install_prefix"/bin/rpc-server
</code_context>

<issue_to_address>
**issue (bug_risk):** Removing the local install_prefix assignment risks installing rpc-server into an unintended path.

This function still calls `install ... "$install_prefix"/bin/rpc-server`, but `install_prefix` is no longer initialized here. Unless it’s guaranteed to be set in the environment, this will likely resolve to `/bin/rpc-server` or fail. Either restore `local install_prefix=$(set_install_prefix)` or remove this manual `install` in favor of `cmake --install` with the prefix configured in `cmake_steps`.
</issue_to_address>
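
The first suggested remedy can be sketched in shell as follows. `set_install_prefix` here is a hypothetical stand-in that prints the prefix on stdout, mirroring how the review describes it; the clone and cmake steps are elided:

```shell
#!/bin/bash
# Hypothetical stand-in for the script's set_install_prefix, assumed to
# print the install prefix on stdout (/tmp/install for GPU images).
set_install_prefix() {
  echo "/tmp/install"
}

clone_and_build_llama_cpp() {
  # Restore the local assignment so "$install_prefix" below is defined,
  # instead of expanding to an empty string (which would target /bin).
  local install_prefix
  install_prefix=$(set_install_prefix)
  # ... git clone and cmake steps elided ...
  echo "rpc-server would be installed into ${install_prefix}/bin"
}

clone_and_build_llama_cpp
```

The alternative the reviewer mentions — dropping the manual `install` in favor of `cmake --install` with the prefix configured in `cmake_steps` — would make the local variable unnecessary altogether.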

### Comment 2
<location> `container-images/rocm/Containerfile:18` </location>
<code_context>
+  /tmp/install/bin/llama-quantize \
+  /tmp/install/bin/whisper-server \
+  /usr/bin/
+COPY --from=builder /tmp/install/lib64/*.so /usr/lib64/
+
+RUN dnf -y --setopt=install_weak_deps=false install hipblas rocblas rocm-hip rocm-runtime rocsolver && \
</code_context>

<issue_to_address>
**suggestion:** The COPY glob for shared libraries can fail if no .so files exist or miss versioned symlinks.

To avoid this fragility, either copy the entire directory (e.g., `COPY --from=builder /tmp/install/lib64/ /usr/lib64/`) or use a broader pattern such as `*.so*` so that all ROCm shared libraries and their symlinks are reliably included.

```suggestion
COPY --from=builder /tmp/install/lib64/ /usr/lib64/
```
</issue_to_address>


@gemini-code-assist gemini-code-assist bot (Contributor) left a comment

Code Review

The code changes refactor the ROCm container build to a multi-stage Dockerfile, separating the llama.cpp and whisper.cpp compilation into a builder stage and then copying specific binaries and shared libraries, along with installing ROCm runtime dependencies, into the final image. The build script was updated to correctly configure the install prefix and GPU targets for ROCm. A review comment points out that the rpc-server binary is not being copied to the final image and suggests using a wildcard in the COPY command for binaries to ensure all executables are included robustly.

@mikebonnet mikebonnet force-pushed the rocm-multi-stage branch 2 times, most recently from 3fbfc50 to 532e5d0 on December 15, 2025 at 19:04
@rhatdan rhatdan (Member) commented Dec 16, 2025

LGTM, assuming you tested this on an AMD system?

@mikebonnet mikebonnet (Collaborator, Author) commented

Unfortunately I don't have an AMD system to test with. If someone could test ramalama run or ramalama serve with --image quay.io/redhat-user-workloads/ramalama-tenant/rocm:on-pr-34482d11f78e7c93649c4cddb070c285e355ac39 that would be very helpful.

@rhatdan rhatdan (Member) commented Dec 16, 2025

Well, I have an AMD laptop, and it fails the same way with your image as with the quay.io/ramalama/ramalama image. Luckily for me, the laptop works with the Vulkan driver.

@rhatdan rhatdan (Member) commented Dec 16, 2025

.Memory critical error by agent node-0 (Agent handle: 0x3390320) on address 0x7f599f65e000. Reason: Memory in use.

@mikebonnet mikebonnet (Collaborator, Author) commented

Does the quay.io/ramalama/rocm:0.15 image behave the same way?

@rhatdan rhatdan (Member) commented Dec 16, 2025

Yes

@mikebonnet mikebonnet (Collaborator, Author) commented

There were reports that it was a firmware problem. Have you tried with different/older firmware?

@olliewalsh olliewalsh (Collaborator) commented

> Unfortunately I don't have an AMD system to test with. If someone could test ramalama run or ramalama serve with --image quay.io/redhat-user-workloads/ramalama-tenant/rocm:on-pr-34482d11f78e7c93649c4cddb070c285e355ac39 that would be very helpful.

I'm almost done setting up an AMD system so can test this soon, tomorrow hopefully

@mikebonnet mikebonnet force-pushed the rocm-multi-stage branch 2 times, most recently from 7677364 to 267815b on December 19, 2025 at 20:55
@bmahabirbu bmahabirbu (Collaborator) commented Jan 6, 2026

I have an AMD system, but I'm getting:

brian@fedora:~/ramalama$ bin/ramalama run llama3.2 --image quay.io/redhat-user-workloads/ramalama-tenant/rocm:on-pr
Trying to pull quay.io/redhat-user-workloads/ramalama-tenant/rocm:on-pr...
Error: unable to copy from source docker://quay.io/redhat-user-workloads/ramalama-tenant/rocm:on-pr: initializing source docker://quay.io/redhat-user-workloads/ramalama-tenant/rocm:on-pr: reading manifest on-pr in quay.io/redhat-user-workloads/ramalama-tenant/rocm: manifest unknown
Error: Failed to serve model llama3.2, for ramalama run command
brian@fedora:~/ramalama$ 

Is there a new manifest I can use? BTW, regular rocm on 16 works for me, with a 9070 XT!

@mikebonnet mikebonnet (Collaborator, Author) commented

> is there a new manifest i can use? btw regular rocm on 16 works for me! with a 9070xt

Let me rebase and rebuild, which will create a new manifest.

@mikebonnet mikebonnet (Collaborator, Author) commented

@bmahabirbu You can test with --image quay.io/redhat-user-workloads/ramalama-tenant/rocm:on-pr-2ae90c5206cd9a5d80e5ff1c062bfae7fe3b7360

@mikebonnet mikebonnet force-pushed the rocm-multi-stage branch 2 times, most recently from 3731a60 to 71efd86 on January 9, 2026 at 17:28
/tmp/install/bin/rpc-server \
/tmp/install/bin/whisper-server \
/usr/bin/
COPY --from=builder /tmp/install/lib64/*.so /usr/lib64/
@olliewalsh olliewalsh (Collaborator) commented on the diff above
symlinks could be an issue I think:

# ls -l /usr/lib64/libggml*
-rw-r--r--. 1 root root 992116 Jan  8 16:07 /usr/lib64/libggml-base.a
lrwxrwxrwx. 1 root root     17 Jan  8 16:11 /usr/lib64/libggml-base.so -> libggml-base.so.0
lrwxrwxrwx. 1 root root     21 Jan  8 16:11 /usr/lib64/libggml-base.so.0 -> libggml-base.so.0.9.5
-rwxr-xr-x. 1 root root 728752 Jan  8 16:08 /usr/lib64/libggml-base.so.0.9.5
-rw-r--r--. 1 root root 986336 Jan  8 16:07 /usr/lib64/libggml-cpu.a
-rw-r--r--. 1 root root  57602 Jan  8 16:08 /usr/lib64/libggml.a
lrwxrwxrwx. 1 root root     12 Jan  8 16:11 /usr/lib64/libggml.so -> libggml.so.0
lrwxrwxrwx. 1 root root     16 Jan  8 16:11 /usr/lib64/libggml.so.0 -> libggml.so.0.9.5
-rwxr-xr-x. 1 root root  47104 Jan  8 16:10 /usr/lib64/libggml.so.0.9.5

Would something like this work?
RUN --mount=type=bind,from=builder,source=/tmp/install,target=/tmp/install cp -a /tmp/install/lib64/* /usr/lib64/

or just keep it simple and cp -a /tmp/install/* /usr/

@mikebonnet mikebonnet (Collaborator, Author) replied
A lot of the space savings comes from not copying the unnecessary .a files. Are the symlinks required? In my testing it looked like llama-server worked fine with just the .so files. If it's not working for you, I can figure out a different way to copy the symlinks over.

@mikebonnet mikebonnet (Collaborator, Author) replied
Giving the --mount idea a try.

@mikebonnet mikebonnet (Collaborator, Author) replied
RUN --mount worked like a charm:

$ podman run -it --rm quay.io/redhat-user-workloads/ramalama-tenant/rocm:on-pr-bcf7372735e92e94ec51339bc1468e43c0a694f9 bash -c 'ls -la /usr/lib64/libggml* /usr/lib64/libllama*'
lrwxrwxrwx. 1 root root      17 Jan 10 08:21 /usr/lib64/libggml-base.so -> libggml-base.so.0
lrwxrwxrwx. 1 root root      21 Jan 10 08:21 /usr/lib64/libggml-base.so.0 -> libggml-base.so.0.9.5
-rwxr-xr-x. 1 root root  728752 Jan 10 08:04 /usr/lib64/libggml-base.so.0.9.5
lrwxrwxrwx. 1 root root      12 Jan 10 08:21 /usr/lib64/libggml.so -> libggml.so.0
lrwxrwxrwx. 1 root root      16 Jan 10 08:21 /usr/lib64/libggml.so.0 -> libggml.so.0.9.5
-rwxr-xr-x. 1 root root   47104 Jan 10 08:19 /usr/lib64/libggml.so.0.9.5
lrwxrwxrwx. 1 root root      13 Jan 10 08:21 /usr/lib64/libllama.so -> libllama.so.0
lrwxrwxrwx. 1 root root      17 Jan 10 08:21 /usr/lib64/libllama.so.0 -> libllama.so.0.0.1
-rwxr-xr-x. 1 root root 3010024 Jan 10 08:20 /usr/lib64/libllama.so.0.0.1

@mikebonnet mikebonnet (Collaborator, Author) replied
You can test the latest build with --image quay.io/redhat-user-workloads/ramalama-tenant/rocm:on-pr-bcf7372735e92e94ec51339bc1468e43c0a694f9

@mikebonnet mikebonnet (Collaborator, Author) replied Jan 10, 2026
New development, GGML_CPU_ALL_VARIANTS causes the .so files for the backends to be written to (and loaded from) bin/, so the Containerfile now copies those as well.

You can test the latest build with --image quay.io/redhat-user-workloads/ramalama-tenant/rocm:on-pr-61f7101c2a24f5fb2e152d42f879c02447c32962
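
With GGML_CPU_ALL_VARIANTS enabled, the backend plugins live beside the executables rather than under lib64/, so the runtime stage presumably needs an extra copy along these lines (path and glob are assumptions based on this comment, not the merged Containerfile):

```dockerfile
# Backend plugins (libggml-hip.so, libggml-rpc.so, libggml-cpu-*.so) are
# written to and loaded from bin/, so copy them next to the executables.
COPY --from=builder /tmp/install/bin/*.so /usr/bin/
```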

@olliewalsh olliewalsh (Collaborator) replied
LGTM

$ ramalama bench --image quay.io/redhat-user-workloads/ramalama-tenant/rocm:on-pr-61f7101c2a24f5fb2e152d42f879c02447c32962 granite
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon RX 9070 XT, gfx1201 (0x1201), VMM: no, Wave Size: 32
load_backend: loaded ROCm backend from /usr/bin/libggml-hip.so
load_backend: loaded RPC backend from /usr/bin/libggml-rpc.so
load_backend: loaded CPU backend from /usr/bin/libggml-cpu-haswell.so
| model                          |       size |     params | backend    | ngl |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| granite 3B Q4_K - Medium       |   4.65 GiB |     8.17 B | ROCm       | 999 |           pp512 |     2636.43 ± 436.43 |
| granite 3B Q4_K - Medium       |   4.65 GiB |     8.17 B | ROCm       | 999 |           tg128 |         69.10 ± 0.23 |

build: 9689295 (1)

Prompt processing also seems to be much better (vs older llama.cpp on latest):

  Device 0: AMD Radeon RX 9070 XT, gfx1201 (0x1201), VMM: no, Wave Size: 32
| model                          |       size |     params | backend    | ngl |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| granite 3B Q4_K - Medium       |   4.65 GiB |     8.17 B | ROCm       | 999 |           pp512 |        595.84 ± 0.36 |
| granite 3B Q4_K - Medium       |   4.65 GiB |     8.17 B | ROCm       | 999 |           tg128 |         69.15 ± 0.22 |

build: b52edd2 (1)

Commit message:

Only copy required binaries and libraries from the installation directory into the final image, and install only necessary runtime dependencies. The final image size is reduced by over 2 GB.

Signed-off-by: Mike Bonnet <[email protected]>
@olliewalsh olliewalsh (Collaborator) left a comment
LGTM

@olliewalsh olliewalsh merged commit 08f08c9 into main Jan 12, 2026
59 of 60 checks passed
