
rocm: reduce image size by using a multi-stage build#2246

Merged
olliewalsh merged 1 commit into main from rocm-multi-stage on Jan 12, 2026

Conversation

@mikebonnet mikebonnet (Collaborator) commented Dec 15, 2025

Only copy required binaries and libraries from the installation directory into the final image, and install only necessary runtime dependencies. The final image size is reduced by over 2 GB.

Summary by Sourcery

Use a multi-stage build for the ROCm container image and adjust build scripts to support installation into a temporary prefix for ROCm builds.

Enhancements:

  • Refactor the ROCm Containerfile to use a builder stage and produce a slimmer runtime image by copying only required binaries and shared libraries.
  • Update the build_llama_and_whisper.sh script to treat ROCm like other GPU-specific images for install prefix handling and to use a unified GPU targets flag instead of AMDGPU-specific targets.

@sourcery-ai sourcery-ai bot (Contributor) commented Dec 15, 2025

Reviewer's Guide

Converts the ROCm container image to a multi-stage build that installs llama/whisper binaries into a slim runtime image under /tmp/install, adjusts the build script to use a temporary install prefix for ROCm and to use GPU_TARGETS instead of AMDGPU_TARGETS, and removes some redundant install logic for whisper and llama binaries.

File-Level Changes

Change Details Files
Convert ROCm Containerfile to a multi-stage build and slim runtime image.
  • Change base stage to a named builder stage using Fedora 43 and run the existing build_llama_and_whisper.sh rocm script there.
  • Introduce a second Fedora 43 runtime stage that copies only selected llama and whisper binaries from /tmp/install/bin into /usr/bin.
  • Copy only shared libraries from /tmp/install/lib64 into /usr/lib64 in the final image.
  • Install only required ROCm runtime packages (hipblas, rocblas, rocm-hip, rocm-runtime, rocsolver) with weak dependencies disabled, then clean dnf caches.
container-images/rocm/Containerfile
Align build script install prefix and flags for ROCm multi-stage layout and simplify install steps.
  • Extend set_install_prefix to use /tmp/install for the rocm containerfile, matching other GPU-specific images.
  • Change ROCm CMake flag from AMDGPU_TARGETS to GPU_TARGETS, preserving the default list of architectures.
  • Remove explicit mkdir of the install_prefix/bin directory from the Whisper build function (now handled by cmake install or prior steps).
  • Remove redundant local install_prefix variable and its use in llama.cpp clone_and_build_llama_cpp, relying instead on cmake install behavior or centralized prefix logic.
container-images/scripts/build_llama_and_whisper.sh
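
Taken together, the file-level changes above imply a Containerfile of roughly the following shape. This is a hedged sketch only — the stage layout and paths follow the guide, but the exact binary list and build-script invocation are assumptions, not the merged file:

```dockerfile
# --- Build stage: compile llama.cpp/whisper.cpp against ROCm ---
FROM fedora:43 AS builder
COPY . /src
# build_llama_and_whisper.sh installs into /tmp/install for GPU images
RUN /src/container-images/scripts/build_llama_and_whisper.sh rocm

# --- Runtime stage: slim image with only runtime artifacts ---
FROM fedora:43
# Copy only the needed executables (exact list is an assumption)
COPY --from=builder /tmp/install/bin/llama-server \
                    /tmp/install/bin/llama-quantize \
                    /tmp/install/bin/whisper-server \
                    /usr/bin/
# Copy shared libraries; note that a plain *.so glob drops the
# versioned symlink chain (libggml.so -> libggml.so.0 -> ...),
# a fragility the review discussion later works around with cp -a.
COPY --from=builder /tmp/install/lib64/*.so /usr/lib64/
# Minimal ROCm runtime, with weak dependencies disabled
RUN dnf -y --setopt=install_weak_deps=false install \
        hipblas rocblas rocm-hip rocm-runtime rocsolver && \
    dnf clean all
```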

Possibly linked issues

  • #rocm-ubi container build fails around dnf remove gcc-c++: PR’s multi-stage rocm build avoids the problematic dnf remove step that caused the original build failure.
  • #ramalama new rocm image is broken due to missing libhipblas.so.2: They both address the broken ROCm image; PR adds hipblas/ROCm runtimes via multi-stage build, fixing missing libhipblas.so.2.


@gemini-code-assist gemini-code-assist bot (Contributor) commented

Summary of Changes

Hello @mikebonnet, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a multi-stage build strategy for the ROCm container image. The core purpose is to dramatically decrease the final image's footprint by isolating the build artifacts and only transferring the essential components to the production-ready image. This optimization ensures that the container remains lean while still providing full functionality for ROCm-accelerated applications.

Highlights

  • Multi-stage build implementation: The ROCm Containerfile has been refactored to utilize a multi-stage build process, separating the build environment from the final runtime image.
  • Significant image size reduction: By selectively copying only the necessary binaries and libraries from the build stage and installing minimal runtime dependencies, the final image size is reduced by over 2GB.
  • Optimized dependency management: The final image now installs only essential ROCm runtime dependencies using dnf with install_weak_deps=false.
  • Build script adjustments: The build_llama_and_whisper.sh script was updated to correctly set the installation prefix to /tmp/install for ROCm builds, align with the multi-stage approach, and include a variable name change from AMDGPU_TARGETS to GPU_TARGETS.

@sourcery-ai sourcery-ai bot (Contributor) left a comment

Hey there - I've reviewed your changes and found some issues that need to be addressed.

  • In clone_and_build_llama_cpp, install_prefix is no longer defined after removing the local assignment, but it’s still used in the final install command, which will cause the script to fail or use an unintended value—either reintroduce a local assignment from set_install_prefix or adjust the install path accordingly.
  • The HIP build flags changed from -DAMDGPU_TARGETS=... to -DGPU_TARGETS=...; please confirm this matches the current llama.cpp/whisper.cpp CMake option names, as this rename may be ignored or break target selection if the upstream variable is still AMDGPU_TARGETS.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `clone_and_build_llama_cpp`, `install_prefix` is no longer defined after removing the local assignment, but it’s still used in the final `install` command, which will cause the script to fail or use an unintended value—either reintroduce a local assignment from `set_install_prefix` or adjust the install path accordingly.
- The HIP build flags changed from `-DAMDGPU_TARGETS=...` to `-DGPU_TARGETS=...`; please confirm this matches the current llama.cpp/whisper.cpp CMake option names, as this rename may be ignored or break target selection if the upstream variable is still `AMDGPU_TARGETS`.

## Individual Comments

### Comment 1
<location> `container-images/scripts/build_llama_and_whisper.sh:246-245` </location>
<code_context>

 clone_and_build_llama_cpp() {
   local llama_cpp_commit="${LLAMA_CPP_PULL_REF:-$DEFAULT_LLAMA_CPP_COMMIT}"
-  local install_prefix
-  install_prefix=$(set_install_prefix)
   git_clone_specific_commit "${LLAMA_CPP_REPO:-https://github.com/ggml-org/llama.cpp}" "$llama_cpp_commit"
   cmake_steps "${common_flags[@]}"
   install -m 755 build/bin/rpc-server "$install_prefix"/bin/rpc-server
</code_context>

<issue_to_address>
**issue (bug_risk):** Removing the local install_prefix assignment risks installing rpc-server into an unintended path.

This function still calls `install ... "$install_prefix"/bin/rpc-server`, but `install_prefix` is no longer initialized here. Unless it’s guaranteed to be set in the environment, this will likely resolve to `/bin/rpc-server` or fail. Either restore `local install_prefix=$(set_install_prefix)` or remove this manual `install` in favor of `cmake --install` with the prefix configured in `cmake_steps`.
</issue_to_address>
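
The first suggested remedy can be sketched in shell as follows. `set_install_prefix` here is a hypothetical stand-in that prints the prefix on stdout, mirroring how the review describes it; the clone and cmake steps are elided:

```shell
#!/bin/bash
# Hypothetical stand-in for the script's set_install_prefix, assumed to
# print the install prefix on stdout (/tmp/install for GPU images).
set_install_prefix() {
  echo "/tmp/install"
}

clone_and_build_llama_cpp() {
  # Restore the local assignment so "$install_prefix" below is defined,
  # instead of expanding to an empty string (which would target /bin).
  local install_prefix
  install_prefix=$(set_install_prefix)
  # ... git clone and cmake steps elided ...
  echo "rpc-server would be installed into ${install_prefix}/bin"
}

clone_and_build_llama_cpp
```

The alternative the reviewer mentions — dropping the manual `install` in favor of `cmake --install` with the prefix configured in `cmake_steps` — would make the local variable unnecessary altogether.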

### Comment 2
<location> `container-images/rocm/Containerfile:18` </location>
<code_context>
+  /tmp/install/bin/llama-quantize \
+  /tmp/install/bin/whisper-server \
+  /usr/bin/
+COPY --from=builder /tmp/install/lib64/*.so /usr/lib64/
+
+RUN dnf -y --setopt=install_weak_deps=false install hipblas rocblas rocm-hip rocm-runtime rocsolver && \
</code_context>

<issue_to_address>
**suggestion:** The COPY glob for shared libraries can fail if no .so files exist or miss versioned symlinks.

To avoid this fragility, either copy the entire directory (e.g., `COPY --from=builder /tmp/install/lib64/ /usr/lib64/`) or use a broader pattern such as `*.so*` so that all ROCm shared libraries and their symlinks are reliably included.

```suggestion
COPY --from=builder /tmp/install/lib64/ /usr/lib64/
```
</issue_to_address>


@gemini-code-assist gemini-code-assist bot (Contributor) left a comment

Code Review

The code changes refactor the ROCm container build to a multi-stage Dockerfile, separating the llama.cpp and whisper.cpp compilation into a builder stage and then copying specific binaries and shared libraries, along with installing ROCm runtime dependencies, into the final image. The build script was updated to correctly configure the install prefix and GPU targets for ROCm. A review comment points out that the rpc-server binary is not being copied to the final image and suggests using a wildcard in the COPY command for binaries to ensure all executables are included robustly.

@mikebonnet mikebonnet force-pushed the rocm-multi-stage branch 2 times, most recently from 3fbfc50 to 532e5d0 on December 15, 2025 at 19:04
@rhatdan rhatdan (Member) commented Dec 16, 2025

LGTM, assuming you tested this on an AMD system?

@mikebonnet mikebonnet (Collaborator, Author) commented

Unfortunately I don't have an AMD system to test with. If someone could test ramalama run or ramalama serve with --image quay.io/redhat-user-workloads/ramalama-tenant/rocm:on-pr-34482d11f78e7c93649c4cddb070c285e355ac39 that would be very helpful.

@rhatdan rhatdan (Member) commented Dec 16, 2025

Well, I have an AMD laptop, and it fails the same way with your image as with the quay.io/ramalama/ramalama image. Luckily for me, the laptop works with the Vulkan driver.

@rhatdan rhatdan (Member) commented Dec 16, 2025

.Memory critical error by agent node-0 (Agent handle: 0x3390320) on address 0x7f599f65e000. Reason: Memory in use.

@mikebonnet mikebonnet (Collaborator, Author) commented

Does the quay.io/ramalama/rocm:0.15 image behave the same way?

@rhatdan rhatdan (Member) commented Dec 16, 2025

Yes

@mikebonnet mikebonnet (Collaborator, Author) commented

There were reports that it was a firmware problem. Have you tried with different/older firmware?

@olliewalsh olliewalsh (Collaborator) commented

> Unfortunately I don't have an AMD system to test with. If someone could test ramalama run or ramalama serve with --image quay.io/redhat-user-workloads/ramalama-tenant/rocm:on-pr-34482d11f78e7c93649c4cddb070c285e355ac39 that would be very helpful.

I'm almost done setting up an AMD system so can test this soon, tomorrow hopefully

@mikebonnet mikebonnet force-pushed the rocm-multi-stage branch 2 times, most recently from 7677364 to 267815b on December 19, 2025 at 20:55
@bmahabirbu bmahabirbu (Collaborator) commented Jan 6, 2026

I have an AMD system, but I'm getting:

brian@fedora:~/ramalama$ bin/ramalama run llama3.2 --image quay.io/redhat-user-workloads/ramalama-tenant/rocm:on-pr
Trying to pull quay.io/redhat-user-workloads/ramalama-tenant/rocm:on-pr...
Error: unable to copy from source docker://quay.io/redhat-user-workloads/ramalama-tenant/rocm:on-pr: initializing source docker://quay.io/redhat-user-workloads/ramalama-tenant/rocm:on-pr: reading manifest on-pr in quay.io/redhat-user-workloads/ramalama-tenant/rocm: manifest unknown
Error: Failed to serve model llama3.2, for ramalama run command
brian@fedora:~/ramalama$ 

Is there a new manifest I can use? BTW, regular rocm on 16 works for me, with a 9070 XT!

@mikebonnet mikebonnet (Collaborator, Author) commented

> is there a new manifest i can use? btw regular rocm on 16 works for me! with a 9070xt

Let me rebase and rebuild, which will create a new manifest.

@mikebonnet mikebonnet (Collaborator, Author) commented

@bmahabirbu You can test with --image quay.io/redhat-user-workloads/ramalama-tenant/rocm:on-pr-2ae90c5206cd9a5d80e5ff1c062bfae7fe3b7360

@mikebonnet mikebonnet force-pushed the rocm-multi-stage branch 2 times, most recently from 3731a60 to 71efd86 on January 9, 2026 at 17:28
/tmp/install/bin/rpc-server \
/tmp/install/bin/whisper-server \
/usr/bin/
COPY --from=builder /tmp/install/lib64/*.so /usr/lib64/
@olliewalsh olliewalsh (Collaborator) commented on the diff above
symlinks could be an issue I think:

# ls -l /usr/lib64/libggml*
-rw-r--r--. 1 root root 992116 Jan  8 16:07 /usr/lib64/libggml-base.a
lrwxrwxrwx. 1 root root     17 Jan  8 16:11 /usr/lib64/libggml-base.so -> libggml-base.so.0
lrwxrwxrwx. 1 root root     21 Jan  8 16:11 /usr/lib64/libggml-base.so.0 -> libggml-base.so.0.9.5
-rwxr-xr-x. 1 root root 728752 Jan  8 16:08 /usr/lib64/libggml-base.so.0.9.5
-rw-r--r--. 1 root root 986336 Jan  8 16:07 /usr/lib64/libggml-cpu.a
-rw-r--r--. 1 root root  57602 Jan  8 16:08 /usr/lib64/libggml.a
lrwxrwxrwx. 1 root root     12 Jan  8 16:11 /usr/lib64/libggml.so -> libggml.so.0
lrwxrwxrwx. 1 root root     16 Jan  8 16:11 /usr/lib64/libggml.so.0 -> libggml.so.0.9.5
-rwxr-xr-x. 1 root root  47104 Jan  8 16:10 /usr/lib64/libggml.so.0.9.5

Would something like this work?
RUN --mount=type=bind,from=builder,source=/tmp/install,target=/tmp/install cp -a /tmp/install/lib64/* /usr/lib64/

or just keep it simple and cp -a /tmp/install/* /usr/

@mikebonnet mikebonnet (Collaborator, Author) replied
A lot of the space savings comes from not copying the unnecessary .a files. Are the symlinks required? In my testing it looked like llama-server worked fine with just the .so files. If it's not working for you, I can figure out a different way to copy the symlinks over.

@mikebonnet mikebonnet (Collaborator, Author) replied
Giving the --mount idea a try.

@mikebonnet mikebonnet (Collaborator, Author) replied
RUN --mount worked like a charm:

$ podman run -it --rm quay.io/redhat-user-workloads/ramalama-tenant/rocm:on-pr-bcf7372735e92e94ec51339bc1468e43c0a694f9 bash -c 'ls -la /usr/lib64/libggml* /usr/lib64/libllama*'
lrwxrwxrwx. 1 root root      17 Jan 10 08:21 /usr/lib64/libggml-base.so -> libggml-base.so.0
lrwxrwxrwx. 1 root root      21 Jan 10 08:21 /usr/lib64/libggml-base.so.0 -> libggml-base.so.0.9.5
-rwxr-xr-x. 1 root root  728752 Jan 10 08:04 /usr/lib64/libggml-base.so.0.9.5
lrwxrwxrwx. 1 root root      12 Jan 10 08:21 /usr/lib64/libggml.so -> libggml.so.0
lrwxrwxrwx. 1 root root      16 Jan 10 08:21 /usr/lib64/libggml.so.0 -> libggml.so.0.9.5
-rwxr-xr-x. 1 root root   47104 Jan 10 08:19 /usr/lib64/libggml.so.0.9.5
lrwxrwxrwx. 1 root root      13 Jan 10 08:21 /usr/lib64/libllama.so -> libllama.so.0
lrwxrwxrwx. 1 root root      17 Jan 10 08:21 /usr/lib64/libllama.so.0 -> libllama.so.0.0.1
-rwxr-xr-x. 1 root root 3010024 Jan 10 08:20 /usr/lib64/libllama.so.0.0.1

@mikebonnet mikebonnet (Collaborator, Author) replied
You can test the latest build with --image quay.io/redhat-user-workloads/ramalama-tenant/rocm:on-pr-bcf7372735e92e94ec51339bc1468e43c0a694f9

@mikebonnet mikebonnet (Collaborator, Author) replied Jan 10, 2026
New development, GGML_CPU_ALL_VARIANTS causes the .so files for the backends to be written to (and loaded from) bin/, so the Containerfile now copies those as well.

You can test the latest build with --image quay.io/redhat-user-workloads/ramalama-tenant/rocm:on-pr-61f7101c2a24f5fb2e152d42f879c02447c32962
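
With GGML_CPU_ALL_VARIANTS enabled, the backend plugins live beside the executables rather than under lib64/, so the runtime stage presumably needs an extra copy along these lines (path and glob are assumptions based on this comment, not the merged Containerfile):

```dockerfile
# Backend plugins (libggml-hip.so, libggml-rpc.so, libggml-cpu-*.so) are
# written to and loaded from bin/, so copy them next to the executables.
COPY --from=builder /tmp/install/bin/*.so /usr/bin/
```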

@olliewalsh olliewalsh (Collaborator) replied
LGTM

$ ramalama bench --image quay.io/redhat-user-workloads/ramalama-tenant/rocm:on-pr-61f7101c2a24f5fb2e152d42f879c02447c32962 granite
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon RX 9070 XT, gfx1201 (0x1201), VMM: no, Wave Size: 32
load_backend: loaded ROCm backend from /usr/bin/libggml-hip.so
load_backend: loaded RPC backend from /usr/bin/libggml-rpc.so
load_backend: loaded CPU backend from /usr/bin/libggml-cpu-haswell.so
| model                          |       size |     params | backend    | ngl |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| granite 3B Q4_K - Medium       |   4.65 GiB |     8.17 B | ROCm       | 999 |           pp512 |     2636.43 ± 436.43 |
| granite 3B Q4_K - Medium       |   4.65 GiB |     8.17 B | ROCm       | 999 |           tg128 |         69.10 ± 0.23 |

build: 9689295 (1)

Prompt processing also seems to be much better (vs older llama.cpp on latest):

  Device 0: AMD Radeon RX 9070 XT, gfx1201 (0x1201), VMM: no, Wave Size: 32
| model                          |       size |     params | backend    | ngl |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| granite 3B Q4_K - Medium       |   4.65 GiB |     8.17 B | ROCm       | 999 |           pp512 |        595.84 ± 0.36 |
| granite 3B Q4_K - Medium       |   4.65 GiB |     8.17 B | ROCm       | 999 |           tg128 |         69.15 ± 0.22 |

build: b52edd2 (1)

Commit message:

Only copy required binaries and libraries from the installation directory into the final image, and install only necessary runtime dependencies. The final image size is reduced by over 2 GB.

Signed-off-by: Mike Bonnet <[email protected]>
@olliewalsh olliewalsh (Collaborator) left a comment
LGTM

@olliewalsh olliewalsh merged commit 08f08c9 into main Jan 12, 2026
59 of 60 checks passed
