Skip to content

Latest commit

 

History

History
214 lines (164 loc) · 11.8 KB

File metadata and controls

214 lines (164 loc) · 11.8 KB

Dockerfiles for TheRock

This directory is home to the Dockerfiles we use as part of ROCm development, as well as supporting scripts.

Available Dockerfiles

build_manylinux_*.Dockerfile

Source .Dockerfile Published package
build_manylinux_x86_x64.Dockerfile https://github.com/ROCm/TheRock/pkgs/container/therock_build_manylinux_x86_64
build_manylinux_rccl_x86_64.Dockerfile https://github.com/ROCm/TheRock/pkgs/container/therock_build_manylinux_rccl_x86_64

These Dockerfiles are used to build ROCm, PyTorch, and other packages for release across a wide variety of Linux distributions. They are derived from manylinux Docker images which are based on older RHEL (Red Hat Enterprise Linux) derivatives, provide a minimal set of development libraries, and support building binaries compatible with most Linux distributions that use glibc 2.39 or greater.

Requirements for these files:

  • Tools like gcc, CMake, ninja, etc. should generally be set to the minimum versions of those tools that we support.
  • These Dockerfiles deliberately exclude development packages for most libraries because their presence would create hidden dependencies on specific library versions that may not be binary-compatible across distributions. Instead, TheRock vendors all necessary dependencies and builds them from source with specific modifications to ensure they cannot conflict with system libraries.

no_rocm_image_*.Dockerfile

Source .Dockerfile Published package
no_rocm_image_ubuntu24_04.Dockerfile https://github.com/ROCm/TheRock/pkgs/container/no_rocm_image_ubuntu24_04

These Dockerfiles are used to test ROCm, PyTorch, and other packages. They are derived from standard OS images (e.g. FROM ubuntu:24.04) and contain a minimal extra set of development tools and environment settings necessary to run project tests on GPU test machines. These images do NOT include a ROCm install, as they are used from test workflows that begin by installing ROCm (or other) packages produced by CI/CD workflows.

Requirements for these files:

  • No system install of ROCm - workflows using these images are expected to install ROCm after starting a container.
  • Minimal other system dependencies, especially development packages, for many of the same reasons as for the "build" Dockerfiles above. Test environments should be similar to user/production environments, so any extra dependencies that are added to these test Dockerfiles may hide real issues.
    • Some extra tools like lit for LLVM testing are acceptable and should be evaluated on a case-by-case basis.

This test dockerfile is used by various CI/CD workflows, such as:

To run this test dockerfile yourself on a machine with a compatible GPU:

sudo docker run -it \
  --device=/dev/kfd --device=/dev/dri \
  --ipc=host --group-add=video --group-add=render --group-add=110 \
  ghcr.io/rocm/no_rocm_image_ubuntu24_04@latest

Tip

We recommend creating a Python venv to install Python packages while using this image:

sudo apt update && sudo apt install python3-venv -y
python3 -m venv .venv && source .venv/bin/activate

# Then install packages. For example, with nightly rocm packages for gfx94X:
pip install --index-url=https://rocm.nightlies.amd.com/v2/gfx94X-dcgpu 'rocm[libraries,devel]'
Source .Dockerfile Published package
no_rocm_image_ubuntu24_04_rocgdb.Dockerfile https://github.com/ROCm/TheRock/pkgs/container/no_rocm_image_ubuntu24_04_rocgdb

Extended version of no_rocm_image_ubuntu24_04.Dockerfile, containing additional packages and tools required for validation of rocgdb. This includes dejagnu, make, gcc, g++ and gfortran.

Source .Dockerfile Published package
no_rocm_image_ubuntu24_04_ocl_rt.Dockerfile https://github.com/ROCm/TheRock/pkgs/container/no_rocm_image_ubuntu24_04_ocl_rt

Used in ocltst execution. It installs OCL ICD package required by ocltst

rocm_runtime.Dockerfile

Source .Dockerfile Published package
rocm_runtime.Dockerfile N/A

This Dockerfile builds ROCm runtime images from TheRock prebuilt tarballs for multiple Linux distributions using a single Dockerfile via the BASE_IMAGE build argument.

This is a user-space ROCm runtime image. It does NOT include kernel drivers. The host must provide compatible AMDGPU/ROCm kernel components and device access (e.g., --device=/dev/kfd --device=/dev/dri).

See the header comments in rocm_runtime.Dockerfile for supported base images, build arguments, and build/run examples.

Supporting scripts:

  • install_rocm_deps.sh: Auto-detects the distribution and installs ROCm runtime dependencies using the appropriate package manager (apt, dnf, or tdnf).

  • install_rocm_tarball.sh: Downloads ROCm tarball from rocm.{nightlies|prereleases|devreleases}.amd.com or repo.amd.com (for stable releases), extracts to /opt/rocm-{VERSION} with /opt/rocm symlink. Can also be used standalone on a Linux host (prints environment variable setup instructions after installation):

    # One-liner installation
    curl -sSL https://raw.githubusercontent.com/ROCm/TheRock/main/dockerfiles/install_rocm_tarball.sh | \
      sudo bash -s -- <VERSION> <AMDGPU_FAMILY> [RELEASE_TYPE]
    
    # Example: Install ROCm 7.11.0a20251211 for gfx110X
    curl -sSL https://raw.githubusercontent.com/ROCm/TheRock/main/dockerfiles/install_rocm_tarball.sh | \
      sudo bash -s -- 7.11.0a20260108 gfx94X-dcgpu nightlies

Using published images

Images for these Dockerfiles are published as GitHub Packages to the GitHub Container Registry (ghcr.io). The full list of published packages can be viewed at https://github.com/orgs/ROCm/packages?repo_name=TheRock.

Stable tags for these packages are typically latest and main, while other tags may be used for test packages.

Working on the Dockerfiles themselves

Testing and debugging

Automated image publishing

Images are automatically built and published to GitHub Packages (ghcr.io) by GitHub Actions after pushes to the associated .Dockerfile on the main or stage/docker/** branches. These actions can also be triggered manually from other branches using workflow_dispatch.

The common .github/workflows/publish_dockerfile.yml workflow is used by other publish_*.yml workflows:

Workflow file Workflow run history
.github/workflows/publish_build_manylinux_x86_64.yml actions/workflows/publish_build_manylinux_x86_64.yml
.github/workflows/publish_build_manylinux_rccl_x86_64.yml actions/workflows/publish_build_manylinux_rccl_x86_64.yml
.github/workflows/publish_no_rocm_image_ubuntu24_04.yml actions/workflows/publish_no_rocm_image_ubuntu24_04.yml

Tags for built docker images are set based on the branch name pattern:

Branch name pattern Tag pattern
main latest
stage/docker/SUFFIX stage-SUFFIX
OTHER_NAME OTHER_NAME

Updating images used by GitHub Actions workflows

The general sequence for updating an image is this:

  1. Make any changes to the relevant Dockerfiles and/or scripts
  2. Build and test images locally (see Testing and debugging)
  3. Build and test images on CI
    1. Push the Dockerfile changes to a shared branch and build test packages (see Automated image publishing)

    2. Edit sha256 pins across the project to use the staging packages, like so:

          container:
      -     image: ghcr.io/rocm/therock_build_manylinux_x86_64@sha256:4af52d56d91ef6ef8b7d0a13c6115af1ab2c9bf4a8a85d9267b489ecb737ed25
      +     image: ghcr.io/rocm/therock_build_manylinux_x86_64@sha256:6e8242d347af7e0c43c82d5031a3ac67b669f24898ea8dc2f1d5b7e4798b66bd
    3. Trigger test CI/CD jobs (e.g. by creating a draft PR with those changes)

  4. Create a pull request containing the Dockerfile changes and go through code review
  5. Wait for new packages to be published
  6. Create a second pull request editting the sha256 pins across the project to use the new packages (from the latest tag)