
[Doc] Add guides for custom docker image build on NVIDIA CUDA [Skip-CI]#1386

Open
loveysuby wants to merge 7 commits into vllm-project:main from loveysuby:docs/add-custom-docker-build-on-nvidia-cuda

Conversation

@loveysuby
Contributor

@loveysuby loveysuby commented Feb 16, 2026

PLEASE FILL IN THE PR DESCRIPTION HERE ENSURING ALL CHECKLIST ITEMS (AT THE BOTTOM) HAVE BEEN CONSIDERED.

Purpose

Added NVIDIA CUDA build instructions to match the existing AMD ROCm guide.
Documents how to use docker/Dockerfile.cuda for custom builds, enabling source modifications and BASE_IMAGE customization. (added in #1439)

Test Plan

Runtime Environment: NVIDIA A100-SXM4-80GB (CUDA 13.0 / Driver 580.82.07)

  • Verify `docker build --check -f docker/Dockerfile.cuda` with a different `BASE_IMAGE` to specify the vLLM base image:

```bash
DOCKER_BUILDKIT=1 docker build \
  --check \
  -f docker/Dockerfile.cuda \
  --build-arg BASE_IMAGE=vllm/vllm-openai:v0.18.0 \
  -t vllm-omni-cuda .
```

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan. Please provide the test scripts & test commands. Please state the reasons if your code doesn't require additional test scripts. For test file guidelines, please check the test style doc.
  • The test results. Please paste a before/after results comparison, or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model. Please run mkdocs serve to preview the documentation changes in ./docs.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft.

BEFORE SUBMITTING, PLEASE READ https://github.com/vllm-project/vllm-omni/blob/main/CONTRIBUTING.md (anything written below this line will be removed by GitHub Actions)

Signed-off-by: Hyoseop Song <crad_on25@naver.com>
Comment on lines +99 to +108
You can use this docker image to serve models the same way you would in vLLM! To do so, make sure you overwrite the default entrypoint (`vllm serve --omni`), which works only for models supported in the vLLM-Omni project.

# --8<-- [end:pre-built-images]

# --8<-- [start:build-docker]

#### Build docker image

```bash
DOCKER_BUILDKIT=1 docker build -f docker/Dockerfile.ci -t vllm-omni-cuda .
```
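A sketch of the entrypoint override mentioned above (dry-run: the command is composed and printed rather than executed, so it can be checked without Docker; the `vllm-omni-cuda` tag follows the build above, and the model name is a placeholder):

```shell
# Dry-run sketch: override the default `vllm serve --omni` entrypoint so the
# image serves a model outside the vLLM-Omni support list via plain `vllm serve`.
IMAGE=vllm-omni-cuda          # tag from the build command above
MODEL=your-org/your-model     # placeholder; substitute any vLLM-supported model
CMD="docker run --runtime nvidia --gpus all --rm --entrypoint vllm $IMAGE serve $MODEL"
echo "$CMD"
```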
Contributor Author

@loveysuby loveysuby Feb 16, 2026


@congw729 Hi, I've written a guide for NVIDIA GPU users, but using the Dockerfile.ci as-is doesn't seem suitable for the purpose.

I have already verified the installation logic on an NVIDIA A100. Should I create a new, dedicated Dockerfile for users and re-test it? Let me know your thoughts, and I'll update the PR accordingly.

Switching to Draft for now.

Collaborator


> @congw729 Hi, I've written a guide for NVIDIA GPU users, but using the Dockerfile.ci as-is doesn't seem suitable for the purpose.
>
> I have already verified the installation logic on an NVIDIA A100. Should I create a new, dedicated Dockerfile for users and re-test it? Let me know your thoughts, and I'll update the PR accordingly.
>
> Switching to Draft for now.

I think it's better to use a different Dockerfile. Dockerfile.ci will install unnecessary packages for users.

Collaborator


Yes, Dockerfile.ci installs vllm-omni in dev mode, which will pull in some unnecessary packages.

@loveysuby loveysuby marked this pull request as draft February 16, 2026 15:18
@loveysuby loveysuby marked this pull request as ready for review February 17, 2026 11:59
Collaborator

@lishunyang12 lishunyang12 left a comment


A few things worth discussing in the CUDA build guide.

Comment thread docs/getting_started/installation/gpu/cuda.inc.md
Comment thread docs/getting_started/installation/gpu/cuda.inc.md
Comment thread docs/getting_started/installation/gpu/cuda.inc.md

```bash
docker run --runtime nvidia --gpus all \
-v ~/.cache/huggingface:/root/.cache/huggingface \
```
Collaborator

@lishunyang12 lishunyang12 Feb 21, 2026


This model needs significant GPU memory ("verified on 2 x H100s" above). Worth noting that, or using `--gpus 2` in the example.
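A hedged sketch of that suggestion (dry-run: the command is printed, not executed, so it can be checked without Docker; the image tag, port, and cache mount are assumptions carried over from the surrounding examples):

```shell
# Dry-run sketch: request exactly two GPUs for a model verified on 2 x H100s.
IMAGE=vllm-omni-cuda
CMD="docker run --runtime nvidia --gpus 2 --rm \
  -v $HOME/.cache/huggingface:/root/.cache/huggingface \
  --ipc=host -p 8091:8091 $IMAGE"
echo "$CMD"
```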

@hsliuustc0106
Collaborator

@vllm-omni-reviewer

@github-actions

🤖 VLLM-Omni PR Review

Code Review: Add guides for custom docker image build on NVIDIA CUDA

1. Overview

This PR adds documentation for building custom Docker images on NVIDIA CUDA, mirroring the existing AMD ROCm guide structure. The changes include:

  • A new tab entry in gpu.md for NVIDIA CUDA build instructions
  • A new build-docker section in cuda.inc.md with build and launch commands

Overall Assessment: Positive - The PR follows the existing documentation patterns and provides useful guidance for users who need custom Docker builds.

2. Code Quality

Strengths

  • Follows the existing documentation structure and include pattern (--8<--)
  • Provides both server and interactive launch modes
  • Shows how to customize the base vLLM version with VLLM_BASE_TAG
  • Uses DOCKER_BUILDKIT=1 for modern build behavior

Minor Issues

  1. Version inconsistency between PR description and documentation:

    • PR description tests with VLLM_BASE_TAG=v0.11.0
    • Documentation example shows VLLM_BASE_TAG=v0.15.0

    Consider aligning these or adding a note about available versions.

  2. Missing --rm flag for interactive container:

    • docs/getting_started/installation/gpu/cuda.inc.md:134
    • Adding --rm would prevent leftover containers after exiting interactive sessions

3. Architecture & Design

  • Good: Follows the established documentation pattern with include files and tab structure
  • Good: Maintains consistency with the ROCm documentation approach
  • Good: Uses the same docker/Dockerfile.ci referenced in the PR description

4. Security & Safety

  • Acceptable: HF_TOKEN is passed via --env which is standard practice
  • Standard: Volume mount for HuggingFace cache follows common patterns
  • Note: Users should be aware that HF_TOKEN will be visible in process listing; this is a known Docker limitation and acceptable for this use case

5. Testing & Documentation

Test Plan

  • ✅ PR includes test commands and environment details
  • ✅ Screenshots provided showing successful build verification
  • ✅ Uses docker build --check for validation

Documentation Completeness

  • ✅ Build instructions are clear
  • ✅ Launch instructions cover both server and interactive modes
  • ✅ Follows existing documentation style

Suggestion

Consider adding a brief note about what modifications users might want to make when building custom images (e.g., "Modify the source code before building to include custom changes").

6. Specific Suggestions

docs/getting_started/installation/gpu/cuda.inc.md

Line 113-117: Consider adding a comment about when to use custom builds:

```bash
# Use this when you need to modify vLLM-Omni source code or use a specific vLLM version
DOCKER_BUILDKIT=1 docker build \
  -f docker/Dockerfile.ci \
  --build-arg VLLM_BASE_TAG=v0.15.0 \
  -t vllm-omni-cuda .
```

Line 134: Add --rm flag for cleaner interactive session management:

```bash
docker run --runtime nvidia --gpus all -it --rm \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=$HF_TOKEN" \
  -p 8091:8091 \
  --ipc=host \
  --entrypoint bash \
  vllm-omni-cuda
```

Line 125: The model name Qwen/Qwen3-Omni-30B-A3B-Instruct - verify this matches the current recommended model in other documentation sections for consistency.

7. Approval Status

LGTM with suggestions

The PR is well-structured and ready to merge. The suggestions above are minor improvements:

  • Optional: Add --rm flag to interactive session example
  • Optional: Align version numbers between PR description and documentation, or clarify that v0.15.0 is an example
  • Optional: Add a brief note about use cases for custom builds

These are non-blocking suggestions that could be addressed in a follow-up PR if preferred. The documentation is clear, follows established patterns, and provides valuable guidance for users.


This review was generated automatically by the VLLM-Omni PR Reviewer Bot
using glm-5.

@loveysuby
Contributor Author

@lishunyang12 @congw729 Thanks for the review feedback.

I've created Dockerfile.cuda in #1439 based on your suggestions.

Once #1439 is merged, I'll update this documentation PR:

  • Use Dockerfile.cuda instead of Dockerfile.ci
  • Change the version example to a different tag, v0.14.0 (instead of the default v0.15.0)
  • Add --gpus 2 and GPU memory note
  • Add --rm flag for interactive sessions
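Rolled together, the planned changes above might yield commands roughly like the following (a dry-run sketch, printed rather than executed so it can be checked without Docker; `Dockerfile.cuda` and the `v0.14.0` example tag come from this thread, while the `BASE_IMAGE` form and the `vllm-omni-cuda` tag are assumptions, not the final doc text):

```shell
# Dry-run sketch combining the four planned changes.
BUILD_CMD="DOCKER_BUILDKIT=1 docker build -f docker/Dockerfile.cuda \
  --build-arg BASE_IMAGE=vllm/vllm-openai:v0.14.0 -t vllm-omni-cuda ."
RUN_CMD="docker run --runtime nvidia --gpus 2 -it --rm --entrypoint bash vllm-omni-cuda"
echo "$BUILD_CMD"
echo "$RUN_CMD"
```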

@hsliuustc0106
Collaborator

@vllm-omni-reviewer

@Gaohan123
Collaborator

Hello, any updates? v0.16.0 has already been released.

@loveysuby
Contributor Author

@Gaohan123 I sent you a message on the vLLM Slack about these updates and #1439. Please take a look (cc: @tzhouam)

@loveysuby
Contributor Author

@Gaohan123 @lishunyang12 PTAL:
I revised the docs to use Dockerfile.cuda instead of the CI-only build. (Dockerfile.cuda was merged into main in #1439, based on this PR's suggestions.)

There was an image build test in the PR body, but since that had already been verified in #1439, I removed it. Please let me know if you have any requests for changes to the document content.

@tzhouam tzhouam changed the title from "[Doc] Add guides for custom docker image build on NVIDIA CUDA" to "[Doc] Add guides for custom docker image build on NVIDIA CUDA [Skip-CI]" on Apr 10, 2026
Comment thread docs/getting_started/installation/gpu/cuda.inc.md

```bash
DOCKER_BUILDKIT=1 docker build \
-f docker/Dockerfile.cuda \
--build-arg BASE_IMAGE=vllm/vllm-openai:v0.18.0 \
```
Collaborator


It should be v0.19.0 now.
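Applying the suggested bump, the documented build command would presumably become (dry-run sketch, printed rather than executed; only the tag changes relative to the snippet above, and the `v0.19.0` value is taken from this comment):

```shell
# Dry-run sketch with the base image tag bumped to v0.19.0 as suggested.
CMD="DOCKER_BUILDKIT=1 docker build -f docker/Dockerfile.cuda \
  --build-arg BASE_IMAGE=vllm/vllm-openai:v0.19.0 -t vllm-omni-cuda ."
echo "$CMD"
```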

Contributor Author


Revised in f74ba81.


6 participants