ComfyUI Hunyuan Image 3.0 custom node

This is a custom node that allows for basic image generation using Hunyuan Image 3.0.

Features

Supports CPU and disk offload to allow generation on consumer setups
- When using CPU offload, weights are stored in system RAM and transferred to the GPU as needed for processing
- When using disk offload, weights are stored in system RAM and on disk and transferred to the GPU as needed for processing

System Requirements

Tested (uses CPU + disk offload):

Windows¹
An Nvidia GPU with 24GB of VRAM¹
128 GB of system memory²
A PCIe 4.0 x16 connection between GPU and CPU³
An NVMe SSD⁴

Recommended (uses CPU offload, but no disk offload)⁵:

Windows
An Nvidia GPU with >32GB of VRAM
192GB of system memory
A PCIe 5.0 x16 connection between GPU and CPU

Ultra (does not use offload)⁵:

Enough VRAM to run entirely on GPU (possibly 192GB of VRAM)

Installation

Navigate to your custom_nodes folder
Clone this repo: git clone https://github.com/bgreene2/ComfyUI-HunyuanImage3.git
Change to the directory: cd ComfyUI-HunyuanImage3
Assuming the correct Python environment is loaded, install dependencies: pip install -r requirements.txt
If PyTorch is not already installed, install it: pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 --index-url https://download.pytorch.org/whl/cu128
Add the flag --disable-cuda-malloc to your ComfyUI startup script.

Post-installation steps

Download the model weights
If you are using an Nvidia GPU, in the Nvidia control panel, change the "CUDA Sysmem Fallback Policy" to "Prefer Sysmem Fallback"

Usage Guide

The node has the following inputs:

prompt - Your text prompt

The node has the following parameters:

seed - The seed for the random number generator
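control after generate - ComfyUI's standard control attached to the 'seed' parameter. It determines how the seed changes after each run (fixed, increment, decrement, or randomize).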
use_dimensions - Whether to use the below user-specified width and height values. If false, the model will automatically choose the image dimensions.
width - Image width
height - Image height
steps - Number of steps. The recommended value is 50.
guidance_scale - The classifier-free guidance (CFG) scale. The recommended value is 7.5.
attn_implementation - Valid values are sdpa and flash_attention_2
moe_impl - Valid values are eager and flashinfer
weights_folder - The path to the model's weights on disk. Note: the path can't have a dot in it.
use_offload - Whether to use CPU / disk offload.
disk_offload_layers - The number of layers (out of 32) to offload to disk, rather than hold in memory. See the Performance Tuning section for more details.
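
The constraints listed above can be checked before a long run starts. The following is a minimal sketch, not part of the node: validate_params and the dictionary layout are hypothetical, and only the valid values, the 32-layer total, and the no-dot rule come from the parameter descriptions above.

```python
# Hypothetical pre-flight check for the node's parameters.
# The valid values below are taken from the parameter list in this README;
# the function itself is an illustration and is not shipped with the node.
VALID_ATTN = {"sdpa", "flash_attention_2"}
VALID_MOE = {"eager", "flashinfer"}
TOTAL_LAYERS = 32


def validate_params(params: dict) -> list[str]:
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []

    if params.get("attn_implementation") not in VALID_ATTN:
        problems.append("attn_implementation must be sdpa or flash_attention_2")

    if params.get("moe_impl") not in VALID_MOE:
        problems.append("moe_impl must be eager or flashinfer")

    # The README states that the weights path can't have a dot in it.
    if "." in str(params.get("weights_folder", "")):
        problems.append("weights_folder must not contain a dot")

    layers = params.get("disk_offload_layers", 0)
    if not isinstance(layers, int) or not 0 <= layers <= TOTAL_LAYERS:
        problems.append("disk_offload_layers must be an integer from 0 to 32")

    return problems


if __name__ == "__main__":
    example = {
        "attn_implementation": "sdpa",
        "moe_impl": "eager",
        "weights_folder": "C:/models/HunyuanImage-3",
        "disk_offload_layers": 10,
    }
    print(validate_params(example) or "no problems found")
```

An empty list means none of the documented constraints were violated. It does not guarantee that the weights exist at the given path or that generation will succeed.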

Basic usage: Connect a String (Multiline) input to the prompt input, and connect the Image output to a Save Image node. An example workflow is provided.

Performance Tuning

If you are using disk offload, you need to choose the number of layers to offload such that your RAM is not completely filled, so that the system does not use swap. This may require some trial-and-error while monitoring memory usage. On a system with 128GB of memory, 10 layers is a good starting point.
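
As a starting point for that trial and error, the layer count can be estimated with simple arithmetic. The sketch below is an illustration under loud assumptions: the total weight size of roughly 160 GB and the 16 GB of headroom reserved for the operating system and ComfyUI are assumed figures that this README does not state, and the estimate treats all 32 layers as equally sized. Treat the result as a first guess and keep monitoring memory usage.

```python
import math


def suggest_disk_offload_layers(
    ram_gb: float,
    model_gb: float = 160.0,    # ASSUMPTION: approximate total weight size
    total_layers: int = 32,     # from the disk_offload_layers description
    headroom_gb: float = 16.0,  # ASSUMPTION: RAM left free for OS and ComfyUI
) -> int:
    """Estimate how many layers to offload to disk so RAM is not filled."""
    per_layer_gb = model_gb / total_layers
    usable_gb = max(ram_gb - headroom_gb, 0.0)
    # Weights that do not fit in the usable RAM must live on disk.
    overflow_gb = max(model_gb - usable_gb, 0.0)
    return min(total_layers, math.ceil(overflow_gb / per_layer_gb))


if __name__ == "__main__":
    print(suggest_disk_offload_layers(128))  # 10
    print(suggest_disk_offload_layers(192))  # 0
```

Under these assumptions, a 128GB system yields 10 layers, which agrees with the starting point suggested above. If memory usage still approaches the limit, increase the value; if there is spare RAM, decrease it.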

With an RTX 4090, 128GB of DDR4, and PCIe 4.0, an image takes on the order of 1 to 1.5 hours to generate.

Recommended Usage

This model works best with detailed prompts. See the HuggingFace page for a prompting guide. You can also use an LLM to rewrite your prompts, such as PromptEnhancer.

Troubleshooting

If you are getting crashes due to running out of GPU memory, make sure you are doing the following:

Use Windows
Use an Nvidia GPU
Set Sysmem Fallback Policy in the Nvidia control panel
Run ComfyUI with --disable-cuda-malloc
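
Two of the items in this checklist can be checked from Python. The sketch below is a hypothetical helper, not part of the node: it assumes it runs inside the ComfyUI process, so that sys.argv holds the startup flags. It cannot verify the GPU vendor or the Sysmem Fallback Policy, which has to be confirmed in the Nvidia control panel.

```python
import platform
import sys


def check_setup(argv=None, system=None) -> list[str]:
    """Return warnings for checklist items that can be verified from Python.

    argv and system can be passed explicitly for testing. By default the
    current process arguments and operating system name are used.
    """
    argv = sys.argv if argv is None else argv
    system = platform.system() if system is None else system

    warnings = []
    if system != "Windows":
        warnings.append("Not running on Windows: system memory fallback may be unavailable")
    if "--disable-cuda-malloc" not in argv:
        warnings.append("ComfyUI was not started with --disable-cuda-malloc")
    return warnings


if __name__ == "__main__":
    for warning in check_setup():
        print("WARNING:", warning)
```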

Footnotes

¹ Windows + Nvidia is required when using a GPU with low VRAM, because this combination allows the driver to fall back to system memory when graphics memory is full. Otherwise, some operations will crash the program with an out-of-memory (OOM) error. Graphics memory usage appears to spike above 32GB of VRAM, so system memory fallback is probably still necessary with an RTX 5090.
² The less memory you have, the more weights will have to be offloaded to disk, with a dramatic performance penalty.
³ When using CPU offload, a major performance bottleneck is copying weights across the PCIe bus to the GPU.
⁴ When using disk offload, a major bottleneck is reading from the drive, so you want it to be as fast as possible.
⁵ This configuration has not been tested.
