
timkhronos/ComfyUI-Hunyuan-Image-3-Reserved

 
 


ComfyUI Hunyuan Image 3.0 custom node

This is a custom node that allows for basic image generation using Hunyuan Image 3.0.

Node screenshot

Features

  • Supports CPU and disk offload to allow generation on consumer setups
    • When using CPU offload, weights are stored in system RAM and transferred to the GPU as needed for processing
    • When using disk offload, weights are stored in system RAM and on disk and transferred to the GPU as needed for processing

System Requirements

Tested (uses CPU + disk offload):

  • Windows [1]
  • An Nvidia GPU with 24GB of VRAM [1]
  • 128GB of system memory [2]
  • A PCIe 4.0 x16 connection between GPU and CPU [3]
  • An NVMe SSD [4]

Recommended (uses CPU offload, but no disk offload) [5]:

  • Windows
  • An Nvidia GPU with >32GB of VRAM
  • 192GB of system memory
  • A PCIe 5.0 x16 connection between GPU and CPU

Ultra (does not use offload) [5]:

  • Enough VRAM to run entirely on GPU (possibly 192GB of VRAM)

Installation

  1. Navigate to your custom_nodes folder
  2. Clone this repo: git clone https://github.com/bgreene2/ComfyUI-HunyuanImage3.git
  3. Change to the directory: cd ComfyUI-HunyuanImage3
  4. Assuming the correct Python environment is loaded, install dependencies: pip install -r requirements.txt
  5. If PyTorch is not already installed, install it: pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 --index-url https://download.pytorch.org/whl/cu128
  6. Add the flag --disable-cuda-malloc to your ComfyUI startup script.

Post-installation steps

  • Download the model weights
  • If you are using an Nvidia GPU, in the Nvidia control panel, change the "CUDA Sysmem Fallback Policy" to "Prefer Sysmem Fallback"
    • CUDA Sysmem Fallback

Usage Guide

The node has the following inputs:

  • prompt - Your text prompt

The node has the following parameters:

  • seed - The seed for the random number generator
  • control after generate - This is attached to the 'seed' parameter
  • use_dimensions - Whether to use the below user-specified width and height values. If false, the model will automatically choose the image dimensions.
  • width - Image width
  • height - Image height
  • steps - Number of steps. The recommended value is 50.
  • guidance_scale - CFG. The recommended value is 7.5.
  • attn_implementation - Valid values are sdpa and flash_attention_2
  • moe_impl - Valid values are eager and flashinfer
  • weights_folder - The path to the model's weights on disk. Note: the path can't have a dot in it.
  • use_offload - Whether to use CPU / disk offload.
  • disk_offload_layers - The number of layers (out of 32) to offload to disk, rather than hold in memory. See the Performance Tuning section for more details.
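
Several of the parameters above have constraints that are easy to get wrong (the valid attn_implementation and moe_impl values, the 0 to 32 layer range, and the no-dot rule for weights_folder). The sketch below checks them before a long generation run. The function name check_settings and its return format are hypothetical and are not part of this node; only the constraints themselves come from the parameter list.

```python
# Hypothetical helper, not part of the node: validates settings against
# the constraints listed in the parameter descriptions above.
VALID_ATTN = {"sdpa", "flash_attention_2"}
VALID_MOE = {"eager", "flashinfer"}
TOTAL_LAYERS = 32  # "out of 32" in the disk_offload_layers description


def check_settings(weights_folder, attn_implementation, moe_impl, disk_offload_layers):
    """Return a list of problems; an empty list means none were found."""
    problems = []
    # The README states that the weights path can't have a dot in it.
    if "." in str(weights_folder):
        problems.append("weights_folder must not contain a dot")
    if attn_implementation not in VALID_ATTN:
        problems.append(f"attn_implementation must be one of {sorted(VALID_ATTN)}")
    if moe_impl not in VALID_MOE:
        problems.append(f"moe_impl must be one of {sorted(VALID_MOE)}")
    if not 0 <= disk_offload_layers <= TOTAL_LAYERS:
        problems.append(f"disk_offload_layers must be between 0 and {TOTAL_LAYERS}")
    return problems


if __name__ == "__main__":
    # Example path is illustrative only.
    for problem in check_settings("C:/models/HunyuanImage-3.0", "sdpa", "eager", 10):
        print("Problem:", problem)
```

Note that the dot check is applied to the whole path string, which follows the wording of the weights_folder note literally; a folder named after the model version (for example one ending in "3.0") would be flagged.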

Basic usage: Connect a String (Multiline) input to the prompt input, and connect the Image output to a Save Image node. An example workflow is provided.

Performance Tuning

If you are using disk offload, choose the number of layers to offload so that your RAM is not completely filled and the system does not fall back to swap. This may require some trial and error while monitoring memory usage. On a system with 128GB of memory, 10 layers is a good starting point.
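
To get a first guess before the trial-and-error pass, the arithmetic can be sketched as follows. This is a rough estimate under stated assumptions, not how the node decides anything: it assumes all 32 layers are the same size, that the total weight size equals the size of the weights folder on disk, and that a fixed amount of RAM (headroom_gb, an arbitrary default here) should stay free for the OS and ComfyUI. The function names are hypothetical.

```python
import math
from pathlib import Path

TOTAL_LAYERS = 32  # "out of 32" in the disk_offload_layers description


def folder_size_gb(folder):
    """Sum the sizes of all files under `folder`, in GiB."""
    total = sum(p.stat().st_size for p in Path(folder).rglob("*") if p.is_file())
    return total / 1024**3


def starting_disk_offload_layers(weights_gb, ram_gb, headroom_gb=24.0):
    """Estimate how many layers to offload so the rest fits in RAM.

    Assumes equally sized layers and a fixed RAM headroom; both are
    simplifications, so treat the result only as a starting point.
    """
    per_layer_gb = weights_gb / TOTAL_LAYERS
    excess_gb = weights_gb - (ram_gb - headroom_gb)
    if excess_gb <= 0:
        return 0  # everything fits in RAM with the requested headroom
    return min(TOTAL_LAYERS, math.ceil(excess_gb / per_layer_gb))
```

Typical use would be starting_disk_offload_layers(folder_size_gb(weights_folder), ram_gb), followed by adjusting the value up or down while watching actual memory usage.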

With an RTX 4090, 128GB of DDR4, and PCIe 4.0, an image takes on the order of 1 to 1.5 hours to generate.

Recommended Usage

This model works best with detailed prompts. See the HuggingFace page for a prompting guide. You can also use an LLM to rewrite your prompts, such as PromptEnhancer.

Troubleshooting

If you are getting crashes due to running out of GPU memory, make sure you are doing the following:

  1. Use Windows
  2. Use an Nvidia GPU
  3. Set Sysmem Fallback Policy in the Nvidia control panel
  4. Run ComfyUI with --disable-cuda-malloc
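
Two of the items in this checklist can be verified from Python: the operating system and the startup flag. The sketch below is a hypothetical preflight helper, not part of the node. It cannot inspect the Sysmem Fallback Policy or confirm the GPU vendor, so those still need to be checked by hand.

```python
import platform
import sys

REQUIRED_FLAG = "--disable-cuda-malloc"


def preflight(argv=None, system=None):
    """Return warnings for the checklist items that can be checked in Python.

    `argv` and `system` default to the current process arguments and OS;
    they are parameters only so the function can be tested.
    """
    argv = sys.argv if argv is None else argv
    system = platform.system() if system is None else system
    warnings = []
    if system != "Windows":
        warnings.append(f"Running on {system}; system memory fallback relies on Windows")
    if REQUIRED_FLAG not in argv:
        warnings.append(f"ComfyUI was not started with {REQUIRED_FLAG}")
    return warnings


if __name__ == "__main__":
    for warning in preflight():
        print("Warning:", warning)
```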

Footnotes

  1. Windows + Nvidia is required when using a GPU with low VRAM, because this combination allows the driver to fall back to system memory when graphics memory is full. Otherwise, some operations will cause the program to crash with an out-of-memory (OOM) error. Graphics memory usage appears to spike above 32GB of VRAM, so system memory fallback is probably still necessary with an RTX 5090.

  2. The less memory you have, the more weights will have to be offloaded to disk, with a dramatic performance penalty.

  3. When using CPU offload, a major performance bottleneck is copying weights across the PCIe bus to the GPU.

  4. When using disk offload, a major bottleneck is reading from the drive, so you want it to be as fast as possible.

  5. This configuration has not been tested.

About

A ComfyUI custom node that provides a basic implementation of Hunyuan Image 3.0 using Transformers, without unloading.
