We support the following Cosmos Autoregressive models for post-training. Review the available models and their compute requirements for post-training and inference to determine the best model for your use case.
| Model Name | Model Status | Compute Requirements for Post-Training |
|---|---|---|
| Cosmos-Predict1-4B | Supported | 1 NVIDIA GPU* |
* H100-80GB or A100-80GB GPUs are recommended.
Please refer to the Post-training section of INSTALL.md for instructions on environment setup.
1. Generate a Hugging Face access token (if you haven't done so already). Set the access token to `Read` permission (the default is `Fine-grained`).

2. Log in to Hugging Face with the access token:

   ```bash
   huggingface-cli login
   ```

3. Accept the Llama-Guard-3-8B terms.

4. Download the Cosmos model weights from Hugging Face:

   ```bash
   CUDA_HOME=$CONDA_PREFIX PYTHONPATH=$(pwd) python scripts/download_autoregressive_checkpoints.py --model_sizes 4B --checkpoint_dir checkpoints
   ```
5. For tensor parallel training, checkpoints need to be sharded to the target tensor model parallel size (TP). Shard checkpoints to TP=4 with:

   ```bash
   python scripts/shard_autoregressive_base_checkpoints.py --checkpoint_path checkpoints/Cosmos-Predict1-4B/model.pt --model_size 4b --tensor_parallel_size 4
   ```

   This command will shard and save 4 TP checkpoint shards as `checkpoints/Cosmos-Predict1-4B/model_model_mp_{rank}.pt`.
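Conceptually, tensor-parallel sharding splits each large weight matrix across the TP ranks so that every GPU stores only a 1/TP slice. The following pure-Python sketch illustrates the idea on a toy matrix (the real script operates on PyTorch state dicts; `shard_columns` is an illustrative helper, not part of the repository):

```python
def shard_columns(matrix, tp_size):
    """Split a weight matrix column-wise into tp_size equal shards,
    one per tensor-parallel rank (illustrative only)."""
    cols = len(matrix[0])
    assert cols % tp_size == 0, "columns must divide evenly across ranks"
    per_rank = cols // tp_size
    return [
        [row[rank * per_rank:(rank + 1) * per_rank] for row in matrix]
        for rank in range(tp_size)
    ]

# a toy 2x8 "weight matrix" sharded to TP=4
weight = [list(range(8)), list(range(8, 16))]
shards = shard_columns(weight, tp_size=4)
print(len(shards))   # 4 shards, one per rank
print(shards[0])     # [[0, 1], [8, 9]]
```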
Post-training a Cosmos Autoregressive WFM enables you to train the model to generate videos that are more specific to your use case.
There are two steps to post-training: downloading a dataset and post-training the model.
The first step is to download a dataset with videos. You must provide a folder containing a collection of videos in MP4 format.
You can use `nvidia/Cosmos-NeMo-Assets` for post-training:
```bash
mkdir -p datasets/cosmos_nemo_assets/

# This command will download the videos for physical AI
huggingface-cli download nvidia/Cosmos-NeMo-Assets --repo-type dataset --local-dir datasets/cosmos_nemo_assets/ --include "*.mp4*"

mv datasets/cosmos_nemo_assets/nemo_diffusion_example_data datasets/cosmos_nemo_assets/videos
```

Run the following command to execute an example post-training job with the above data, which the dataloader scales to a lower resolution so that it fits on a single GPU:
```bash
export OUTPUT_ROOT=checkpoints # default value
torchrun --nproc_per_node=1 -m cosmos_predict1.autoregressive.train --config=cosmos_predict1/autoregressive/configs/config.py -- experiment=base_4b_example_tealrobotsmall_tp1
```

The model will be post-trained using the above cosmos_nemo_assets dataset.
See the `VideoDataset` defined in `cosmos_predict1/autoregressive/datasets/video_dataset.py` and `register_training_data` in `cosmos_predict1/autoregressive/configs/registry.py` to understand how the dataloader works and is registered.
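To illustrate the general shape of such a dataset, here is a hypothetical minimal stand-in that merely indexes the MP4 files in a folder (`ToyVideoDataset` is illustrative only; the repository's `VideoDataset` also decodes and preprocesses the frames):

```python
import os

class ToyVideoDataset:
    """Illustrative stand-in for a video dataset: indexes all MP4 files
    under a folder so the trainer can load them by position."""
    def __init__(self, video_dir):
        self.paths = sorted(
            os.path.join(video_dir, f)
            for f in os.listdir(video_dir)
            if f.endswith(".mp4")
        )

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        # a real dataset would decode frames and resize them here
        return self.paths[idx]
```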
The checkpoints will be saved to `${OUTPUT_ROOT}/PROJECT/GROUP/NAME`. In the above example, `PROJECT` is `posttraining`, `GROUP` is `autoregressive_base`, and `NAME` is `base_4b_example_tealrobotsmall_tp1`.
See the job config to understand how they are determined.
```python
base_4b_example_tealrobotsmall_tp1 = LazyDict(
    dict(
        ...
        job=dict(
            project="posttraining",
            group="autoregressive_base",
            name="base_4b_example_tealrobotsmall_tp1",
        ),
        ...
    )
)
```

During training, the checkpoints will be saved in the structure below.
```
checkpoints/posttraining/autoregressive_base/base_4b_example_tealrobotsmall_tp1/checkpoints/
├── iter_{NUMBER}.pt
```
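The save location is simply the job fields joined under `OUTPUT_ROOT`, with the iteration number zero-padded in the checkpoint filename (e.g. `iter_000001000.pt`). A small illustrative sketch (the helper name is hypothetical):

```python
import os

def checkpoint_path(output_root, project, group, name, iteration):
    """Build ${OUTPUT_ROOT}/PROJECT/GROUP/NAME/checkpoints/iter_{NUMBER}.pt,
    zero-padding the iteration to 9 digits as in iter_000001000.pt."""
    filename = f"iter_{iteration:09d}.pt"
    return os.path.join(output_root, project, group, name, "checkpoints", filename)

path = checkpoint_path(
    "checkpoints", "posttraining", "autoregressive_base",
    "base_4b_example_tealrobotsmall_tp1", 1000,
)
print(path)
# checkpoints/posttraining/autoregressive_base/base_4b_example_tealrobotsmall_tp1/checkpoints/iter_000001000.pt
```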
The model can also be post-trained on multiple GPUs using tensor parallelism. Run the following command to execute an example post-training job with the above data at a higher resolution:
```bash
export OUTPUT_ROOT=checkpoints # default value
torchrun --nproc_per_node=4 -m cosmos_predict1.autoregressive.train --config=cosmos_predict1/autoregressive/configs/config.py -- experiment=base_4b_example_tealrobot_tp4
```

The checkpoints will be saved to `${OUTPUT_ROOT}/PROJECT/GROUP/NAME`. In the above example, `PROJECT` is `posttraining`, `GROUP` is `autoregressive_base`, and `NAME` is `base_4b_example_tealrobot_tp4`.
See the job config to understand how they are determined.
```python
base_4b_example_tealrobot_tp4 = LazyDict(
    dict(
        ...
        job=dict(
            project="posttraining",
            group="autoregressive_base",
            name="base_4b_example_tealrobot_tp4",
        ),
        ...
    )
)
```

During training, the sharded checkpoints will be saved in the structure below.
```
checkpoints/posttraining/autoregressive_base/base_4b_example_tealrobot_tp4/checkpoints/
├── iter_{NUMBER}.pt
├── iter_{NUMBER}_model_mp_0.pt
├── iter_{NUMBER}_model_mp_1.pt
├── iter_{NUMBER}_model_mp_2.pt
├── iter_{NUMBER}_model_mp_3.pt
```
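Each `model_mp_{rank}` file holds one rank's slice of every sharded weight. Merging them back (as is done before inference) conceptually concatenates the slices along the dimension they were split on. A toy sketch of that inverse operation (illustrative only; the real script works on PyTorch state dicts and `merge_column_shards` is a hypothetical helper):

```python
def merge_column_shards(shards):
    """Concatenate column-wise shards from each tensor-parallel rank
    back into a single weight matrix (illustrative only)."""
    num_rows = len(shards[0])
    return [
        sum((shard[r] for shard in shards), [])  # join row r across ranks
        for r in range(num_rows)
    ]

# four rank shards of a 2-row toy weight matrix
shards = [[[0, 1], [8, 9]], [[2, 3], [10, 11]],
          [[4, 5], [12, 13]], [[6, 7], [14, 15]]]
merged = merge_column_shards(shards)
print(merged)  # [[0, 1, 2, 3, 4, 5, 6, 7], [8, 9, 10, 11, 12, 13, 14, 15]]
```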
Inference can be done with the same interface as described in `examples/inference_autoregressive_base.md`.
The post-trained checkpoint needs to be copied to `checkpoints/Cosmos-Predict1-4B-Base_post-trained/model.pt`.

For example, with TP=1, if a post-trained checkpoint with 1000 iterations is to be used:
```bash
# copy checkpoint to the designated location
mkdir -p checkpoints/Cosmos-Predict1-4B-Base_post-trained/
cp checkpoints/posttraining/autoregressive_base/base_4b_example_tealrobotsmall_tp1/checkpoints/iter_000001000.pt checkpoints/Cosmos-Predict1-4B-Base_post-trained/model.pt
```

With TP=4, the post-trained checkpoints are sharded and should first be merged into a single checkpoint for inference:
```bash
# merge tensor parallel model shards
mkdir -p checkpoints/Cosmos-Predict1-4B-Base_post-trained/
python scripts/merge_autoregressive_tp_checkpoints.py --checkpoint_path checkpoints/posttraining/autoregressive_base/base_4b_example_tealrobot_tp4/checkpoints/iter_000001000.pt --output_path checkpoints/Cosmos-Predict1-4B-Base_post-trained/model.pt --model_size 4b --tensor_parallel_size 4
```

This is the basic example for running inference on the post-trained 4B model with a single video:
```bash
NUM_GPUS=<NUM_GPUS>
CUDA_HOME=$CONDA_PREFIX PYTHONPATH=$(pwd) torchrun --nproc_per_node=${NUM_GPUS} cosmos_predict1/autoregressive/inference/base.py \
    --num_gpus ${NUM_GPUS} \
    --checkpoint_dir checkpoints \
    --ar_model_dir Cosmos-Predict1-4B-Base_post-trained \
    --input_type video \
    --input_image_or_video_path datasets/cosmos_nemo_assets/videos/output_oige_render_view_sub.mp4 \
    --top_p 0.8 \
    --temperature 1.0 \
    --offload_diffusion_decoder \
    --offload_tokenizer \
    --video_save_name autoregressive-4b-post-train
```
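The `--temperature` flag rescales the logits before the softmax, and `--top_p` enables nucleus sampling, which samples only from the smallest set of tokens whose cumulative probability reaches `p`. A minimal sketch of the idea (not the repository's implementation):

```python
import math, random

def top_p_sample(logits, top_p=0.8, temperature=1.0, rng=random):
    """Nucleus sampling: softmax over temperature-scaled logits, then
    sample from the smallest token set with cumulative prob >= top_p."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]  # stable softmax
    total = sum(exps)
    probs = sorted(
        ((i, e / total) for i, e in enumerate(exps)),
        key=lambda x: x[1], reverse=True,
    )
    # keep the most-probable tokens until cumulative probability hits top_p
    nucleus, cum = [], 0.0
    for i, p in probs:
        nucleus.append((i, p))
        cum += p
        if cum >= top_p:
            break
    # renormalize and sample within the nucleus
    z = sum(p for _, p in nucleus)
    r, acc = rng.random() * z, 0.0
    for i, p in nucleus:
        acc += p
        if acc >= r:
            return i
    return nucleus[-1][0]

# with these logits and top_p=0.8, the nucleus is the two strongest tokens
token = top_p_sample([2.0, 1.0, 0.1, -1.0], top_p=0.8, temperature=1.0)
print(token)  # 0 or 1
```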