
LoRA fine-tuned model produces identical inference results to base model + checkpoint loading warnings #176

@WillKang-1204

Description


Environment:

  • Model: nvidia/Cosmos-Predict2-2B-Video2World
  • Training: LoRA (rank=16, alpha=16)
  • Hardware: 2x GPUs with FSDP
  • Dataset: 100 robot manipulation videos (77 frames, 720p @ 16fps)

Issue Description

After LoRA post-training completes successfully (500 iterations), the fine-tuned model generates outputs identical to the base model's during inference. Additionally, I observe warnings during checkpoint loading that suggest the LoRA weights may not be applied correctly.

Training Setup

Training completed successfully:

export EXP=predict2_video2world_lora_training_2b_cosmos_nemo_assets

torchrun --nproc_per_node=2 --master_port=12341 \
    -m scripts.train \
    --config=cosmos_predict2/configs/base/config.py -- \
    experiment=${EXP} \
    job.name=77f_test_lora \
    model.config.fsdp_shard_size=2 \
    model.config.train_architecture=lora \
    dataloader_train.dataset.num_frames=77 \
    trainer.max_iter=1000 \
    checkpoint.save_iter=500

Training configuration (from config.yaml):

model:
  train_architecture: lora
  lora_rank: 16
  lora_alpha: 16
  lora_target_modules: q_proj,k_proj,v_proj,output_proj,mlp.layer1,mlp.layer2
  fsdp_shard_size: 2

optimizer:
  lr: 4.315837287515549e-05  # 2**(-14.5)
  type: fusedadamw

scheduler:
  type: lambdalinear
  warm_up_steps: [0]
  cycle_lengths: [1000]
  f_max: [0.6]

trainer:
  max_iter: 1000
  batch_size: 1

LoRA injection confirmed during training:

[INFO] LoRA injection successful: 22,937,600 trainable parameters out of 1,979,351,040 total (1.159%)
[INFO] LoRA parameter breakdown:
   lora_A: 11,010,048 parameters
   lora_B: 11,927,552 parameters
   Total LoRA: 22,937,600 parameters
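
For reference, my understanding of how the adapter should affect the network: the effective weight at inference is the base weight plus (alpha/rank) · B·A, so with rank=16 and alpha=16 the scale is 1.0, which matches the "Scale: 1.0" printed during loading below. A minimal sketch of that arithmetic, with illustrative shapes and names rather than the repo's actual modules:

import torch

# Illustrative only: a square projection with a rank-16 adapter.
d_model, rank, alpha = 2048, 16, 16
scale = alpha / rank  # = 1.0 for rank=16, alpha=16

W = torch.randn(d_model, d_model)            # frozen base weight
lora_A = torch.randn(rank, d_model) * 0.01   # trained down-projection
lora_B = torch.zeros(d_model, rank)          # trained up-projection (zero-init, so the delta starts at 0)

# Weight the model should effectively use once the adapter is applied/merged:
W_eff = W + scale * (lora_B @ lora_A)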

Dataset:

  • 100 robot arm manipulation videos
  • All videos: exactly 77 frames (verified with ffprobe)
  • Resolution: 720p (1280x704) @ 16fps
  • Content: First-person view robot manipulation tasks
  • All videos have corresponding text prompts

Problem 1: Identical Inference Results

Inference with base model:

python scripts/hf_video2world_lora.py \
    /workspace/dream-outputs/test_ori_model \
    --prompt dream-datasets/prompts/test_prompt.txt \
    --image dream-datasets/picture/test_robot_1.jpg \
    --model nvidia/Cosmos-Predict2-2B-Video2World \
    --height 720 --width 1280 \
    --fps 16 --frames 77 --steps 35 -v

Inference with LoRA fine-tuned model:

python scripts/hf_video2world_lora.py \
    /workspace/dream-outputs/test_model \
    --prompt dream-datasets/prompts/test_prompt.txt \
    --image dream-datasets/picture/test_robot_1.jpg \
    --model nvidia/Cosmos-Predict2-2B-Video2World \
    --lora_checkpoint checkpoints/posttraining/video2world_lora/77f_test_lora/checkpoints/model/iter_000000500.pt \
    --height 720 --width 1280 \
    --fps 16 --frames 77 --steps 35 -v

Result: Both outputs are visually identical (frame-by-frame comparison shows no differences).
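
To back up "visually identical", I also compared the two outputs numerically. A rough sketch of that comparison (imageio is just what I used to read the videos; the file names are placeholders for whatever the script writes):

import imageio.v3 as iio
import numpy as np

# Paths are placeholders for the two generated videos.
base = iio.imread("/workspace/dream-outputs/test_ori_model/output.mp4")
lora = iio.imread("/workspace/dream-outputs/test_model/output.mp4")

assert base.shape == lora.shape  # (frames, H, W, C)
diff = np.abs(base.astype(np.int16) - lora.astype(np.int16))
print("max abs pixel diff:", diff.max())
print("mean abs pixel diff:", float(diff.mean()))
# A max diff of 0 means the two videos are bit-identical, which should not
# happen if different weights were actually used for the two runs.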


Problem 2: Checkpoint Loading Warnings

When loading the LoRA checkpoint for inference, I see these warnings:

Loading LoRA checkpoint: checkpoints/.../iter_000000500.pt
✅ Found 1120 LoRA parameters
   Missing keys: 567
   Unexpected keys: 1120
✅ LoRA weights loaded! Scale: 1.0
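
To understand where those numbers come from, I inspected the raw checkpoint contents. A minimal sketch, assuming the checkpoint is a plain dict saved with torch.save (possibly nested under a "model" key):

import torch

ckpt_path = "checkpoints/posttraining/video2world_lora/77f_test_lora/checkpoints/model/iter_000000500.pt"
ckpt = torch.load(ckpt_path, map_location="cpu")
state = ckpt.get("model", ckpt)  # unwrap if the weights are nested under a "model" key

lora_keys = [k for k in state if "lora" in k.lower()]
print(f"{len(lora_keys)} LoRA keys out of {len(state)} total keys in the checkpoint")

# Prefix mismatches ("net.", "module.", "_orig_mod.", ...) are a common reason why
# load_state_dict(strict=False) reports every checkpoint key as "unexpected",
# in which case nothing from the checkpoint is actually copied into the model.
for k in lora_keys[:5]:
    print(k, tuple(state[k].shape))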

Questions:

  • What do "Missing keys: 567" and "Unexpected keys: 1120" indicate?
  • Does this mean the LoRA weights are not being properly applied to the model?
  • Is this why inference results are identical to the base model?

Additional Observation: Training Loss Behavior

During training, the loss shows unusual patterns:

Iter 1-8:   Loss oscillates between 0.6-0.8 (normal range)
Iter 9:     Loss: 4.1003  ← sudden spike!
Iter 10+:   Loss returns to 0.6-0.8
...
Iter 310:   Loss: 2.2639  ← another spike
...
Iter 500:   Loss: 0.5779

The loss does not show a clear downward trend and has occasional large spikes. I'm not sure whether this is related to the inference issue.
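
To check whether there is any trend hiding under the noise, I plan to smooth the per-iteration losses pulled from the training log. A small sketch (the losses file is a placeholder for values extracted from the log):

import numpy as np

# One loss value per iteration, extracted from the training log (path is a placeholder).
losses = np.loadtxt("losses.txt")

window = 50
smoothed = np.convolve(losses, np.ones(window) / window, mode="valid")
print("moving average over first window:", smoothed[0])
print("moving average over last window: ", smoothed[-1])
# The raw loss is noisy (it depends on the sampled noise level), so I'm mainly
# looking at whether the moving average drifts down at all.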


Questions

  1. Are the checkpoint loading warnings ("Missing keys", "Unexpected keys") preventing LoRA weights from being applied during inference?

  2. How can I verify that LoRA weights are actually being used during inference?

    • Is there a way to check whether the model behavior has changed? (See the sketch after this list for the kind of check I have in mind.)
  3. Is LoRA fine-tuning expected to produce visible differences after only 500-1000 iterations?

    • Should I train for longer?
  4. Are there any known issues with LoRA training on Cosmos-Predict2-2B-Video2World?

    • Should I consider full fine-tuning instead?
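
For question 2, the kind of check I have in mind is a parameter fingerprint taken before and after the LoRA checkpoint is applied inside scripts/hf_video2world_lora.py. A sketch (the helper and its placement are my own, not part of the repo):

import hashlib
import torch

def param_fingerprint(module: torch.nn.Module) -> str:
    """Digest of all parameter values; any change to the weights changes the digest."""
    h = hashlib.sha256()
    for name, p in sorted(module.named_parameters()):
        h.update(name.encode())
        h.update(p.detach().float().cpu().numpy().tobytes())
    return h.hexdigest()[:16]

# Intended usage inside the inference script (exact variable name is a guess):
#   print("before LoRA:", param_fingerprint(model))
#   ... load/apply the LoRA checkpoint ...
#   print("after LoRA: ", param_fingerprint(model))
# Identical digests would mean the weights did not change, regardless of what
# the loading log prints.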

What I've Verified

✅ All 100 training videos are exactly 77 frames (verified with ffprobe; the check is sketched after this list)
✅ Training completes without errors
✅ LoRA parameters are injected successfully (22.9M trainable params)
✅ Checkpoint files are saved correctly
✅ Using the same prompt and image for both base and fine-tuned inference
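
For the frame-count item above, the check was along these lines (the dataset directory is a placeholder; ffprobe with -count_frames reports the exact decoded frame count):

import subprocess
from pathlib import Path

def count_frames(video: Path) -> int:
    # ffprobe decodes the video stream and reports the exact number of frames.
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-count_frames", "-select_streams", "v:0",
         "-show_entries", "stream=nb_read_frames", "-of", "csv=p=0", str(video)],
        capture_output=True, text=True, check=True,
    )
    return int(out.stdout.strip())

# Directory is a placeholder for wherever the 100 training clips live.
bad = [v.name for v in Path("dream-datasets/videos").glob("*.mp4") if count_frames(v) != 77]
print("clips not exactly 77 frames:", bad or "none")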


Expected Behavior

After LoRA post-training, I expect the fine-tuned model to generate outputs that differ from the base model and are more aligned with my robot manipulation training data.


Request

Could you help clarify:

  1. Whether the checkpoint loading warnings indicate a problem
  2. Whether additional steps are needed to properly load/apply the LoRA weights during inference
  3. Recommended training parameters or best practices for LoRA on Video2World

Thank you!
