device meta is invalid when using group offload #1786

@bghira

[RANK 0] 2025-10-21 13:15:19,710 [INFO] (simpletuner.helpers.models.common) Moving AutoencoderKLWan to accelerator, converting from torch.float32 to torch.bfloat16
[RANK 0] 2025-10-21 13:15:21,156 [ERROR] (validation) Error generating validation image: device meta is invalid, Traceback (most recent call last):
  File "/notebooks/SimpleTuner/simpletuner/helpers/training/validation.py", line 1675, in validate_prompt
    pipeline_result = self.model.pipeline(**pipeline_kwargs)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/notebooks/SimpleTuner/.venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/notebooks/SimpleTuner/simpletuner/helpers/models/wan/pipeline.py", line 1007, in __call__
    return self._call_text_to_video(
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/notebooks/SimpleTuner/simpletuner/helpers/models/wan/pipeline.py", line 643, in _call_text_to_video
    noise_pred_uncond = self.transformer(
                        ^^^^^^^^^^^^^^^^^
  File "/notebooks/SimpleTuner/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/notebooks/SimpleTuner/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/notebooks/SimpleTuner/.venv/lib/python3.11/site-packages/diffusers/hooks/hooks.py", line 188, in new_forward
    args, kwargs = function_reference.pre_forward(module, *args, **kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/notebooks/SimpleTuner/.venv/lib/python3.11/site-packages/diffusers/hooks/group_offloading.py", line 304, in pre_forward
    self.group.onload_()
  File "/notebooks/SimpleTuner/.venv/lib/python3.11/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/notebooks/SimpleTuner/.venv/lib/python3.11/site-packages/diffusers/hooks/group_offloading.py", line 260, in onload_
    self._onload_from_disk()
  File "/notebooks/SimpleTuner/.venv/lib/python3.11/site-packages/diffusers/hooks/group_offloading.py", line 189, in _onload_from_disk
    loaded_tensors = safetensors.torch.load_file(self.safetensors_file_path, device=device)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/notebooks/SimpleTuner/.venv/lib/python3.11/site-packages/safetensors/torch.py", line 381, in load_file
    with safe_open(filename, framework="pt", device=device) as f:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
safetensors_rust.SafetensorError: device meta is invalid

encountered this when trying to use group offload with Wan 2.1 14B T2V.

my understanding of this, having hit it before in HiDream:

  • the pipeline gets created without the denoiser module (e.g. when it is only needed for creating text embeds)
  • the transformer is attached to the pipeline when it's loaded
  • the pipeline still thinks the device should be meta instead of the actual accelerator

in HiDream, 9d2cac2 fixed it by just overriding the device: device = self.transformer.device
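a rough sketch of what a more general guard could look like, assuming the meta device reaches safetensors through a device value read off the pipeline. the helper names (`device_type`, `resolve_execution_device`) and the "cpu" fallback are made up for illustration; they are not part of SimpleTuner or diffusers, and this is not what 9d2cac2 actually does:

```python
from typing import Any, Iterable


def device_type(device: Any) -> str:
    """Return the device type ("cuda", "cpu", "meta", ...).

    Accepts anything with a `.type` attribute (torch.device has one)
    or a plain string such as "cuda:0".
    """
    kind = getattr(device, "type", None)
    return kind if kind else str(device).split(":")[0]


def resolve_execution_device(
    pipeline_device: Any,
    candidates: Iterable[Any],
    fallback: Any = "cpu",
) -> Any:
    """Pick the first device that is not `meta`.

    `pipeline_device` is whatever the pipeline reports; `candidates` are
    devices of components known to hold real weights (hypothetically,
    the transformer). `fallback` is returned if everything is on meta.
    """
    for device in [pipeline_device, *candidates]:
        if device is not None and device_type(device) != "meta":
            return device
    return fallback


# Hypothetical usage inside a pipeline __call__:
#   device = resolve_execution_device(self.device, [self.transformer.device])
```

this only papers over the symptom the same way the HiDream override does; it doesn't explain why the pipeline reports meta in the first place.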

🤔 so maybe the same fix will work here. but it'd be nicer to understand the root issue. i've traced through it before, but not really looking forward to doing so again.

Labels: bug (Something isn't working), upstream-bug (We can't do anything but wait.)