@atakan-topaloglu (Contributor)

PR: Accelerate model loading by 1.24x

Summary

This PR optimizes the model loading process, resulting in a ~1.24x speedup on CUDA-enabled devices. The change is backward compatible.

Previously, the model's weights were first loaded into CPU RAM and then copied to the GPU in one large transfer via model.to(device).
Now, an empty model "scaffold" is created directly in GPU VRAM, and the checkpoint is deserialized straight onto that device via map_location=device, skipping the intermediate CPU copy.

# Old method: weights are deserialized into CPU RAM first,
# then the entire model is copied to the GPU in a second step
model = VGGT()
model.load_state_dict(torch.hub.load_state_dict_from_url(url))
model = model.to(device)

# New method: the empty model is created on the GPU up front, and
# map_location=device loads the checkpoint tensors directly there
model = VGGT().to(device)
model.load_state_dict(torch.hub.load_state_dict_from_url(url, map_location=device))

You can access the benchmarking script and results in vggt_benchmark_loading.zip.
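The benchmarking script itself lives in the zip and is not reproduced here; for reference, a wall-clock comparison of the two loading paths generally boils down to a small harness of this shape (the `time_call` helper and the stand-in workloads below are illustrative, not the script from the zip):

```python
import time

def time_call(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed

# Stand-ins for the two loading paths; in the real benchmark these
# would be the old and new model-loading functions.
_, t_old = time_call(sum, range(1_000_000))
_, t_new = time_call(sum, range(1_000_000))
print(f"speedup: {t_old / t_new:.2f}x")
```

When timing GPU work, remember that CUDA calls are asynchronous, so a real benchmark should call torch.cuda.synchronize() before reading the clock.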

Thank you for the great work.

@facebook-github-bot added the CLA Signed label on Jul 9, 2025.