Accelerate model loading on GPU by 1.24x #251

atakan-topaloglu · 2025-07-09T12:01:01Z

PR: Accelerate model loading by 1.24x

Summary

This PR optimizes the model loading process, resulting in a ~1.24x speedup on CUDA-enabled devices. The change is backward compatible.

Previously, the model's weights were first loaded into CPU RAM and then copied to the GPU in one large batch via model.to(device).
Now an empty model "scaffold" is first placed in GPU VRAM, and the checkpoint is loaded with map_location=device, so PyTorch maps the weights to the GPU as it reads them from disk instead of staging the full state dict in CPU RAM.

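# Old method: load the weights on the CPU, then move the whole model to the GPU
model = VGGT()
model.load_state_dict(torch.hub.load_state_dict_from_url(url))
model = model.to(device)

# New method: place the model on the GPU first, then map the checkpoint tensors to the same device
model = VGGT().to(device)
model.load_state_dict(torch.hub.load_state_dict_from_url(url, map_location=device))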

You can access the benchmarking script and results in vggt_benchmark_loading.zip.
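
For readers who want to reproduce a figure like this without the archive, the following is a minimal timing sketch. It is not the script from vggt_benchmark_loading.zip. The helper names time_loader and speedup are hypothetical, and the optional sync hook is an assumption, added because CUDA work is asynchronous and should finish before the clock stops.

```python
import statistics
import time
from typing import Callable, Optional


def time_loader(load_fn: Callable[[], object],
                repeats: int = 5,
                sync: Optional[Callable[[], None]] = None) -> float:
    """Return the median wall-clock seconds of `load_fn` over `repeats` runs."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        load_fn()
        if sync is not None:
            # Hypothetical hook: pass torch.cuda.synchronize here so that
            # pending GPU copies are included in the measurement.
            sync()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def speedup(old_seconds: float, new_seconds: float) -> float:
    """Speedup factor of the new loading path relative to the old one."""
    return old_seconds / new_seconds
```

One way to use it would be to wrap the old and new loading snippets in two zero-argument functions, time each with time_loader(..., sync=torch.cuda.synchronize), and pass the two medians to speedup. Using the median rather than the mean keeps a single slow first run (for example, CUDA context creation) from dominating the result.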

Thank you for the great work.

Accelerate model loading on GPU by 1.24x

eb217a4

facebook-github-bot added the CLA Signed label Jul 9, 2025
