Support rope cache indexing using positions #2112
Conversation
tianyu-l
left a comment
Thanks! May I ask: for the dsv3 16B test on dp4tp2 before vs. after, did you explicitly pass positions into the model? We should try both (1, seq_len) and (batch_size, seq_len) inputs (they could be the trivial 0 -> seq_len - 1 ids).
Also had an inline comment.
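For reference, a rough sketch of what that explicit-positions check could look like -- the `model(tokens, positions=...)` call signature and the vocab size are assumptions for illustration, not the actual torchtitan entry point:

```python
import torch

batch_size, seq_len = 4, 2048

# Trivial 0 -> seq_len - 1 ids, broadcastable shape (1, seq_len).
positions_broadcast = torch.arange(seq_len).unsqueeze(0)

# Same ids materialized per sample, shape (batch_size, seq_len).
positions_per_sample = torch.arange(seq_len).expand(batch_size, seq_len)

tokens = torch.randint(0, 32000, (batch_size, seq_len))  # 32000 = placeholder vocab size

# Both calls should reproduce the default (positions=None) forward output:
# out_a = model(tokens, positions=positions_broadcast)
# out_b = model(tokens, positions=positions_per_sample)
```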
| "attention": prepare_module_input( | ||
| input_layouts=(Shard(1), Replicate(), None), | ||
| desired_input_layouts=(Replicate(), Replicate(), None), | ||
| input_layouts=(Shard(1), Replicate(), None, None), |
Note that when positions is not None, this makes the implicit assumption that positions already has the expected sharding when it's used, namely: sharded on the batch dim by DP, replicated on the TP mesh, and sharded on the seq dim by CP.
I don't have a good solution right now -- making it Replicate by default will fail here when positions is None: https://github.com/pytorch/pytorch/blob/main/torch/distributed/tensor/parallel/style.py#L521
But clearly this leaves a footgun. I'd suggest we add a comment for now.
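For context, a minimal sketch of the annotation being discussed, using torch.distributed.tensor.parallel.PrepareModuleInput directly. The placements mirror the diff above; extending desired_input_layouts with a matching None for the new positions slot is my assumption so the tuple lengths line up:

```python
from torch.distributed.tensor import Replicate, Shard
from torch.distributed.tensor.parallel import PrepareModuleInput

# A None entry means "leave this argument alone": it is not converted to a
# DTensor, which is what lets an optional argument (positions=None) pass
# through without error.
#
# The footgun: when positions *is* passed, nothing here verifies that it
# already carries the expected sharding (batch dim by DP, replicated on the
# TP mesh, seq dim by CP) -- the plan silently assumes it.
attention_input_plan = PrepareModuleInput(
    input_layouts=(Shard(1), Replicate(), None, None),
    desired_input_layouts=(Replicate(), Replicate(), None, None),
)
```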
Yeah, when testing positions of shape [1, seq_len] and [bz, seq_len] on dp4tp2, I need to manually change both layouts to Replicate(). But for the default case it should be None.
For CP, I think we need to manually change https://fburl.com/v2rn2s48.
For FSDP, I'm not sure how the sharding info is specified today, but it looks like it's already handled?
Will add a comment for now.
Updated the loss graph. I also tested both cases with vLLM inference, and the text output is the same.
tianyu-l
left a comment
LGTM! Thank you!
Add support for indexing the rope cache using position_ids; this might be needed when passing position_ids into the transformer forward (a minimal sketch is included after the test results below).

Test:

running dpskv3 16b base
Also tested in https://github.com/wwwjn/torchtitan/pull/1/files when passing position_ids:
