Sequence parallel welford #5552
Conversation
!test
Review updated until commit 3276783

Description

Relevant files: Enhancement, Documentation, Tests
PR Reviewer Guide
Here are some key observations to aid the review process:
🧪 PR contains tests

⚡ Recommended focus areas for review

Significant logic simplification in multiOutputRFactorHelper, which now delegates to TransformReplay::selfReplay. This is a substantial change that could affect correctness, especially for complex multi-output reduction scenarios. Verify that the simplification preserves the same behavior and doesn't break existing use cases.
!test |
| "should not call clearReductionIterDomains on transformed allocation " | ||
| "domain"); | ||
|
|
||
| std::vector<IterDomain*> new_logical; |
The only use of this function is in the fusion segmenter, for TranslateApplicableWelford::translateSingleWelford. I'll move it there in a separate PR.
Greptile Summary
Confidence Score: 4/5
Important Files Changed
Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant FusionDefinition
    participant TensorView
    participant TransformReplay
    participant TensorDomain
    User->>FusionDefinition: "define_tensor with sharding"
    User->>FusionDefinition: "ops.var_mean (Welford)"
    FusionDefinition->>TensorView: "multiOutputRFactorHelper"
    TensorView->>TransformReplay: "selfReplay(this->domain, tv->domain)"
    TransformReplay->>TensorDomain: "replay loop domain transformations"
    TensorDomain-->>TransformReplay: "updated domain"
    TransformReplay-->>TensorView: "replayed successfully"
    TensorView->>TensorView: "rFactor for reduction"
    TensorView->>TensorView: "clearReductionIterDomains"
    TensorView->>TensorDomain: "noReductions(loop_domain)"
    TensorView->>TensorDomain: "noReductions(allocation_domain)"
    TensorView->>TensorDomain: "create new TensorDomain with preserved loop"
    TensorDomain-->>TensorView: "domain without reduction IDs"
    TensorView-->>User: "execute fusion on sharded tensor"
```
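The flow above translates roughly into the following C++ fusion setup. This is a minimal sketch assuming nvFuser's C++ API as exercised in its tests (makeContigTensor is a test helper); the mesh size and sharded axis are illustrative assumptions, not values taken from this PR:

```cpp
#include <fusion.h>
#include <ops/all_ops.h>
#include <multidevice/device_mesh.h>

using namespace nvfuser;

// Sketch of the sharded Welford flow from the sequence diagram.
// Mesh and axis choices are illustrative, not from this PR.
void buildShardedWelford(Fusion* fusion) {
  FusionGuard fg(fusion);

  // 2D input; axis 0 is device-sharded (sequence parallel).
  TensorView* tv0 = makeContigTensor(2);
  fusion->addInput(tv0);

  DeviceMesh mesh({0, 1, 2, 3});
  tv0->setDeviceMesh(mesh);
  tv0->axis(0)->parallelize(ParallelType::DIDx);

  // var_mean lowers to a Welford op: a multi-output reduction that
  // goes through multiOutputRFactorHelper when it is rFactor-ed.
  WelfordResult w = Welford(tv0, {1});
  fusion->addOutput(w.avg);
  fusion->addOutput(w.var_sum);
}
```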
Greptile: 3 files reviewed, no comments
csrc/tensor_view.cpp (outdated)

```cpp
// Before:
std::unordered_map<IterDomain*, IterDomain*> id_map;
for (const auto i : arange(logical.size())) {
  id_map[this_logical[i]] = logical[i];

// After:
std::unordered_map<IterDomain*, IterDomain*> ref_to_target_map;
```
It sounds just like TransformReplay::selfReplay. Am I missing something?
🤦 Yes. It is.
```cpp
NVF_ERROR(
    !domain()->hasRoot(),
    "should not call clearReductionIterDomains on rfactor tv");
const std::vector<std::optional<bool>>& contiguity = getContiguity();
```
This sounds like another use case for selfReplay:

1. Create a TensorDomain whose logical domain is noReductions(getLogicalDomain()), with loop and allocation empty.
2. selfReplay this->domain() to that new TensorDomain.
3. setDomain(the new one).

Am I missing something? Do I need to merge #5316 for selfReplay to work here?
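A minimal sketch of that recipe, assuming the two-argument selfReplay(reference, target) form shown in the sequence diagram above; the exact signature and constructor details may differ:

```cpp
// Hypothetical sketch of the proposed recipe, not code from this PR.
// Assumes selfReplay(reference, target) as used elsewhere in the PR.
void stripReductionsViaSelfReplay(TensorView* tv) {
  // 1. Fresh TensorDomain whose logical domain drops reduction IDs;
  //    loop and allocation are left empty (defaulted from logical).
  TensorDomain* new_domain = IrBuilder::create<TensorDomain>(
      TensorDomain::noReductions(tv->getLogicalDomain()));
  // 2. Replay the transforms recorded on tv's current domain onto it.
  TransformReplay::selfReplay(tv->domain(), new_domain);
  // 3. Install the replayed domain.
  tv->setDomain(new_domain);
}
```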
Yes, selfReplay could be utilized here.
However, we are only stripping the reduction IDs, so generating the replay here is unnecessary.
!test --diff
Greptile: 3 files reviewed, no comments
This PR modifies the following TensorView methods to support multidevice fusions:

- clearReductionIterDomains: for multidevice fusions, the assumptions that logical == loop and that allocation is a permutation of logical no longer hold. The changes update these checks and explicitly rebuild the loop domain to avoid losing sharding (see the sketch after this list). This function is also called during segmentation when the allocation domain is sharded.
- multiOutputRFactorHelper: the previous code assumed a trivial allocation domain, whereas multidevice fusions have sharded allocation domains.
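For concreteness, a hedged sketch of the reduction stripping described above. The helper names follow nvFuser conventions (noReductions, getMaybeAllocationDomain), but the constructor arguments and contiguity handling are simplified assumptions, not the PR's exact code:

```cpp
// Illustrative only: filter reduction IterDomains out of logical, loop,
// and allocation so a DIDx-sharded loop domain survives the rebuild.
void clearReductionIterDomainsSketch(TensorView* tv) {
  std::vector<IterDomain*> new_logical =
      TensorDomain::noReductions(tv->getLogicalDomain());
  // Loop can no longer be assumed equal to logical once sharded,
  // so it is filtered explicitly instead of being dropped.
  std::vector<IterDomain*> new_loop =
      TensorDomain::noReductions(tv->getLoopDomain());
  std::vector<IterDomain*> new_alloc =
      TensorDomain::noReductions(tv->getMaybeAllocationDomain());
  tv->setDomain(IrBuilder::create<TensorDomain>(
      new_logical, new_alloc, new_loop));
}
```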