-
Notifications
You must be signed in to change notification settings - Fork 0
[MLIR][OpenMP] Updates to initial Taskloop Bounds Implementation #3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: kaviya_taskloop
Are you sure you want to change the base?
Conversation
- Force the first 3 entries to the StructArg to be the bounds info - Ensure it will work when executing the tasks in parallel
tblah
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great to me beyond some minor nits. I see O3 can optimize away the extra geps and stores from outlining so I think it is okay to leave them.
| PostOutlineCBTy PostOutlineCB; | ||
| BasicBlock *EntryBB, *ExitBB, *OuterAllocaBB; | ||
| SmallVector<Value *, 2> ExcludeArgsFromAggregate; | ||
| SetVector<Value *> Inputs, Outputs; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I know the other members don't have documentation but I think it would be good to add something because I don't think the use of these is immediately obvious.
- I don't think we can use Outputs. IIRC there's an assertion somewhere that there are no live out values. Better to remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Added a comment about the use of Inputs
- We still need some definition of Outputs somewhere as the CodeExtractor's
extractCodeRegionexpects there to be a SetVector for both Inputs and Outputs. The API gives 2 options, one where you just pass the CEAC value, and another that includes the inputs and outputs. I am happy to exclude the Outputs from the OutlineInfo struct, but there will need to be a SetVector made before extracting the code region from OpenMPIRBuilder::finalize.
| llvm::SmallVectorImpl<Instruction *> &ToBeDeleted, | ||
| OpenMPIRBuilder::InsertPointTy InnerAllocaIP, | ||
| const Twine &Name = "", bool AsPtr = true) { | ||
| const Twine &Name = "", bool AsPtr = true, bool Is64Bit = false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| const Twine &Name = "", bool AsPtr = true, bool Is64Bit = false) { | |
| const Twine &Name = "", bool AsPtr = true, IntegerType *IntTy) { | |
| Builder.restoreIP(OuterAllocaIP); | |
| IntTy = IntTy ? IntTy : Builder.getInt32Ty(); |
More flexible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| } else { | ||
| UseFakeVal = | ||
| cast<BinaryOperator>(Builder.CreateAdd(FakeVal, Builder.getInt32(10))); | ||
| cast<BinaryOperator>(Builder.CreateAdd(FakeVal, Is64Bit ? Builder.getInt64(10) : Builder.getInt32(10))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could do this from IntTy with llvm::ConstantInt::get or IRBuilderBase::getIntN
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done with ConstantInt::get
| OI.ExcludeArgsFromAggregate.push_back(createFakeIntVal( | ||
| Builder, AllocaIP, ToBeDeleted, TaskloopAllocaIP, "global.tid", false)); | ||
| Value *FakeLB = createFakeIntVal(Builder, AllocaIP, ToBeDeleted, TaskloopAllocaIP, | ||
| "lb", false, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| "lb", false, true); | |
| "lb", /*AsPtr=*/false, Builder.getInt64Ty()); |
It isn't obvious what the bool values mean without some extra help
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| "ub", false, true); | ||
| Value *FakeStep = createFakeIntVal(Builder, AllocaIP, ToBeDeleted, TaskloopAllocaIP, | ||
| "step", false, true); | ||
| /* For Taskloop, we want to force the bounds being the first 3 inputs in the aggregate struct*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: llvm style is to use C++ style comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| // HasShareds is true if any variables are captured in the outlined region, | ||
| // false otherwise. | ||
| bool HasShareds = StaleCI->arg_size() > 1; | ||
| /* Create the casting for the Bounds Values that can be used when outlining to replace the uses of the fakes with real values */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: comment style
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| /*sizeof_task=*/TaskSize, /*sizeof_shared=*/SharedsSize, | ||
| /*task_func=*/&OutlinedFn}); | ||
|
|
||
| Value *Shareds = StaleCI->getArgOperand(1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be moved above line 2035 to make that section clearer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've moved this above the declaration for ArgStructAlloca so this can use the Shareds variable rather than calling the getArgOperand function
| if (ConstantInt *CI = dyn_cast<ConstantInt>(Gep.getOperand(2))) { | ||
| switch (CI->getZExtValue()) { | ||
| case 0: | ||
| TaskLB = &I; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would also be good to check that the value being indexed is the right one, not just the numeric value of the index.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a check to make sure the GEP Instruction being checked is using the Shared's as its first operand.
| Value *Flags = Builder.getInt32(Tied); | ||
|
|
||
| Value *TaskSize = Builder.getInt64( | ||
| divideCeil(M.getDataLayout().getTypeSizeInBits(Taskloop), 8)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| divideCeil(M.getDataLayout().getTypeSizeInBits(Taskloop), 8)); | |
| divideCeil(M.getDataLayout().getTypeSizeInBits(Task), 8)); |
kaviya2510
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the work @Stylie777.
There are some changes that need to be updated for passing the arguments to runtime call __kmpc_taskloop(..). Kindly address those changes.
Rest of the work looks good to me.
| Value *Flags = Builder.getInt32(Tied); | ||
|
|
||
| Value *TaskSize = Builder.getInt64( | ||
| divideCeil(M.getDataLayout().getTypeSizeInBits(Taskloop), 8)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| divideCeil(M.getDataLayout().getTypeSizeInBits(Taskloop), 8)); | |
| divideCeil(M.getDataLayout().getTypeSizeInBits(Task), 8)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we are utilizing structArg to store the the loopbound values, the struct __OMP_STRUCT_TYPE(Taskloop, kmp_task_info, false, VoidPtr, VoidPtr, Int32, VoidPtr, VoidPtr, Int64, Int64, Int64) is no longer needed.
The required size for storing loop bounds can be reserved in kmp_task_t by strutArg itself.
| Builder.CreateStore(Step_ext, step); | ||
| llvm::Value *loadstep = Builder.CreateLoad(Builder.getInt64Ty(), step); | ||
| llvm::Value *Lb = Builder.CreateStructGEP(ArgStructType, TaskShareds, 0); | ||
| Builder.CreateStore(CastedLBVal, Lb); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Builder.CreateStore(CastedLBVal, Lb); | |
| auto *Idx0 = Builder.getInt32(0); | |
| llvm::Value *Lb = Builder.CreateGEP(ArgStructType, TaskShareds, {Idx0, Builder.getInt32(0)}); |
The values of lb,ub and step are already populated in StructArg. You can directly access it and pass the pointer to the runtime call __kmpc_taskloop(...)
| SharedsSize); | ||
| } | ||
| llvm::Value *Ub = Builder.CreateStructGEP(ArgStructType, TaskShareds, 1); | ||
| Builder.CreateStore(CastedUBVal, Ub); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here. Remove the store instruction.
| Builder.CreateMemCpy(TaskShareds, Alignment, Shareds, Alignment, | ||
| SharedsSize); | ||
| } | ||
| llvm::Value *Ub = Builder.CreateStructGEP(ArgStructType, TaskShareds, 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| llvm::Value *Ub = Builder.CreateStructGEP(ArgStructType, TaskShareds, 1); | |
| llvm::Value *Ub =Builder.CreateGEP(ArgStructType, TaskShareds, {Idx0, Builder.getInt32(1)}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GEP to StructArg and get the upper bound value.
| llvm::Value *Ub = Builder.CreateStructGEP(ArgStructType, TaskShareds, 1); | ||
| Builder.CreateStore(CastedUBVal, Ub); | ||
|
|
||
| llvm::Value *Step = Builder.CreateStructGEP(ArgStructType, TaskShareds, 2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| llvm::Value *Step = Builder.CreateStructGEP(ArgStructType, TaskShareds, 2); | |
| llvm::Value *Step =Builder.CreateGEP(ArgStructType, TaskShareds, {Idx0, Builder.getInt32(2)}); |
| Builder.CreateStore(CastedUBVal, Ub); | ||
|
|
||
| llvm::Value *Step = Builder.CreateStructGEP(ArgStructType, TaskShareds, 2); | ||
| Builder.CreateStore(CastedStepVal, Step); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove the store.
| llvm::BasicBlock *Body = CLI->getBody(); | ||
| for (llvm::Instruction &I : *Body) { | ||
| if (auto *Add = llvm::dyn_cast<llvm::BinaryOperator>(&I)) { | ||
| if (Add->getOpcode() == llvm::Instruction::Add) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tom raised a concern that this add instruction pattern might also match other unrelated add instructions, and we discussed this in my PR: llvm#166903 (comment)
He suggested looking at the wsloop and distribute implementations for guidance on how this is handled there. I have not had a chance to dig into that yet. Could you please check this once?
This approach was agreed in the original Taskloop PR here: llvm#166903 (comment). This uses the
structArgto transfer the data between outlined functions to better support the bounds in Taskloop.