@Stylie777
Owner

This approach was agreed on in the original Taskloop PR: llvm#166903 (comment). It uses the structArg to pass data between the outlined functions, which better supports the loop bounds in Taskloop.

- Force the first three entries of the StructArg to be the bounds information
- Ensure this works when the tasks are executed in parallel
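
The layout described above can be modeled outside LLVM. What follows is a minimal sketch in plain C++, not the PR's code: TaskloopArgs, outlinedBody, and runChunk are hypothetical names, the bounds are assumed to be 64-bit and inclusive, and the single captured pointer stands in for whatever the region really captures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical model of the aggregate ("structArg") handed to the outlined
// taskloop function. The bounds occupy the first three slots.
struct TaskloopArgs {
  std::uint64_t lb;   // index 0: lower bound
  std::uint64_t ub;   // index 1: upper bound (inclusive in this sketch)
  std::uint64_t step; // index 2: step
  int *captured;      // remaining entries: captured variables
};

// Stand-in for the outlined task body: it reads its bounds from the
// aggregate and accumulates the loop indices into the captured variable.
inline void outlinedBody(const TaskloopArgs *args) {
  assert(args->step != 0 && "step must be non-zero");
  for (std::uint64_t i = args->lb; i <= args->ub; i += args->step)
    *args->captured += static_cast<int>(i);
}

// Stand-in for the runtime giving one task a private copy of the aggregate
// with that task's own bounds written into the first three slots.
inline int runChunk(const TaskloopArgs &shared, std::uint64_t lb,
                    std::uint64_t ub) {
  TaskloopArgs local;
  std::memcpy(&local, &shared, sizeof(TaskloopArgs));
  local.lb = lb;
  local.ub = ub;
  int sum = 0;
  local.captured = &sum;
  outlinedBody(&local);
  return sum;
}
```

Because each chunk works on its own copy of the first three slots, chunks such as [0, 4] and [5, 9] do not share bound storage in this model, which is the property the second bullet asks for.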

@tblah left a comment

Looks great to me beyond some minor nits. I see that O3 can optimize away the extra GEPs and stores introduced by outlining, so I think it is okay to leave them.

PostOutlineCBTy PostOutlineCB;
BasicBlock *EntryBB, *ExitBB, *OuterAllocaBB;
SmallVector<Value *, 2> ExcludeArgsFromAggregate;
SetVector<Value *> Inputs, Outputs;

  1. I know the other members don't have documentation but I think it would be good to add something because I don't think the use of these is immediately obvious.
  2. I don't think we can use Outputs. IIRC there's an assertion somewhere that there are no live out values. Better to remove it.

Owner Author

  1. Added a comment about the use of Inputs
  2. We still need a definition of Outputs somewhere, because CodeExtractor's extractCodeRegion expects a SetVector for both Inputs and Outputs. The API offers two overloads: one that takes only the CodeExtractorAnalysisCache (CEAC), and another that also takes the inputs and outputs. I am happy to drop Outputs from the OutlineInfo struct, but a SetVector will then need to be created in OpenMPIRBuilder::finalize before the code region is extracted.

llvm::SmallVectorImpl<Instruction *> &ToBeDeleted,
OpenMPIRBuilder::InsertPointTy InnerAllocaIP,
const Twine &Name = "", bool AsPtr = true) {
const Twine &Name = "", bool AsPtr = true, bool Is64Bit = false) {

Suggested change
const Twine &Name = "", bool AsPtr = true, bool Is64Bit = false) {
const Twine &Name = "", bool AsPtr = true, IntegerType *IntTy = nullptr) {
Builder.restoreIP(OuterAllocaIP);
IntTy = IntTy ? IntTy : Builder.getInt32Ty();

More flexible.

Owner Author

Done

} else {
UseFakeVal =
cast<BinaryOperator>(Builder.CreateAdd(FakeVal, Builder.getInt32(10)));
cast<BinaryOperator>(Builder.CreateAdd(FakeVal, Is64Bit ? Builder.getInt64(10) : Builder.getInt32(10)));

You could build this constant from IntTy with llvm::ConstantInt::get or IRBuilderBase::getIntN.

Owner Author

Done with ConstantInt::get
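
The move from a bool flag to a type parameter can be illustrated without LLVM. This is only a sketch under stated assumptions: IntWidth, TypedConst, makeConst, and fakeIncrement are hypothetical stand-ins (for IntegerType, a typed constant, ConstantInt::get, and the fake-value helper respectively), and masking the value to the requested width is a choice made for this model, not a claim about LLVM's behavior.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for llvm::IntegerType: only the bit width matters.
struct IntWidth {
  unsigned bits;
};

// Hypothetical stand-in for a constant that remembers its own width.
struct TypedConst {
  unsigned bits;
  std::uint64_t value;
};

// Build a constant of the requested width. In this model the value is
// masked to that width.
inline TypedConst makeConst(IntWidth ty, std::uint64_t value) {
  assert(ty.bits >= 1 && ty.bits <= 64 && "model supports 1..64 bits");
  std::uint64_t mask = ty.bits == 64 ? ~0ULL : ((1ULL << ty.bits) - 1);
  return TypedConst{ty.bits, value & mask};
}

// A null type means "use the 32-bit default", mirroring the
// IntTy ? IntTy : getInt32Ty() fallback in the suggestion.
inline TypedConst fakeIncrement(const IntWidth *ty = nullptr) {
  IntWidth chosen = ty ? *ty : IntWidth{32};
  return makeConst(chosen, 10);
}
```

Deriving the constant from the type removes the two-way branch on a bool and lets a caller request any width, not just 32 or 64 bits.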

OI.ExcludeArgsFromAggregate.push_back(createFakeIntVal(
Builder, AllocaIP, ToBeDeleted, TaskloopAllocaIP, "global.tid", false));
Value *FakeLB = createFakeIntVal(Builder, AllocaIP, ToBeDeleted, TaskloopAllocaIP,
"lb", false, true);

Suggested change
"lb", false, true);
"lb", /*AsPtr=*/false, Builder.getInt64Ty());

Without the argument-name comments, it isn't obvious what the bool values mean.

Owner Author

Done

"ub", false, true);
Value *FakeStep = createFakeIntVal(Builder, AllocaIP, ToBeDeleted, TaskloopAllocaIP,
"step", false, true);
/* For Taskloop, we want to force the bounds being the first 3 inputs in the aggregate struct*/

nit: LLVM style is to use C++-style (//) comments.

Owner Author

Done

// HasShareds is true if any variables are captured in the outlined region,
// false otherwise.
bool HasShareds = StaleCI->arg_size() > 1;
/* Create the casting for the Bounds Values that can be used when outlining to replace the uses of the fakes with real values */

nit: comment style

Owner Author

Done

/*sizeof_task=*/TaskSize, /*sizeof_shared=*/SharedsSize,
/*task_func=*/&OutlinedFn});

Value *Shareds = StaleCI->getArgOperand(1);

This could be moved above line 2035 to make that section clearer

Owner Author

I've moved this above the declaration of ArgStructAlloca, so that declaration can use the Shareds variable instead of calling getArgOperand.

if (ConstantInt *CI = dyn_cast<ConstantInt>(Gep.getOperand(2))) {
switch (CI->getZExtValue()) {
case 0:
TaskLB = &I;

It would also be good to check that the value being indexed is the right one, not just the numeric value of the index.

Owner Author

Added a check to make sure the GEP instruction being inspected uses Shareds as its first operand.
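
The idea behind this check can be sketched in plain C++. This is a simplified model, not the PR's code: FieldAccess is a hypothetical stand-in for a GEP with a constant field index, and classifyBound is a hypothetical helper name.

```cpp
#include <cassert>

enum class Bound { None, Lower, Upper, Step };

// Hypothetical, simplified view of a GEP: which aggregate it indexes and
// which constant field index it selects.
struct FieldAccess {
  const void *base;
  unsigned fieldIndex;
};

// Classify an access as one of the three bounds only when it indexes the
// expected aggregate. Matching on the index alone would also accept
// accesses into unrelated structs.
inline Bound classifyBound(const FieldAccess &access, const void *shareds) {
  if (access.base != shareds)
    return Bound::None;
  switch (access.fieldIndex) {
  case 0:
    return Bound::Lower;
  case 1:
    return Bound::Upper;
  case 2:
    return Bound::Step;
  default:
    return Bound::None;
  }
}
```

An access with index 0 into some other aggregate is rejected here, whereas an index-only match would have treated it as the lower bound.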

@kaviya2510 left a comment

Thank you for the work @Stylie777.

Some changes are needed in how the arguments are passed to the runtime call __kmpc_taskloop(...). Kindly address those.

The rest of the work looks good to me.

Value *Flags = Builder.getInt32(Tied);

Value *TaskSize = Builder.getInt64(
divideCeil(M.getDataLayout().getTypeSizeInBits(Taskloop), 8));

Suggested change
divideCeil(M.getDataLayout().getTypeSizeInBits(Taskloop), 8));
divideCeil(M.getDataLayout().getTypeSizeInBits(Task), 8));

As we are using structArg to store the loop bound values, the struct __OMP_STRUCT_TYPE(Taskloop, kmp_task_info, false, VoidPtr, VoidPtr, Int32, VoidPtr, VoidPtr, Int64, Int64, Int64) is no longer needed.

The space required for the loop bounds can be reserved in kmp_task_t by structArg itself.
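
The TaskSize expression in the snippet above rounds a size in bits up to whole bytes. A stand-alone sketch of that arithmetic follows, with ceilDiv and sizeInBytes as hypothetical stand-ins (ceilDiv plays the role of LLVM's divideCeil); the bit counts in the usage note are arbitrary examples, not the size of any particular struct.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a rounding-up integer division.
inline std::uint64_t ceilDiv(std::uint64_t numerator,
                             std::uint64_t denominator) {
  assert(denominator != 0 && "division by zero");
  return (numerator + denominator - 1) / denominator;
}

// Convert a type size in bits to the number of bytes needed to hold it.
inline std::uint64_t sizeInBytes(std::uint64_t sizeInBits) {
  return ceilDiv(sizeInBits, 8);
}
```

For example, 320 bits need 40 bytes, while 65 bits need 9 bytes because the partial byte is rounded up.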

Builder.CreateStore(Step_ext, step);
llvm::Value *loadstep = Builder.CreateLoad(Builder.getInt64Ty(), step);
llvm::Value *Lb = Builder.CreateStructGEP(ArgStructType, TaskShareds, 0);
Builder.CreateStore(CastedLBVal, Lb);

@kaviya2510 Dec 22, 2025

Suggested change
Builder.CreateStore(CastedLBVal, Lb);
auto *Idx0 = Builder.getInt32(0);
llvm::Value *Lb = Builder.CreateGEP(ArgStructType, TaskShareds, {Idx0, Builder.getInt32(0)});

The values of lb, ub and step are already populated in StructArg. You can access them directly and pass the pointers to the runtime call __kmpc_taskloop(...).

SharedsSize);
}
llvm::Value *Ub = Builder.CreateStructGEP(ArgStructType, TaskShareds, 1);
Builder.CreateStore(CastedUBVal, Ub);

Same here. Remove the store instruction.

Builder.CreateMemCpy(TaskShareds, Alignment, Shareds, Alignment,
SharedsSize);
}
llvm::Value *Ub = Builder.CreateStructGEP(ArgStructType, TaskShareds, 1);

Suggested change
llvm::Value *Ub = Builder.CreateStructGEP(ArgStructType, TaskShareds, 1);
llvm::Value *Ub = Builder.CreateGEP(ArgStructType, TaskShareds, {Idx0, Builder.getInt32(1)});

GEP to StructArg and get the upper bound value.

llvm::Value *Ub = Builder.CreateStructGEP(ArgStructType, TaskShareds, 1);
Builder.CreateStore(CastedUBVal, Ub);

llvm::Value *Step = Builder.CreateStructGEP(ArgStructType, TaskShareds, 2);

Suggested change
llvm::Value *Step = Builder.CreateStructGEP(ArgStructType, TaskShareds, 2);
llvm::Value *Step = Builder.CreateGEP(ArgStructType, TaskShareds, {Idx0, Builder.getInt32(2)});

Builder.CreateStore(CastedUBVal, Ub);

llvm::Value *Step = Builder.CreateStructGEP(ArgStructType, TaskShareds, 2);
Builder.CreateStore(CastedStepVal, Step);

Remove the store.

llvm::BasicBlock *Body = CLI->getBody();
for (llvm::Instruction &I : *Body) {
if (auto *Add = llvm::dyn_cast<llvm::BinaryOperator>(&I)) {
if (Add->getOpcode() == llvm::Instruction::Add) {

Tom raised a concern that this add-instruction pattern might also match unrelated add instructions; we discussed this in my PR: llvm#166903 (comment).

He suggested looking at the wsloop and distribute implementations for guidance on how this is handled there. I have not had a chance to dig into that yet. Could you please take a look?
