Skip to content

Conversation

@RiverDave
Copy link
Owner

@RiverDave RiverDave commented Jul 20, 2025

This is by far the most challenging intrinsic I've tackled so far.I can truly say that the learning process was amazing.

Couple of things

JamesL425 and others added 30 commits April 9, 2025 14:58
This implements the missing feature `cir::setTargetAttributes`.

Although other targets might also need attributes, this PR focuses on
the CUDA-specific ones. For CUDA kernels (on device side, not stubs),
they must have a calling convention of `ptx_kernel`. It is added here.

CUDA kernels, as well as global variables, also involves lots of NVVM
metadata, which is intended to be dealt with at the same place. It's
marked with a new missing feature here.
This PR implements \_\_constant\_\_ variables.

llvm#1438 only implements \_\_device\_\_ and \_\_shared\_\_ variables, 

~~This PR depends on llvm#1445~~
This is part 2 of CUDA lowering. Still more to come!

This PR generates `__cuda_register_globals` for functions only, without
touching variables.

It also fixes two discrepancies mentioned in Part 1, namely:
- Now CIR will not generate registration code if there's nothing to
register;
- `__cuda_fatbin_wrapper` now becomes a constant.
This PR deals with several issues currently present in CUDA CodeGen.
Each of them requires only a few lines to fix, so they're combined in a
single PR.

**Bug 1.**

Suppose we write
```cpp
__global__ void kernel(int a, int b);
```

Then when we call this kernel with `cudaLaunchKernel`, the 4th argument
to that function is something of the form `void *kernel_args[2] = {&a,
&b}`. OG allocates the space of it with `alloca ptr, i32 2`, but that
doesn't seem to be feasible in CIR, so we allocated `alloca [2 x ptr],
i32 1`. This means there must be an extra GEP as compared to OG.

In CIR, it means we must add an `array_to_ptrdecay` cast before trying
to accessing the array elements. I missed that out in llvm#1332 .

**Bug 2.**

We missed a load instruction for 6th argument to `cudaLaunchKernel`.
It's added back in this PR.

**Bug 3.** 

When we launch a kernel, we first retrieve the return value of
`__cudaPopCallConfiguration`. If it's zero, then the call succeeds and
we should proceed to call the device stub. In llvm#1348 we did exactly the
opposite, calling the device stub only if it's not zero. It's fixed
here.

**Issue 4.**

CallConvLowering is required to make `cudaLaunchKernel` correct. The
codepath is unblocked by adding a `getIndirectResult` at the same place
as OG does -- the function is already implemented so we can just call
it.


After this (and other pending PRs), CIR is now able to compile real CUDA
programs. There are still missing features, which will be followed up
later.
This is Part 3 of registration function generation.

This generates `__cuda_module_dtor`. It cannot be placed in global dtors
list, as treating it as a normal destructor will result in double-free
in recent CUDA versions (see comments in OG). Rather, the function is
passed as callback of `atexit`, which is called at the end of
`__cuda_module_ctor`.
Traditional clang implementation:
https://github.com/llvm/clangir/blob/a1ab6bf6cd3b83d0982c16f29e8c98958f69c024/clang/lib/CodeGen/CGBuiltin.cpp#L3618-L3632

The problem here is that `__builtin_clz` allows undefined result, while
`__lzcnt` doesn't. As a result, I have to create a new CIR for
`__lzcnt`. Since the return type of those two builtin differs, I decided
to change return type of current `CIR_BitOp` to allow new `CIR_LzcntOp`
to inherit from it.

I would like to hear your suggestions. C.c. @Lancern
This PR adds support for compiling builtin variables like `threadIdx`
down to the appropriate intrinsic.

---------

Co-authored-by: Aidan Wong <[email protected]>
Co-authored-by: anominos <[email protected]>
I have now fixed the test. Earlier I made some commits with other
changes because we were testing something on my fork. This should be
resolved now
CIR is currently ignoring the `signext` and `zeroext` for function
arguments and return types produced by CallConvLowering.

This PR lowers them to LLVM IR.
I realized I committed a new file with CRLF before. Really sorry about
that >_<

Related: llvm#1404
The choice of adding a separate file imitates that of OG.
This PR removes a useless argument `convertToInt` and removes hardcoded
`Sint32Type`.

I realized I committed a new file with CRLF before. Really sorry about
that >_<
There are some subtleties here.

This is the code in OG:
```cpp
// note: this is different from default ABI
if (!RetTy->isScalarType())
  return ABIArgInfo::getDirect();
```
which says we should return structs directly. It's correct, has have the
same behaviour as `nvcc`, and it obeys the PTX ABI as well.
The comment dates back to 2013 (see [this
commit](llvm/llvm-project@f9329ff)
-- it didn't provide any explanation either), so I believe it's
outdated. I didn't include this comment in the PR.
…lvm#1486)

The pattern `call {{.*}} i32` mismatches `call i32` due to double spaces
surrounding `{{.*}}`. This patch removes the first space to fix the
failure.
…1487)

This PR resolves an assertion failure in
`CIRGenTypes::isFuncParamTypeConvertible`, which is involved when trying
to emit a vtable entry to a virtual function whose type includes a
pointer-to-member-function.
…lvm#1431)

Implements `::verify` for operations cir.atomic.xchg and
cir.atomic.cmp_xchg

I believe the existing regression tests don't get to the CIR level type
check failure and I was not able to implement a case that does.

Most attempts of reproducing cir.atomic.xchg type check failure were
along the lines of:
```
int a;
long long b,c;
__atomic_exchange(&a, &b, &c, memory_order_seq_cst);
```

And they seem to never trigger the failure on `::verify` because they
fail earlier in function parameter checking:
```
exmp.cpp:7:27: error: cannot initialize a parameter of type 'int *' with an rvalue of type 'long long *'
    7 |     __atomic_exchange(&a, &b, &c, memory_order_seq_cst);
      |                           ^~
```

Closes llvm#1378 .
This PR adds a new boolean flag to the `cir.load` and the `cir.store`
operation that distinguishes nontemporal loads and stores. Besides, this
PR also adds support for the `__builtin_nontemporal_load` and the
`__builtin_nontemporal_store` intrinsic function.
This PR adds a new boolean flag to the `cir.load` and the `cir.store`
operation that distinguishes nontemporal loads and stores. Besides, this
PR also adds support for the `__builtin_nontemporal_load` and the
`__builtin_nontemporal_store` intrinsic function.
This PR adds an insertion guard for the try body scope for try-catch.
Currently, the following code snippet fails during CodeGen:

```
void foo() {
  int r = 1;
  try {
    ++r;
    return;
  } catch (...) {
  }
}
```

The insertion point doesn't get reset properly and the cleanup is being
ran for a wrong/deleted block causing a segmentation fault. I also added
a test.
bcardosolopes and others added 22 commits July 2, 2025 16:14
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore <dependency name> major version` will close this
group update PR and stop Dependabot creating any more for the specific
dependency's major version (unless you unignore this specific
dependency's major version or upgrade to it yourself)
- `@dependabot ignore <dependency name> minor version` will close this
group update PR and stop Dependabot creating any more for the specific
dependency's minor version (unless you unignore this specific
dependency's minor version or upgrade to it yourself)
- `@dependabot ignore <dependency name>` will close this group update PR
and stop Dependabot creating any more for the specific dependency
(unless you unignore this specific dependency or upgrade to it yourself)
- `@dependabot unignore <dependency name>` will remove all of the ignore
conditions of the specified dependency
- `@dependabot unignore <dependency name> <ignore condition>` will
remove the ignore condition of the specified dependency and ignore
conditions


</details>

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…lvm#1722)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore <dependency name> major version` will close this
group update PR and stop Dependabot creating any more for the specific
dependency's major version (unless you unignore this specific
dependency's major version or upgrade to it yourself)
- `@dependabot ignore <dependency name> minor version` will close this
group update PR and stop Dependabot creating any more for the specific
dependency's minor version (unless you unignore this specific
dependency's minor version or upgrade to it yourself)
- `@dependabot ignore <dependency name>` will close this group update PR
and stop Dependabot creating any more for the specific dependency
(unless you unignore this specific dependency or upgrade to it yourself)
- `@dependabot unignore <dependency name>` will remove all of the ignore
conditions of the specified dependency
- `@dependabot unignore <dependency name> <ignore condition>` will
remove the ignore condition of the specified dependency and ignore
conditions


</details>

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Couple of things I have questions about:

1. I duplicated function `getIntValueFromConstOp` from
`CIRGenBuiltinAArch64.cpp`. I was wondering if that's correct or if
there's a place where we can avoid that duplication.

2. For the tests related to `mm_prefetch` im not sure if it'd be correct
to define them in a file eg: `sse-builtins.c` like it's currently done
in the codegen lib.

3. I'm also aware we can emit a call for a `PreFetchOp` would that be
required in this case?

related: llvm#1414,
llvm#1404 (A PR was previously opened
but It was not resolved)
…re (llvm#1732)

just a few improvements to mirror og test cases in x86 for better
reference.
…tribute (llvm#1733)

- Remove redundant custom printer and parser for AddressSpace, relying instead on MLIR's default EnumAttr handling.
- Leverage AddressSpace::Default to omit the attribute from the assembly form when not needed. Therefore, an empty attribute is no longer needed to represent the default address space.
- Update PointerType to use the AddressSpace enum directly, instead of a boxed attribute.
Not too much to add. Added a method to call a masked store intrinsic
from the builder. Haven't touched that class to much, so let me know if
that's the right call. Also: Unfortunately there were a lot of test
cases for the intrinsics in this PR, hope that's not a big hustle for
review 😊
Backport ArraySubscript for ComplexType
Backport functional cast to ComplexType
This patch bumps the windows CI container to windows server 2022 from
windows server 2019. This is necessary as Github has sunsetted support
for sever 2019, so we cannot build the container through GHA without
updating. Using more recent versions is just good practice anyways.

This will not roll out immediately and we'll have to make some TF
changes to get deployed, but some additional validation first will be
good anyways.

Reviewers: lnihlen, tstellar, cmtice

Reviewed By: cmtice

Pull Request: llvm/llvm-project#148318

(cherry picked from commit 3e43915)
…style (llvm#1741)

- This adds common `CIR_` prefix to all operation disambiguating them when used with other dialects.

- Unifies traits style in operation definitions
- This fixes default value to be expected 65535
- Introduces DefaultGlobalCtorDtorPriority constant
- Makes function to use I32Attr for priority instead of unnecessary attribute with reference to function
Seems like this is the wrong approach.
This reverts commit bc91ef4.
This updates the lowering of CIR function aliases in such a way that
they now actually become aliases in the final LLVM IR.
…lvm#1740)

This PR has two parts:
1. Mimicking the OG [special
case](https://github.com/llvm/clangir/blob/d030c9bff74f4f9504a61abe9b2c04a8777028a5/clang/lib/CodeGen/CGException.cpp#L690)
for a single catch-all when getting dispatch blocks. The huge testcase I
added, gotten by using
[creduce](https://github.com/csmith-project/creduce) on a c++ file,
crashed at this point [in our
version](https://github.com/llvm/clangir/blob/d030c9bff74f4f9504a61abe9b2c04a8777028a5/clang/lib/CIR/CodeGen/CIRGenException.cpp#L789).
2. Fixing multiple destructor calls for the same object. For example,
there were tests like
[#1](https://github.com/llvm/clangir/blob/d030c9bff74f4f9504a61abe9b2c04a8777028a5/clang/test/CIR/CodeGen/try-catch-dtors.cpp#L370C1-L372C80)
and
[#2](https://github.com/llvm/clangir/blob/d030c9bff74f4f9504a61abe9b2c04a8777028a5/clang/test/CIR/CodeGen/conditional-cleanup.cpp#L217C1-L224C25),
having a second destructor call to an already destroyed object. This PR
fixes these and I have updated the tests. Also, I added `"CIR-NEXT"` at
some points, to confirm the destructors are indeed called once.

As usual, please let me know if you have any concerns.
This patch backports changes made to the bit operations in the upstream
PR llvm/llvm-project#148378. Namely, this patch includes the following
changes:

- This patch removes the `bit.` prefix in the op mnemonic. The operation
names now directly correspond to the builtin function names except for
bswap which is represented by `cir.byte_swap` for more clarity.
- Since all bit operations are `SameOperandsAndResultType`, this patch
updates their assembly format and avoids spelling out the operand type
twice.
Copy link
Owner Author

RiverDave commented Jul 20, 2025

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more

This stack of pull requests is managed by Graphite. Learn more about stacking.

@RiverDave RiverDave marked this pull request as ready for review July 20, 2025 17:36
@RiverDave RiverDave changed the title [CIR][CirGen][Builtin][X86] Lower cvt*2mask intrinsics [CIR][CIRGen][Builtin][X86] Lower cvt*2mask intrinsics Jul 22, 2025
@RiverDave RiverDave force-pushed the fix-vec-cmp-zero branch 2 times, most recently from e040553 to 06b0af0 Compare August 14, 2025 02:10
@RiverDave RiverDave closed this Sep 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.