
Conversation

@jake-arkinstall
Contributor

@jake-arkinstall jake-arkinstall commented Oct 10, 2025

This PR provides lowering for qsystem's GPU extension for selene-specific workflows.

This is my first compilation PR, so if it's janky I apologise in advance. There were some battles with getting to grips with inkwell, but I'm very happy with the result.

Background

The qsystem GPU extension provides a general mechanism for declaring and using the API of a gpu library.

For generality, this mechanism involves:

  • the user defining functions with:
    • any (valid) function name,
    • any arrangement of integer, floating point, or boolean arguments,
    • a return type of void, float, or integer.
  • function names having some mapping to an integer tag
  • the ability to acquire and discard a GPU context
  • the ability to invoke the aforementioned functions given a gpu context, an integer tag, and the appropriate parameters
  • the ability to retrieve results after invoking said functions

The intent is that a library fulfilling the user's specification is linked with the user program to provide the required functionality. In Selene, this would look like adding the library as a utility at the build stage.

As this is the first such utility representing a generic use case, breakages could get ugly. As such, I took some care to accommodate future breaking changes.

API design

This lowering assumes the following function signatures in the library to be linked with the user program:

char const* gpu_get_error();

This provides a string representing an error, to be made available when any of the following functions fails. It should return nullptr if there is no error to fetch.

Each of the following functions returns a bool representing success (true) or failure (false). Each result is validated and, upon false, an extended panic message is presented to the user containing the result of gpu_get_error() (or "No error message available" if nullptr is returned).

I utilise Selene's panic_str QIS extension to emit panics that are not constrained to 255 bytes, as a custom library may wish to provide far more detail than would fit in 255 bytes. Given this lowering is specific to Selene, I don't believe that deviating from standard QIS will be a controversial choice.
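The failure path described above can be sketched as follows. This is a minimal illustration, not the emitted IR: status_message is a hypothetical helper name, and panic_str / gpu_get_error appear only in comments, standing in for the Selene QIS extension and the linked library respectively.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch of selecting the panic message after a GPU call.
   On success nothing happens; on failure the string from gpu_get_error()
   is used, falling back to a fixed message when it returned NULL. The
   real lowering would then pass this message to panic_str. */
static const char* status_message(int ok, const char* gpu_error) {
    if (ok) {
        return NULL; /* success: no panic is emitted */
    }
    return gpu_error ? gpu_error : "No error message available";
}
```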

bool gpu_validate_api(uint64_t major, uint64_t minor, uint64_t patch);

We do not currently know how the remainder of the API will change over time:

  • It's feasible that array arguments, strings, etc. could be added at a later time. This will particularly impact the signature API (see the gpu_call description).
  • Returning different kinds of data may be useful in future (and so fetching results may require additional approaches)

As such, this should be the first call to the GPU library, passing an API version (currently 0.1.0, the version I'm assigning to the API described here). It is invoked before any function that depends on breakable aspects of the API, with caching to avoid multiple calls. It should return true on success, false otherwise. Upon failure, gpu_get_error() is called, and therefore these two functions must be kept compatible in future editions if we opt to make breaking changes.
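A library-side implementation of this check might look like the following sketch. The exact-match policy (major and minor must match, patch differences tolerated) is an illustrative assumption on my part, not something the lowering mandates:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical gpu_validate_api for a library written against API 0.1.0.
   Major and minor must match exactly; patch bumps are assumed to be
   non-breaking for the API surface. */
static bool gpu_validate_api(uint64_t major, uint64_t minor, uint64_t patch) {
    (void)patch; /* patch changes do not alter the API surface */
    return major == 0 && minor == 1;
}
```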

The remainder of functions may be broken in future editions, and as long as gpu_validate_api is managed appropriately on the library side, we should at least be able to fail early (at linking or early at runtime) to prevent undefined behaviour (e.g. by invoking a function with a modified signature).

bool gpu_init(uint64_t _reserved, uint64_t* gpu_ref_out);

The reserved identifier may be used in future if we wish to use multiple instances of gpu libraries, e.g. for maintaining distinct state between them.

bool gpu_discard(uint64_t gpu_ref);

After which, the library should free resources associated with the gpu_ref handle, and further invocations using this handle should error.

bool gpu_get_function_id(char const* name, uint64_t* id_out);

Some implementations may wish to map functions to indices in an array, or use a hash on the incoming names, or something else.

This is an opportune moment to return false and thus fail early if the requested function name is not supported by the linked library.

bool gpu_call(
    uint64_t handle,
    uint64_t function_id,
    uint64_t blob_size,
    char const* blob,
    char const* signature
);

The handle and function_id parameters should be apparent at this point.

When a function is invoked, it requires parameters. This is where the blob and signature come in:

  • parameters are packed (in bytes) into an array, which is sent to the gpu_call.
    • I chose this over varargs because varargs aren't particularly friendly to work with in a diagnostic setting. They can't be passed to other functions, for example.
  • the size is also passed to the library, which allows for quick validation and safer parsing
  • for completeness, a signature string (generated at compile-time) is also passed as an argument.
    • Types are encoded as i64 => i, u64 => i, f64 => f, bool => b, and the inputs and return type are separated by a colon.
    • The resulting signature string is in the form e.g.
      • iifb:v for (int64, uint64, float, bool) -> void
      • b:f for (bool)->float
    • This is also an opportunity for validation. Perhaps it should be moved to gpu_get_function_id so that validation can take place earlier.
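The signature encoding above can be reconstructed as a small helper. This is a sketch of the encoding rules as described (the lowering generates the string at compile time; the names here are mine):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reconstruction of the signature encoding: 'i' for 64-bit
   integers (signed or unsigned), 'f' for f64, 'b' for bool, 'v' for void,
   laid out as "<inputs>:<return>". */
typedef enum { TY_I, TY_F, TY_B, TY_V } Ty;

static char ty_code(Ty t) {
    switch (t) {
        case TY_I: return 'i';
        case TY_F: return 'f';
        case TY_B: return 'b';
        default:   return 'v';
    }
}

/* out must have room for n_args + 3 bytes (inputs, ':', return, NUL). */
static void encode_signature(const Ty* args, size_t n_args, Ty ret, char* out) {
    size_t k = 0;
    for (size_t j = 0; j < n_args; j++) out[k++] = ty_code(args[j]);
    out[k++] = ':';
    out[k++] = ty_code(ret);
    out[k] = '\0';
}
```

For example, (int64, uint64, float, bool) -> void encodes as "iifb:v", matching the examples above.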

bool gpu_get_result(uint64_t gpu_ref, uint64_t out_len, char* out_result);

This extracts a result from a previous call, in a FIFO manner. Currently the only out_len supported by the underlying hugr is 64 bits, as functions are assumed to return double or uint64, but this can be changed at a later date without breaking the API. The lowering provided in this PR handles the casting and alignment.
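On the caller's side, reinterpreting the returned 8-byte buffer as a double or uint64 can be sketched like this; memcpy sidesteps the alignment and strict-aliasing concerns that the lowering also has to handle (helper names are mine, for illustration):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical decoding of the 64-bit payload written by gpu_get_result.
   memcpy through a local avoids unaligned loads and aliasing violations
   when reinterpreting raw bytes. */
static double result_as_f64(const char buf[8]) {
    double d;
    memcpy(&d, buf, sizeof d);
    return d;
}

static uint64_t result_as_u64(const char buf[8]) {
    uint64_t u;
    memcpy(&u, buf, sizeof u);
    return u;
}
```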

Controversial choices

  • It might be more appropriate to return an integer as an error code, rather than a boolean flag representing success or failure.
    However, error codes are primarily useful in areas where we have a defined system for handling different forms of error. We don't really have a way of catching those errors in the user code, so we always end up either continuing or terminating. Bool felt satisfactory here.
    In future this could be broken - all we need to do is keep the bool return for gpu_validate_api, which is a clear success or fail case anyway.
  • The use of panic_str may seem odd, but I found it very useful during the implementation of this. A library I am testing out provides stack traces upon failure, and this helped identify the issue with my calls - directly in panic messages on the other side, where a normal panic() would have otherwise truncated it. If it helped me, it will help users.
  • I chose to force the validation of returned function statuses to not be inline. It's a two-line removal if we want it inline. The primary reason I chose to do this is that the error handling needlessly clutters up otherwise-clean LLVM IR. If that isn't a good enough reason I can remove it.

@jake-arkinstall jake-arkinstall requested a review from a team as a code owner October 10, 2025 16:22
@codecov

codecov bot commented Oct 10, 2025

Codecov Report

❌ Patch coverage is 81.57895% with 119 lines in your changes missing coverage. Please review.
✅ Project coverage is 78.81%. Comparing base (3a074fd) to head (a0cc737).
⚠️ Report is 7 commits behind head on main.

Files with missing lines Patch % Lines
qis-compiler/rust/gpu.rs 81.15% 56 Missing and 61 partials ⚠️
qis-compiler/rust/lib.rs 0.00% 1 Missing ⚠️
qis-compiler/rust/selene_specific.rs 93.75% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1169      +/-   ##
==========================================
+ Coverage   78.44%   78.81%   +0.37%     
==========================================
  Files         151      153       +2     
  Lines       18263    18909     +646     
  Branches    17169    17807     +638     
==========================================
+ Hits        14326    14903     +577     
- Misses       3056     3064       +8     
- Partials      881      942      +61     
Flag Coverage Δ
python 92.65% <ø> (ø)
qis-compiler 68.40% <100.00%> (+0.31%) ⬆️
rust 78.41% <81.34%> (+0.40%) ⬆️



Copilot AI left a comment


Pull Request Overview

This PR implements GPU lowering support for the selene-hugr-qis-compiler by adding a complete LLVM compilation backend for the tket.gpu extension. The implementation provides a bridge between high-level GPU operations and a C-style external library API.

Key changes include:

  • Implementation of GPU operation lowering with external C API integration
  • Addition of Selene-specific panic handling for improved error reporting
  • Comprehensive test coverage with snapshot testing across multiple target platforms

Reviewed Changes

Copilot reviewed 9 out of 11 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
qis-compiler/rust/selene_specific.rs Adds panic_str function for enhanced error handling in Selene context
qis-compiler/rust/lib.rs Integrates GPU and Selene-specific modules into the main library
qis-compiler/rust/gpu.rs Main implementation of GPU extension lowering with comprehensive API mapping
qis-compiler/python/tests/test_basic_generation.py Adds GPU compilation test with snapshot verification
qis-compiler/python/tests/snapshots/... Generated LLVM IR snapshots for various target architectures
qis-compiler/Cargo.toml Adds strum dependency for enum iteration support


@jake-arkinstall
Contributor Author

I've just added a fix for the function ID lookup.

Previous approach:

  • Check if the constant for the associated function ID exists in the module
  • If it doesn't, then emit the constant as 0 and emit the code required to set it to the proper value via a gpu_get_function_id call.
  • Return it.

Problem:

  • It assumed that if the constant exists, then the code to set it has been run. But this emitted code might be in a code block that is executed optionally, so there are branches in which the default value 0 is assumed.

Fix

  • We always emit something when requesting the function ID
  • This 'something' is now a (non-inline) function call.
  • This call checks whether the associated global is -1 (all ones) and, if so, performs the gpu_get_function_id call
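The fixed lookup can be modelled in C roughly as below. This is an illustrative sketch of the sentinel pattern, not the emitted IR: the stub gpu_get_function_id and the counter exist only so the caching behaviour can be demonstrated, and the real helper is emitted per function name with a thread-local global.

```c
#include <assert.h>
#include <stdint.h>

static int lookup_calls = 0;

/* Stub standing in for the linked library: always succeeds with id 7. */
static int gpu_get_function_id(const char* name, uint64_t* id_out) {
    (void)name;
    lookup_calls++;
    *id_out = 7;
    return 1;
}

/* UINT64_MAX (all ones) is the "not looked up yet" sentinel. */
static uint64_t cached_id = UINT64_MAX;

/* Model of the emitted noinline helper: populate the global on first
   use, then return the cached value on every subsequent call. */
static uint64_t get_fn_id(const char* name) {
    if (cached_id == UINT64_MAX) {
        if (!gpu_get_function_id(name, &cached_id)) {
            /* the real lowering panics with gpu_get_error() here */
        }
    }
    return cached_id;
}
```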

@jake-arkinstall jake-arkinstall force-pushed the feat/gpu-lowering-proposal branch from b9d2075 to f4ab4da Compare October 13, 2025 10:58
@jake-arkinstall
Contributor Author

Rebased on main as some (unrelated?) semver stuff was failing.

@doug-q
Contributor

doug-q commented Oct 14, 2025

Just based on description:

  • gpu_get_error is unfortunately global. Ideally it would take a gpu_ref, or otherwise a "reserved" like gpu_init.
  • gpu_get_function_id should take gpu_ref
  • I would prefer gpu_get_result to take a size and a buffer, rather than to hard code 64 bits.

@jake-arkinstall
Contributor Author

jake-arkinstall commented Oct 15, 2025

@doug-q Largely agree, though we are bound by the HUGR structure, specifically the ClassicalCompute structure.

In this sense:

  • function IDs are accessed separately from the GPU context itself in the hugr, so it doesn't have access to it in order to make that call.

  • the HUGR has space for a 64 bit value

In those cases we can modify the API and bump the API version if the hugr changes, and detect when the dylib API is incompatible at startup. Then those signatures can change arbitrarily.

On gpu_get_error, taking a handle doesn't cover the case where acquiring a context fails (as you have no handle to send). I could instead add a gpu_get_global_error() for global errors and gpu_get_context_error(handle) to segregate errors that are stateful from errors that are not, though.

@doug-q
Contributor

doug-q commented Oct 15, 2025

@doug-q Largely agree, though we are bound by the HUGR structure, specifically the ClassicalCompute structure.

In this sense:

  • function IDs are accessed separately from the GPU context itself in the hugr, so it doesn't have access to it in order to make that call.

Good point. I think that in a perfect world we would do this differently.

  • the HUGR has space for a 64 bit value

I don't understand this. Does this mean that all the hugr ops return something that is at most 64 bits?

In that case, this doesn't mean the gpu api needs to be restricted to 64-bit types. You could have

bool gpu_get_result(uint64_t gpu_ref, size_t len, char* out_result);

and only call it with len == 8. I would expect you to prefer this as it's more future-proof. What you have now is adequate.

On the gpu_get_error, it doesn't handle the case where acquiring a context fails (as you have no handle to send). I could instead add a gpu_get_global_error() for global errors and gpu_get_context_error(handle) for segregation of errors that are strateful and errors that are not, though.

I think this would be better, but it's not required.

I do think that you should add uint64_t _reserved params to everything that doesn't take a gpu_ref (gpu_get_error, gpu_validate_api, gpu_get_function_id).

Whence to call gpu_validate_api

I recommend you write

impl GpuCodegen {
    fn emit_validate_api(&self, builder: &Builder) -> Result<()> {
        ...
    }
}

And call this in wrap_main, i.e. insert the call to gpu_validate_api into the generated main function that wraps the hugr main function. (You'll need to thread a cloned GpuCodegen around; it is also possible for emit_validate_api above to be a bare function.)

Contributor

@doug-q doug-q left a comment


You have no coverage of any of this. I know it is indeed covered by python tests, but this is unfortunate for project-wide coverage stats.

I recommend tests like emit_futures_codegen, would you add them?

Correct me if I'm wrong, but the packing does not seem to be specified? This seems like an important detail to describe in prose.

let function = ctx
.get_current_module()
.add_function(&function_name, fn_type, None);
let noinline_id =
Contributor


why noinline?

Contributor Author


It keeps the resulting code leaner and easier to reason about.

As each call is followed by a validation step and a potential panic, inlining adds a lot of bloat to the resulting code. And as the GPU call is expensive in its own right, the benefit of avoiding a jump to uniform validation code feels like a micro-optimisation at the cost of more expensive compilation.

I'll file a PR with inlining removed so that the snapshots can be compared.

Contributor Author


see #1177

Contributor


Fair enough. In general I prefer to leave it to the optimiser to decide but the expense of gpu calls is convincing

@doug-q
Contributor

doug-q commented Oct 15, 2025

Also, I don't understand the point of the signature argument. Is it to create error messages?

@jake-arkinstall
Contributor Author

Also, I don't understand the point of the signature argument. Is it to create error messages?

All we're sending to the endpoint is a sequence of bytes representing packed values, which makes the communication protocol a narrow contract between the user's definition and the library to link against.

The signature string allows for validation and better error messages for sure, but it also allows one to:

  • create a dummy library that satisfies any interface for diagnostic purposes: able to unpack all input values (for printing), to generate a valid return value based on the given signature, etc.
  • create a middleman shim that mutates the packed format to something else. Examples might include a library that turns requests into HTTP requests with e.g. an array of input values in JSON format, or uses libffi to invoke functions directly in another library, or even prompts the user for requested output based on some inputs that it provides (so they can experiment with hypothetical decoders that they figure out by hand).
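As a concrete example of what a dummy or shim library can do with the signature string, the sketch below validates a blob against it before unpacking. Note that the actual packing layout is defined by the lowering and is not fully specified here; purely for illustration this sketch assumes every parameter occupies an aligned 8-byte slot, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical library-side check: count the input type codes before the
   ':' and compare against the blob size, assuming (for illustration only)
   one aligned 8-byte slot per parameter. Returns 1 on match, 0 otherwise. */
static int blob_matches_signature(const char* sig, uint64_t blob_size) {
    const char* colon = strchr(sig, ':');
    if (!colon) return 0; /* malformed signature */
    uint64_t n_params = (uint64_t)(colon - sig);
    return blob_size == n_params * 8;
}
```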

@jake-arkinstall
Contributor Author

You have no coverage of any of this. I know it is indeed covered by python tests, but this is unfortunate for project-wide coverage stats.

I have added some snapshot tests in rust now - from prior art thanks to @croyzor and @qartik. Annoyingly the entire selene-hugr-qis-compiler is devoid of rust tests prior to this - that should change.

Correct me if I'm wrong, but the packing does not seem to be specified? This seems like an important detail to describe in prose.

will do!

@jake-arkinstall jake-arkinstall force-pushed the feat/gpu-lowering-proposal branch 2 times, most recently from 613bafc to 9885f9b Compare October 15, 2025 14:21
@jake-arkinstall
Contributor Author

Ok, some cleaning up and feedback incorporation done. Works great against my library playground.

Rust tests have good coverage, and there are a couple of drive-by changes:

  • Data packing is now aligned.
  • API validation is emitted before any 'initial' ops like gpu_get_function_id and gpu_init. This is because I'm not confident in my original approach of hijacking custom_const.

I opted not to go for the _reserved parameters to functions missing a context handle parameter for the time being.

@doug-q
Contributor

doug-q commented Oct 16, 2025

Looks good, I need to read in more detail before approving.

Using global variables like this means multi-threading is not allowed, are you ok with that? llvm does have thread local (https://releases.llvm.org/14.0.0/docs/LangRef.html#id1454) but I've not used it.

@jake-arkinstall
Contributor Author

Good callout on the thread_local. I've added that.

As a drive-by, I realised that encoding 'the function ID has not been looked up yet' as a sentinel value on the function ID can and will lead to frustration, so I have added an explicit flag for its lookup status instead. Also sprinkled some more comments for a bit of clarity.

@jake-arkinstall jake-arkinstall force-pushed the feat/gpu-lowering-proposal branch 2 times, most recently from c9067f0 to 7a27a10 Compare October 21, 2025 09:15
@jake-arkinstall jake-arkinstall force-pushed the feat/gpu-lowering-proposal branch from 7a27a10 to a0cc737 Compare October 22, 2025 09:24
Contributor

@doug-q doug-q left a comment


Great, thanks Jake.


@jake-arkinstall jake-arkinstall added this pull request to the merge queue Oct 23, 2025
Merged via the queue into main with commit bcf1d4c Oct 23, 2025
24 checks passed
@jake-arkinstall jake-arkinstall deleted the feat/gpu-lowering-proposal branch October 23, 2025 12:53
github-merge-queue bot pushed a commit that referenced this pull request Nov 10, 2025
🤖 I have created a release *beep* *boop*
---


## [0.2.10](qis-compiler-v0.2.9...qis-compiler-v0.2.10) (2025-11-10)


### Features

* add GPU lowering to selene-hugr-qis-compiler
([#1169](#1169))
([bcf1d4c](bcf1d4c))

### Bug Fixes

* Fix runtime panic when iterating through arrays of affine/bool types
([hugr#2666](Quantinuum/hugr#2666))
([01b8a8e](01b8a8e))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

---------

Co-authored-by: Agustín Borgna <[email protected]>
Co-authored-by: Agustín Borgna <[email protected]>