
Conversation

@jake-arkinstall
Contributor

@jake-arkinstall jake-arkinstall commented Oct 10, 2025

This PR provides lowering for qsystem's GPU extension for selene-specific workflows.

This is my first compilation PR, so if it's janky I apologise in advance. There were some battles with getting to grips with inkwell, but I'm very happy with the result.

Background

The qsystem GPU extension provides a general mechanism for declaring and using the API of a gpu library.

For generality, this mechanism involves:

  • the user defining functions with:
    • any (valid) function name,
    • any arrangement of integer, floating point, or boolean arguments,
    • a return type of void, float, or integer.
  • function names having some mapping to an integer tag
  • the ability to acquire and discard a GPU context
  • the ability to invoke the aforementioned functions given a gpu context, an integer tag, and the appropriate parameters
  • the ability to retrieve results after invoking said functions

The intent is that a library fulfilling the user's specification is linked with the user program to provide the required functionality. In Selene, this would look like adding the library as a utility at the build stage.

As this is the first such utility representing a generic use case, breakages could get ugly. As such, I took some care to accommodate future breaking changes.

API design

This lowering assumes the following function signatures in the library to be linked with the user program:

char const* gpu_get_error();

This provides a string representing an error, to be made available when any of the following functions fails. It should return nullptr if there is no error to fetch.

Each of the following functions returns a bool representing success (true) or failure (false). Each result is validated and, upon false, an extended panic message is presented to the user containing the result of gpu_get_error() (or "No error message available" if nullptr is returned).

I utilise Selene's panic_str QIS extension to emit panics that are not constrained to 255 bytes, as a custom library may wish to provide far more detail than would fit in 255 bytes. Given this lowering is specific to Selene, I don't believe that deviating from standard QIS will be a controversial choice.
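The failure path described above can be sketched as follows. This is a minimal illustration, not the emitted IR: status_message is a hypothetical helper name, and panic_str / gpu_get_error appear only in comments, standing in for the Selene QIS extension and the linked library respectively.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch of selecting the panic message after a GPU call.
   On success nothing happens; on failure the string from gpu_get_error()
   is used, falling back to a fixed message when it returned NULL. The
   real lowering would then pass this message to panic_str. */
static const char* status_message(int ok, const char* gpu_error) {
    if (ok) {
        return NULL; /* success: no panic is emitted */
    }
    return gpu_error ? gpu_error : "No error message available";
}
```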

bool gpu_validate_api(uint64_t major, uint64_t minor, uint64_t patch);

We do not currently know how the remainder of the API will change over time:

  • It's feasible that array arguments, strings, etc. could be added at a later time. This will particularly impact the signature API (see the gpu_call description).
  • Returning different kinds of data may be useful in future (and so fetching results may require additional approaches)

As such, this should be the first call to the GPU library, passing an API version (currently 0.1.0, the version I'm assigning to the API described here). It is invoked before any function that depends on breakable aspects of the API, with caching to avoid multiple calls. It should return true on success, false otherwise. Upon failure, gpu_get_error() is called, and therefore these two functions must be kept compatible in future editions if we opt to make breaking changes.
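A library-side implementation of this check might look like the following sketch. The exact-match policy (major and minor must match, patch differences tolerated) is an illustrative assumption on my part, not something the lowering mandates:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical gpu_validate_api for a library written against API 0.1.0.
   Major and minor must match exactly; patch bumps are assumed to be
   non-breaking for the API surface. */
static bool gpu_validate_api(uint64_t major, uint64_t minor, uint64_t patch) {
    (void)patch; /* patch changes do not alter the API surface */
    return major == 0 && minor == 1;
}
```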

The remainder of functions may be broken in future editions, and as long as gpu_validate_api is managed appropriately on the library side, we should at least be able to fail early (at linking or early at runtime) to prevent undefined behaviour (e.g. by invoking a function with a modified signature).

bool gpu_init(uint64_t _reserved, uint64_t* gpu_ref_out);

The reserved identifier may be used in future if we wish to use multiple instances of gpu libraries, e.g. for maintaining distinct state between them.

bool gpu_discard(uint64_t gpu_ref);

After which, the library should free resources associated with the gpu_ref handle, and further invocations using this handle should error.

bool gpu_get_function_id(char const* name, uint64_t* id_out);

Some implementations may wish to map functions to indices in an array, or use a hash on the incoming names, or something else.

This is an opportune moment to return false and thus fail early if the requested function name is not supported by the linked library.

bool gpu_call(
    uint64_t handle,
    uint64_t function_id,
    uint64_t blob_size,
    char const* blob,
    char const* signature
);

The handle and function_id parameters should be apparent at this point.

When a function is invoked, it requires parameters. This is where the blob and signature come in:

  • parameters are packed (in bytes) into an array, which is sent to the gpu_call.
    • I chose this over varargs because varargs aren't particularly friendly to work with in a diagnostic setting. They can't be passed to other functions, for example.
  • the size is also passed to the library, which allows for quick validation and safer parsing
  • for completeness, a signature string (generated at compile-time) is also passed as an argument.
    • Types are encoded as i64 => i, u64 => i, f64 => f, bool => b, and the inputs and return type are separated by a colon.
    • The resulting signature string is in the form e.g.
      • iifb:v for (int64, uint64, float, bool) -> void
      • b:f for (bool)->float
    • This is also an opportunity for validation. Perhaps it should be moved to gpu_get_function_id so that validation can take place earlier.
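The signature encoding above can be reconstructed as a small helper. This is a sketch of the encoding rules as described (the lowering generates the string at compile time; the names here are mine):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reconstruction of the signature encoding: 'i' for 64-bit
   integers (signed or unsigned), 'f' for f64, 'b' for bool, 'v' for void,
   laid out as "<inputs>:<return>". */
typedef enum { TY_I, TY_F, TY_B, TY_V } Ty;

static char ty_code(Ty t) {
    switch (t) {
        case TY_I: return 'i';
        case TY_F: return 'f';
        case TY_B: return 'b';
        default:   return 'v';
    }
}

/* out must have room for n_args + 3 bytes (inputs, ':', return, NUL). */
static void encode_signature(const Ty* args, size_t n_args, Ty ret, char* out) {
    size_t k = 0;
    for (size_t j = 0; j < n_args; j++) out[k++] = ty_code(args[j]);
    out[k++] = ':';
    out[k++] = ty_code(ret);
    out[k] = '\0';
}
```

For example, (int64, uint64, float, bool) -> void encodes as "iifb:v", matching the examples above.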

bool gpu_get_result(uint64_t gpu_ref, uint64_t out_len, char* out_result);

This extracts a result from a previous call, in a FIFO manner. Currently the only out_len supported by the underlying hugr is 64 bits, as functions are assumed to return double or uint64, but this can be changed at a later date without breaking the API. The lowering provided in this PR handles the casting and alignment.
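On the caller's side, reinterpreting the returned 8-byte buffer as a double or uint64 can be sketched like this; memcpy sidesteps the alignment and strict-aliasing concerns that the lowering also has to handle (helper names are mine, for illustration):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical decoding of the 64-bit payload written by gpu_get_result.
   memcpy through a local avoids unaligned loads and aliasing violations
   when reinterpreting raw bytes. */
static double result_as_f64(const char buf[8]) {
    double d;
    memcpy(&d, buf, sizeof d);
    return d;
}

static uint64_t result_as_u64(const char buf[8]) {
    uint64_t u;
    memcpy(&u, buf, sizeof u);
    return u;
}
```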

Controversial choices

  • It might be more appropriate to return an integer as an error code, rather than a boolean flag representing success or failure.
    However, error codes are primarily useful in areas where we have a defined system for handling different forms of error. We don't really have a way of catching those errors in the user code, so we always end up either continuing or terminating. Bool felt satisfactory here.
    In future this could be broken - all we need to do is keep the bool return for gpu_validate_api, which is a clear success or fail case anyway.
  • The use of panic_str may seem odd, but I found it very useful during the implementation of this. A library I am testing out provides stack traces upon failure, and this helped identify the issue with my calls - directly in panic messages on the other side, where a normal panic() would have otherwise truncated it. If it helped me, it will help users.
  • I chose to force the validation of returned function statuses to not be inline. It's a two-line removal if we want it inline. The primary reason I chose to do this is that the error handling needlessly clutters up otherwise-clean LLVM IR. If that isn't a good enough reason I can remove it.

@jake-arkinstall jake-arkinstall requested a review from a team as a code owner October 10, 2025 16:22
@codecov

codecov bot commented Oct 10, 2025

Codecov Report

❌ Patch coverage is 81.57895% with 119 lines in your changes missing coverage. Please review.
✅ Project coverage is 78.81%. Comparing base (3a074fd) to head (a0cc737).
⚠️ Report is 7 commits behind head on main.

Files with missing lines Patch % Lines
qis-compiler/rust/gpu.rs 81.15% 56 Missing and 61 partials ⚠️
qis-compiler/rust/lib.rs 0.00% 1 Missing ⚠️
qis-compiler/rust/selene_specific.rs 93.75% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1169      +/-   ##
==========================================
+ Coverage   78.44%   78.81%   +0.37%     
==========================================
  Files         151      153       +2     
  Lines       18263    18909     +646     
  Branches    17169    17807     +638     
==========================================
+ Hits        14326    14903     +577     
- Misses       3056     3064       +8     
- Partials      881      942      +61     
Flag Coverage Δ
python 92.65% <ø> (ø)
qis-compiler 68.40% <100.00%> (+0.31%) ⬆️
rust 78.41% <81.34%> (+0.40%) ⬆️



Copilot AI left a comment


Pull Request Overview

This PR implements GPU lowering support for the selene-hugr-qis-compiler by adding a complete LLVM compilation backend for the tket.gpu extension. The implementation provides a bridge between high-level GPU operations and a C-style external library API.

Key changes include:

  • Implementation of GPU operation lowering with external C API integration
  • Addition of Selene-specific panic handling for improved error reporting
  • Comprehensive test coverage with snapshot testing across multiple target platforms

Reviewed Changes

Copilot reviewed 9 out of 11 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
qis-compiler/rust/selene_specific.rs Adds panic_str function for enhanced error handling in Selene context
qis-compiler/rust/lib.rs Integrates GPU and Selene-specific modules into the main library
qis-compiler/rust/gpu.rs Main implementation of GPU extension lowering with comprehensive API mapping
qis-compiler/python/tests/test_basic_generation.py Adds GPU compilation test with snapshot verification
qis-compiler/python/tests/snapshots/... Generated LLVM IR snapshots for various target architectures
qis-compiler/Cargo.toml Adds strum dependency for enum iteration support


@jake-arkinstall
Contributor Author

I've just added a fix for the function ID lookup.

Previous approach:

  • Check if the constant for the associated function ID exists in the module
  • If it doesn't, then emit the constant as 0 and emit the code required to set it to the proper value via a gpu_get_function_id call.
  • Return it.

Problem:

  • It assumed that if the constant exists, then the code to set it has been run. But this emitted code might be in a code block that is executed optionally, so there are branches in which the default value 0 is assumed.

Fix

  • We always emit something when requesting the function ID
  • This 'something' is now a (non-inline) function call.
  • This call checks whether the associated global is -1 (all ones) and, if so, performs the gpu_get_function_id call
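The fixed lookup can be modelled in C roughly as below. This is an illustrative sketch of the sentinel pattern, not the emitted IR: the stub gpu_get_function_id and the counter exist only so the caching behaviour can be demonstrated, and the real helper is emitted per function name with a thread-local global.

```c
#include <assert.h>
#include <stdint.h>

static int lookup_calls = 0;

/* Stub standing in for the linked library: always succeeds with id 7. */
static int gpu_get_function_id(const char* name, uint64_t* id_out) {
    (void)name;
    lookup_calls++;
    *id_out = 7;
    return 1;
}

/* UINT64_MAX (all ones) is the "not looked up yet" sentinel. */
static uint64_t cached_id = UINT64_MAX;

/* Model of the emitted noinline helper: populate the global on first
   use, then return the cached value on every subsequent call. */
static uint64_t get_fn_id(const char* name) {
    if (cached_id == UINT64_MAX) {
        if (!gpu_get_function_id(name, &cached_id)) {
            /* the real lowering panics with gpu_get_error() here */
        }
    }
    return cached_id;
}
```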

@jake-arkinstall jake-arkinstall force-pushed the feat/gpu-lowering-proposal branch from b9d2075 to f4ab4da Compare October 13, 2025 10:58
@jake-arkinstall
Contributor Author

Rebased on main as some (unrelated?) semver stuff was failing.

@doug-q
Contributor

doug-q commented Oct 14, 2025

Just based on description:

  • gpu_get_error is unfortunately global. Ideally it would take a gpu_ref, or otherwise a "reserved" like gpu_init.
  • gpu_get_function_id should take gpu_ref
  • I would prefer gpu_get_result to take a size and a buffer, rather than to hard code 64 bits.

@jake-arkinstall
Contributor Author

jake-arkinstall commented Oct 15, 2025

@doug-q Largely agree, though we are bound by the HUGR structure, specifically the ClassicalCompute structure.

In this sense:

  • function IDs are accessed separately from the GPU context itself in the hugr, so it doesn't have access to it in order to make that call.

  • the HUGR has space for a 64 bit value

In those cases we can modify the API and bump the API version if the hugr changes, and detect when the dylib API is incompatible at startup. Then those signatures can change arbitrarily.

On gpu_get_error, taking a handle doesn't cover the case where acquiring a context fails (as you have no handle to send). I could instead add a gpu_get_global_error() for global errors and gpu_get_context_error(handle) to segregate errors that are stateful from errors that are not, though.

@doug-q
Contributor

doug-q commented Oct 15, 2025

@doug-q Largely agree, though we are bound by the HUGR structure, specifically the ClassicalCompute structure.

In this sense:

  • function IDs are accessed separately from the GPU context itself in the hugr, so it doesn't have access to it in order to make that call.

Good point. I think that in a perfect world we would do this differently.

  • the HUGR has space for a 64 bit value

I don't understand this. Does this mean that all the hugr ops return something that is at most 64 bits?

In that case, this doesn't mean the gpu api needs to be restricted to 64-bit types. You could have

bool gpu_get_result(uint64_t gpu_ref, size_t len, char* out_result);

and only call it with len == 8. I would expect you to prefer this as it's more future-proof. What you have now is adequate.

On the gpu_get_error, it doesn't handle the case where acquiring a context fails (as you have no handle to send). I could instead add a gpu_get_global_error() for global errors and gpu_get_context_error(handle) for segregation of errors that are strateful and errors that are not, though.

I think this would be better, but it's not required.

I do think that you should add uint64_t _reserved params to everything that doesn't take a gpu_ref (gpu_get_error, gpu_validate_api, gpu_get_function_id).

Whence to call gpu_validate_api

I recommend you write

impl GpuCodegen {
    fn emit_validate_api(&self, builder: &Builder) -> Result<()> {
        ...
    }
}

And call this in wrap_main, i.e. insert the call to gpu_validate_api into the generated main function that wraps the hugr main function. (You'll need to thread a cloned GpuCodegen around; it is also possible for emit_validate_api above to be a bare function.)

Contributor

@doug-q doug-q left a comment


You have no coverage of any of this. I know it is indeed covered by python tests, but this is unfortunate for project-wide coverage stats.

I recommend tests like emit_futures_codegen, would you add them?

Correct me if I'm wrong, but the packing does not seem to be specified? This seems like an important detail to describe in prose.

let function = ctx
.get_current_module()
.add_function(&function_name, fn_type, None);
let noinline_id =
Contributor


why noinline?

Contributor Author


It keeps the resulting code leaner and easier to reason about.

As each call is followed by a validation step and a potential panic, inlining adds a lot of bloat to the resulting code. And as the GPU call is expensive in its own right, the benefit of avoiding a jump to uniform validation code feels like a micro-optimisation at the cost of more expensive compilation.

I'll file a PR with inlining removed so that the snapshots can be compared.

Contributor Author


see #1177

Contributor


Fair enough. In general I prefer to leave it to the optimiser to decide but the expense of gpu calls is convincing

@doug-q
Contributor

doug-q commented Oct 15, 2025

Also, I don't understand the point of the signature argument. Is it to create error messages?

@jake-arkinstall
Contributor Author

Also, I don't understand the point of the signature argument. Is it to create error messages?

All we're sending to the endpoint is a sequence of bytes representing packed values, which makes the communication protocol a narrow contract between the user's definition and the library to link against.

The signature string allows for validation and better error messages for sure, but it also allows one to:

  • create a dummy library that satisfies any interface for diagnostic purposes: able to unpack all input values (for printing), to generate a valid return value based on the given signature, etc.
  • create a middleman shim that mutates the packed format to something else. Examples might include a library that turns requests into HTTP requests with e.g. an array of input values in JSON format, or uses libffi to invoke functions directly in another library, or even prompts the user for requested output based on some inputs that it provides (so they can experiment with hypothetical decoders that they figure out by hand).
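As a concrete example of what a dummy or shim library can do with the signature string, the sketch below validates a blob against it before unpacking. Note that the actual packing layout is defined by the lowering and is not fully specified here; purely for illustration this sketch assumes every parameter occupies an aligned 8-byte slot, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical library-side check: count the input type codes before the
   ':' and compare against the blob size, assuming (for illustration only)
   one aligned 8-byte slot per parameter. Returns 1 on match, 0 otherwise. */
static int blob_matches_signature(const char* sig, uint64_t blob_size) {
    const char* colon = strchr(sig, ':');
    if (!colon) return 0; /* malformed signature */
    uint64_t n_params = (uint64_t)(colon - sig);
    return blob_size == n_params * 8;
}
```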

@jake-arkinstall
Contributor Author

You have no coverage of any of this. I know it is indeed covered by python tests, but this is unfortunate for project-wide coverage stats.

I have added some snapshot tests in rust now - from prior art thanks to @croyzor and @qartik. Annoyingly the entire selene-hugr-qis-compiler is devoid of rust tests prior to this - that should change.

Correct me if I'm wrong, but the packing does not seem to be specified? This seems like an important detail to describe in prose.

will do!

@jake-arkinstall jake-arkinstall force-pushed the feat/gpu-lowering-proposal branch 2 times, most recently from 613bafc to 9885f9b Compare October 15, 2025 14:21
@jake-arkinstall
Contributor Author

Ok, some cleaning up and feedback incorporation done. Works great against my library playground.

Rust tests have good coverage, and there are a couple of drive-by changes:

  • Data packing is now aligned.
  • API validation is emitted before any 'initial' ops like gpu_get_function_id and gpu_init. This is because I'm not confident in my original approach of hijacking custom_const.

I opted not to go for the _reserved parameters to functions missing a context handle parameter for the time being.

@doug-q
Contributor

doug-q commented Oct 16, 2025

Looks good, I need to read in more detail before approving.

Using global variables like this means multi-threading is not allowed, are you ok with that? llvm does have thread local (https://releases.llvm.org/14.0.0/docs/LangRef.html#id1454) but I've not used it.

@jake-arkinstall
Contributor Author

Good callout on the thread_local. I've added that.

As a drive-by, I realised that encoding 'the function ID has not been looked up yet' as a sentinel value on the function ID can and will lead to frustration, so I have added an explicit flag for its lookup status instead. Also sprinkled some more comments for a bit of clarity.

@jake-arkinstall jake-arkinstall force-pushed the feat/gpu-lowering-proposal branch 2 times, most recently from c9067f0 to 7a27a10 Compare October 21, 2025 09:15
@jake-arkinstall jake-arkinstall force-pushed the feat/gpu-lowering-proposal branch from 7a27a10 to a0cc737 Compare October 22, 2025 09:24
Contributor

@doug-q doug-q left a comment


Great, thanks Jake.


@jake-arkinstall jake-arkinstall added this pull request to the merge queue Oct 23, 2025
Merged via the queue into main with commit bcf1d4c Oct 23, 2025
24 checks passed
@jake-arkinstall jake-arkinstall deleted the feat/gpu-lowering-proposal branch October 23, 2025 12:53
github-merge-queue bot pushed a commit that referenced this pull request Nov 10, 2025
🤖 I have created a release *beep* *boop*
---


## [0.2.10](qis-compiler-v0.2.9...qis-compiler-v0.2.10) (2025-11-10)


### Features

* add GPU lowering to selene-hugr-qis-compiler
([#1169](#1169))
([bcf1d4c](bcf1d4c))

### Bug Fixes

* Fix runtime panic when iterating through arrays of affine/bool types
([hugr#2666](Quantinuum/hugr#2666))
([01b8a8e](01b8a8e))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

---------

Co-authored-by: Agustín Borgna <[email protected]>
Co-authored-by: Agustín Borgna <[email protected]>