-
Notifications
You must be signed in to change notification settings - Fork 10
feat: add GPU lowering to selene-hugr-qis-compiler #1169
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #1169 +/- ##
==========================================
+ Coverage 78.44% 78.81% +0.37%
==========================================
Files 151 153 +2
Lines 18263 18909 +646
Branches 17169 17807 +638
==========================================
+ Hits 14326 14903 +577
- Misses 3056 3064 +8
- Partials 881 942 +61
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR implements GPU lowering support for the selene-hugr-qis-compiler by adding a complete LLVM compilation backend for the tket.gpu extension. The implementation provides a bridge between high-level GPU operations and a C-style external library API.
Key changes include:
- Implementation of GPU operation lowering with external C API integration
- Addition of Selene-specific panic handling for improved error reporting
- Comprehensive test coverage with snapshot testing across multiple target platforms
Reviewed Changes
Copilot reviewed 9 out of 11 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| qis-compiler/rust/selene_specific.rs | Adds panic_str function for enhanced error handling in Selene context |
| qis-compiler/rust/lib.rs | Integrates GPU and Selene-specific modules into the main library |
| qis-compiler/rust/gpu.rs | Main implementation of GPU extension lowering with comprehensive API mapping |
| qis-compiler/python/tests/test_basic_generation.py | Adds GPU compilation test with snapshot verification |
| qis-compiler/python/tests/snapshots/... | Generated LLVM IR snapshots for various target architectures |
| qis-compiler/Cargo.toml | Adds strum dependency for enum iteration support |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
|
I've just added a fix for the function ID lookup. Previous approach:
Problem:
Fix
|
b9d2075 to
f4ab4da
Compare
|
Rebased on main as some (unrelated?) semver stuff was failing. |
|
Just based on description:
|
|
@doug-q Largely agree, though we are bound by the HUGR structure the ClassicalCompute structure. In this sense:
In those cases we can modify the API and bump the API version if the hugr changes, and detect when the dylib API is incompatible at startup. Then those signatures can change arbitrarily. On the gpu_get_error, it doesn't handle the case where acquiring a context fails (as you have no handle to send). I could instead add a gpu_get_global_error() for global errors and gpu_get_context_error(handle) for segregation of errors that are strateful and errors that are not, though. |
Good point. I think that in a perfect world we would do this differently.
and only call it with len == 8. I would expect you to prefer this as it's more future-proof. What you have now is adequate.
I think this would be better, but it's not required. I do think that you should add Whence to call gpu_validate_api I recommend you write And call this in |
doug-q
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have no coverage of any of this. I know it is indeed covered by python tests, but this is unfortunate for project-wide coverage stats.
I recommend tests like emit_futures_codegen, would you add them?
Correct me if I'm wrong, but the packing does not seem to be specified? This seems like an important detail to describe in prose.
| let function = ctx | ||
| .get_current_module() | ||
| .add_function(&function_name, fn_type, None); | ||
| let noinline_id = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why noinline?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It keeps the resulting code leaner and easier to reason about.
As each call is followed by a validation step and a potential panic, inlining adds a lot of bloat to the resulting code. And as the GPU call is expensive in its own right, the advantage of avoiding a jump to uniform validation code feels like a relative microoptimisation at the cost of more expensive compilation.
I'll file a PR with inlining removed so that the snapshots can be compared.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see #1177
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough. In general I prefer to leave it to the optimiser to decide but the expense of gpu calls is convincing
|
Also, I don't understand the point of the signature argument. Is it to create error messages? |
All we're sending to the endpoint is a sequence of bytes, representing packed values, and making the communication protocol a narrow contract between the user's definition and the library to link against. The signature string allows for validation and better error messages for sure, but it also allows one to:
|
I have added some snapshot tests in rust now - from prior art thanks to @croyzor and @qartik. Annoyingly the entire selene-hugr-qis-compiler is devoid of rust tests prior to this - that should change.
will do! |
613bafc to
9885f9b
Compare
|
Ok, some cleaning up and feedback incorporation done. Works great against my library playground. Rust tests have good coverage, and there are a couple of drive-by changes:
I opted not to go for the _reserved parameters to functions missing a context handle parameter for the time being. |
|
Looks good, I need to read in more detail before approving. Using global variables like this means multi-threading is not allowed, are you ok with that? llvm does have thread local (https://releases.llvm.org/14.0.0/docs/LangRef.html#id1454) but I've not used it. |
|
Good callout on the thread_local. I've added that. As a drive-by, I realised that encoding 'the function ID has not been looked up yet' as a sentinel value on the function ID can and will lead to frustration, so I have added an explicit flag for its lookup status instead. Also sprinkled some more comments for a bit of clarity. |
c9067f0 to
7a27a10
Compare
…entinel constant function ID and perform lookup on demand.
…ome first, rather than upon emitting the ConstGpuModule, such that it is present in any relevant snapshots
…k status in a sentinel value, but in a separate boolean. Add a few more comments.
7a27a10 to
a0cc737
Compare
doug-q
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, thanks Jake.
| let function = ctx | ||
| .get_current_module() | ||
| .add_function(&function_name, fn_type, None); | ||
| let noinline_id = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough. In general I prefer to leave it to the optimiser to decide but the expense of gpu calls is convincing
🤖 I have created a release *beep* *boop* --- ## [0.2.10](qis-compiler-v0.2.9...qis-compiler-v0.2.10) (2025-11-10) ### Features * add GPU lowering to selene-hugr-qis-compiler ([#1169](#1169)) ([bcf1d4c](bcf1d4c)) ### Bug Fixes * Fix runtime panic when iterating through arrays of affine/bool types ([hugr#2666](Quantinuum/hugr#2666)) ([01b8a8e](01b8a8e)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). --------- Co-authored-by: Agustín Borgna <[email protected]> Co-authored-by: Agustín Borgna <[email protected]>
This PR provides lowering for qsystem's GPU extension for selene-specific workflows.
This is my first compilation PR, so if it's janky I apologise in advance. There were some battles with getting to grips with inkwell, but I'm very happy with the result.
Background
The qsystem GPU extension provides a general mechanism for declaring and using the API of a gpu library.
For generality, this mechanism involves:
The intent is that the library appropriately fulfilling the user's specifications is linked with the user program in order to provide the required functionality. In Selene, this would look like adding the library as a
utilityat the build stage.As this is the first such utility that represents a generic usecase, breakages could get ugly. As such, I took some care to handle future breaking changes.
API design
This lowering assumes the following function signatures in the library to be linked with the user program:
This provides a string representing an error that should be made available upon any of the following functions failing. It should return nullptr if there is no error to fetch.
Each of the following functions provide bool return values which represent success (true) or failure (false). Each result is validated and, upon false, provides an extended panic message to the user containing the result of
gpu_get_error()(or "No error message available" upon a nullptr being returned).I utilise Selene's
panic_strQIS extension to emit panics that are not constrained to 255 bytes, as a custom library may wish to provide far more detail than would fit in 255 bytes. Given this lowering is specific to selene, I don't believe that deviating from standard QIS will be a controversial choice.We do not currently know how the remainder of the API will change over time:
gpu_calldescription).As such, this should be the first call to the GPU library, passing an API version (currently 0.1.0, the version I'm assigning to the API I am currently describing). It is invoked before any function that depends on breakable aspects of the API, with caching to avoid multiple calls. It should return true on success, false otherwise. Upon failure, gpu_get_error() is called, and therefore these two functions must be kept compatible in future editions if we opt to make breaking changes in future.
The remainder of functions may be broken in future editions, and as long as gpu_validate_api is managed appropriately on the library side, we should at least be able to fail early (at linking or early at runtime) to prevent undefined behaviour (e.g. by invoking a function with a modified signature).
The
reservedidentifier may be used in future if we wish to use multiple instances of gpu libraries, e.g. for maintaining distinct state between them.After which, the library should free resources associated with the
gpu_refhandle, and further invocations using this handle should error.Some implementations may wish to map functions to indices in an array, or use a hash on the incoming names, or something else.
This is an opportune moment to return false and thus fail early if the requested function name is not supported by the linked library.
The handle and function_id parameters should be apparent at this point.
When a function is invoked, it requires parameters. This is where the blob and signature come in:
i64 => i, u64 => i, f64 => f, bool => b, and the inputs and return type are separated by a colon.iifb:vfor(int64, uint64, float, bool) -> voidb:ffor(bool)->floatget_function_idto for validation to take place earlier.This extracts a result from a previous call, in a FIFO manner. Currently the only
out_lensupported by the underlying hugr is 64 bits, as functions are assumed to return double or uint64, but this can be changed at a later date without breaking the API. The lowering provided in this PR handles the casting and alignment.Controversial choices
However, error codes are primarily useful in areas where we have a defined system for handling different forms of error. We don't really have a way of catching those errors in the user code, so we always end up either continuing or terminating. Bool felt satisfactory here.
In future this could be broken - all we need to do is keep the bool return for
gpu_validate_api, which is a clear success or fail case anyway.panic_strmay seem odd, but I found it very useful during the implementation of this. A library I am testing out provides stack traces upon failure, and this helped identify the issue with my calls - directly in panic messages on the other side, where a normalpanic()would have otherwise truncated it. If it helped me, it will help users.