
Support for additional vendor GPUs #541

@dmcdougall

Short story

I'd like to contribute support for additional GPUs. I work for AMD, so my priority is supporting AMD GPUs through the HIP API, but I see no reason not to use this opportunity to build an interface generic enough that other vendors can contribute implementations for their hardware.

I'm happy to do this, and I'm working with folks involved in the NWChemEx project who would find it extremely useful. It would also give you more users.

Is this contribution something you are interested in and would accept?

Long story

I have been working with some folks adjacent to a scientific code called NWChemEx. There are a lot of dependencies there, but the short story is that one of them, LibIntX, uses cuda-api-wrappers for marshalling work off to a CUDA backend for running on NVIDIA GPUs. They've requested something similar for AMD GPUs, so I thought I would open an issue here for a few reasons:

  1. To let you know about it;
  2. To ask you if this is a contribution you are happy to receive;
  3. To propose a structure for a solution that accommodates this request without breaking your existing users;
  4. To get feedback on all of the above.

I've been interacting with Andrey Asadchev to get a feel for what would work for him in LibIntX. I think a good way to think about this is simply to take an existing example. A snippet from the vectorAdd example currently looks like this:

	auto device = cuda::device::current::get();
	auto d_A = cuda::memory::device::make_unique<float[]>(device, numElements);
	auto d_B = cuda::memory::device::make_unique<float[]>(device, numElements);
	auto d_C = cuda::memory::device::make_unique<float[]>(device, numElements);

	cuda::memory::copy(d_A.get(), h_A.get(), size);
	cuda::memory::copy(d_B.get(), h_B.get(), size);

	auto launch_config = cuda::launch_config_builder()
		.overall_size(numElements)
		.block_size(256)
		.build();

	std::cout
		<< "CUDA kernel launch with " << launch_config.dimensions.grid.x
		<< " blocks of " << launch_config.dimensions.block.x << " threads each\n";

	cuda::launch(
		vectorAdd, launch_config,
		d_A.get(), d_B.get(), d_C.get(), numElements
	);

Andrey and I proposed a more generic interface that looks like this:

	auto device = gpu::device::current::get();
	auto d_A = gpu::memory::device::make_unique<float[]>(device, numElements);
	auto d_B = gpu::memory::device::make_unique<float[]>(device, numElements);
	auto d_C = gpu::memory::device::make_unique<float[]>(device, numElements);

	gpu::memory::copy(d_A.get(), h_A.get(), size);
	gpu::memory::copy(d_B.get(), h_B.get(), size);

	auto launch_config = gpu::launch_config_builder()
		.overall_size(numElements)
		.block_size(256)
		.build();

	std::cout
		<< "GPU kernel launch with " << launch_config.dimensions.grid.x
		<< " blocks of " << launch_config.dimensions.block.x << " threads each\n";

	gpu::launch(
		vectorAdd, launch_config,
		d_A.get(), d_B.get(), d_C.get(), numElements
	);

Basically, everything stays the same, but the generic interface makes no reference to CUDA specifically. The backend can offload to CUDA, HIP, SYCL, and so on, selected by setting a preprocessor macro to a specific value: perhaps something like #define CAW_BACKEND cuda for CUDA or #define CAW_BACKEND hip for HIP. This approach would mean that your thin interface layer can stay header-only, and anybody using your software could simply set up this macro in their build system and pass it to the compiler with the -DCAW_BACKEND=... flag.
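To make that concrete, here is a minimal sketch of what the dispatch header could look like. Everything in it is a proposal, not existing code (the header name, the backend constants, the default, and the hip/api.hpp include are hypothetical; only cuda/api.hpp exists today). One wrinkle worth noting: bare tokens like cuda and hip can't be told apart inside an #if, because undefined identifiers evaluate to 0 there, so the sketch compares named integer constants instead, and the user would pass e.g. -DCAW_BACKEND=CAW_BACKEND_HIP:

	// gpu.hpp -- hypothetical dispatch header (sketch only)
	#define CAW_BACKEND_CUDA 1
	#define CAW_BACKEND_HIP  2

	#ifndef CAW_BACKEND
	#define CAW_BACKEND CAW_BACKEND_CUDA  // default to the existing CUDA backend
	#endif

	#if CAW_BACKEND == CAW_BACKEND_CUDA
	#include <cuda/api.hpp>       // the existing cuda-api-wrappers header
	namespace gpu = cuda;         // gpu::launch, gpu::memory::copy, ... resolve to cuda::
	#elif CAW_BACKEND == CAW_BACKEND_HIP
	#include <hip/api.hpp>        // hypothetical HIP backend header
	namespace gpu = hip;          // hypothetical hip:: namespace
	#else
	#error "Unsupported CAW_BACKEND value"
	#endif

A plain namespace alias keeps the dispatch layer zero-cost and header-only; the backend namespaces just need to expose the same set of names.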

As far as file organisation goes, I think it makes sense to put each backend's code in its own subdirectory. This will be pretty disruptive. An example of the layout I have in mind:

cuda-api-wrappers/
  - src/
    - generic/  # generic api stuff here
    - cuda/  # cuda backend
    - hip/  # hip backend
  - examples/ # all examples become generic gpu:: instead of cuda::

This will require some pretty invasive changes to the build system, but this structure at least lends itself to putting a CMakeLists.txt file in each vendor-specific backend and having the top-level CMakeLists.txt simply include whichever one the user requests, for example via cmake -DCAW_BACKEND=cuda .. or cmake -DCAW_BACKEND=hip ..
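For illustration, the top-level selection logic might look roughly like this (a sketch under the proposed layout; the CAW_BACKEND cache variable and the src/<backend> directories don't exist yet):

	# Sketch of a top-level CMakeLists.txt fragment -- CAW_BACKEND and the
	# src/<backend> layout are proposals, not existing build options
	set(CAW_BACKEND "cuda" CACHE STRING "GPU backend to target (cuda|hip)")

	if(CAW_BACKEND STREQUAL "cuda")
	    add_subdirectory(src/cuda)
	elseif(CAW_BACKEND STREQUAL "hip")
	    add_subdirectory(src/hip)
	else()
	    message(FATAL_ERROR "Unsupported CAW_BACKEND: ${CAW_BACKEND}")
	endif()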

I'd need to work out the details, but hopefully this paints enough of a picture that you can let me know if this is something you're happy with.

I not only welcome comments and criticism, I actively encourage them.
