
Support for additional vendor GPUs #541

@dmcdougall

Short story

I'd like to contribute support for additional GPUs. I work for AMD, so my priority is supporting AMD GPUs through the HIP API, but I see no reason not to use this opportunity to build an interface generic enough that other vendors can contribute implementations for their hardware.

I'm happy to do this, and I'm working with folks involved in the NWChemEx project who would find it extremely useful. It would also give you more users.

Is this contribution something you are interested in and would accept?

Long story

I have been working with some folks adjacent to a scientific code called NWChemEx. There are a lot of dependencies there, but the short story is that one of them, LibIntX, uses cuda-api-wrappers for marshalling work off to a CUDA backend for running on NVIDIA GPUs. They've requested something similar for AMD GPUs, so I thought I would open an issue here for a few reasons:

  1. To let you know about it;
  2. To ask you if this is a contribution you are happy to receive;
  3. To propose a structure for a solution that accommodates this request without breaking your existing users;
  4. To get feedback on all of the above.

I've been interacting with Andrey Asadchev to get a feel for what would work for him in LibIntX. I think a good way to think about this is simply to take an existing example. A snippet from the vectorAdd example currently looks like this:

	auto device = cuda::device::current::get();
	auto d_A = cuda::memory::device::make_unique<float[]>(device, numElements);
	auto d_B = cuda::memory::device::make_unique<float[]>(device, numElements);
	auto d_C = cuda::memory::device::make_unique<float[]>(device, numElements);

	cuda::memory::copy(d_A.get(), h_A.get(), size);
	cuda::memory::copy(d_B.get(), h_B.get(), size);

	auto launch_config = cuda::launch_config_builder()
		.overall_size(numElements)
		.block_size(256)
		.build();

	std::cout
		<< "CUDA kernel launch with " << launch_config.dimensions.grid.x
		<< " blocks of " << launch_config.dimensions.block.x << " threads each\n";

	cuda::launch(
		vectorAdd, launch_config,
		d_A.get(), d_B.get(), d_C.get(), numElements
	);

Andrey and I proposed a more generic interface that looks like this:

	auto device = gpu::device::current::get();
	auto d_A = gpu::memory::device::make_unique<float[]>(device, numElements);
	auto d_B = gpu::memory::device::make_unique<float[]>(device, numElements);
	auto d_C = gpu::memory::device::make_unique<float[]>(device, numElements);

	gpu::memory::copy(d_A.get(), h_A.get(), size);
	gpu::memory::copy(d_B.get(), h_B.get(), size);

	auto launch_config = gpu::launch_config_builder()
		.overall_size(numElements)
		.block_size(256)
		.build();

	std::cout
		<< "GPU kernel launch with " << launch_config.dimensions.grid.x
		<< " blocks of " << launch_config.dimensions.block.x << " threads each\n";

	gpu::launch(
		vectorAdd, launch_config,
		d_A.get(), d_B.get(), d_C.get(), numElements
	);

Basically, everything stays the same, but the generic interface makes no reference to CUDA specifically. The backend can offload to CUDA, HIP, SYCL, and so on, selected by setting a preprocessor macro to a specific value: perhaps something like #define CAW_BACKEND cuda for CUDA or #define CAW_BACKEND hip for HIP. This approach would mean that your thin interface layer can stay header-only, and anybody using your software could simply set up this macro in their build system and pass it to the compiler with the -DCAW_BACKEND=... flag.
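To make that concrete, here is a minimal sketch of what the dispatch header could look like. Everything in it is a proposal, not existing code (the header name, the backend constants, the default, and the hip/api.hpp include are hypothetical; only cuda/api.hpp exists today). One wrinkle worth noting: bare tokens like cuda and hip can't be told apart inside an #if, because undefined identifiers evaluate to 0 there, so the sketch compares named integer constants instead, and the user would pass e.g. -DCAW_BACKEND=CAW_BACKEND_HIP:

	// gpu.hpp -- hypothetical dispatch header (sketch only)
	#define CAW_BACKEND_CUDA 1
	#define CAW_BACKEND_HIP  2

	#ifndef CAW_BACKEND
	#define CAW_BACKEND CAW_BACKEND_CUDA  // default to the existing CUDA backend
	#endif

	#if CAW_BACKEND == CAW_BACKEND_CUDA
	#include <cuda/api.hpp>       // the existing cuda-api-wrappers header
	namespace gpu = cuda;         // gpu::launch, gpu::memory::copy, ... resolve to cuda::
	#elif CAW_BACKEND == CAW_BACKEND_HIP
	#include <hip/api.hpp>        // hypothetical HIP backend header
	namespace gpu = hip;          // hypothetical hip:: namespace
	#else
	#error "Unsupported CAW_BACKEND value"
	#endif

A plain namespace alias keeps the dispatch layer zero-cost and header-only; the backend namespaces just need to expose the same set of names.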

As far as file organisation goes, I think it makes sense to put each backend's code in its own subdirectory. This will be pretty disruptive. An example of the layout I have in mind:

cuda-api-wrappers/
  - src/
    - generic/  # generic api stuff here
    - cuda/  # cuda backend
    - hip/  # hip backend
  - examples/ # all examples become generic gpu:: instead of cuda::

This will require some pretty invasive changes to the build system, but this structure at least lends itself to putting a CMakeLists.txt file in each vendor-specific backend and having the top-level CMakeLists.txt simply include whichever one the user requests, for example via cmake -DCAW_BACKEND=cuda .. or cmake -DCAW_BACKEND=hip ..
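For illustration, the top-level selection logic might look roughly like this (a sketch under the proposed layout; the CAW_BACKEND cache variable and the src/<backend> directories don't exist yet):

	# Sketch of a top-level CMakeLists.txt fragment -- CAW_BACKEND and the
	# src/<backend> layout are proposals, not existing build options
	set(CAW_BACKEND "cuda" CACHE STRING "GPU backend to target (cuda|hip)")

	if(CAW_BACKEND STREQUAL "cuda")
	    add_subdirectory(src/cuda)
	elseif(CAW_BACKEND STREQUAL "hip")
	    add_subdirectory(src/hip)
	else()
	    message(FATAL_ERROR "Unsupported CAW_BACKEND: ${CAW_BACKEND}")
	endif()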

I'd need to work out the details, but hopefully this paints enough of a picture that you can let me know if this is something you're happy with.

I not only welcome comments and criticism, I actively encourage them.
