Skip to content

Conversation

@PlasmaPower
Copy link
Contributor

@PlasmaPower PlasmaPower commented Mar 26, 2025

This PR adds a more efficient alternative CPU based copy if the formats between the buffers are the same (i.e. there's two NVIDIA cards). In addition, this PR lays the groundwork for other uses of Vulkan especially in blit. Vulkan is a lot more powerful than OpenGL here, and any future blit optimizations will likely also use it.

There are two reasons this copy is more efficient than the intermediateBuf codepath:

  1. There's no format conversion. OpenGL needs to convert the pixel data into GL_RGBA whereas Vulkan can keep the pixel data in the existing format and simply copy the memory. However, the Vulkan code could be extended to support a format conversion in the future.
  2. The copy is asynchronous. The previous OpenGL copy was entirely blocking, relying on glReadPixels and glTexImage2d. This new copy uses Vulkan semaphores and fences to consume and produce a sync_file, and even has a background thread for the memcpy itself to make it asynchronous (don't worry, I made sure it's isolated and doesn't cause concurrency concerns for the rest of the system).

Right now I think this is only useful if you have two NVIDIA cards, but @gulafaran has been looking at using Vulkan to optimize the blit for the NVIDIA<>Intel case and this PR lays important groundwork for that as well.

If you want to test this PR on a single-GPU system you can use this patch to force enable Vulkan based blitting: https://gist.github.com/PlasmaPower/1499e2db2abbb1ab88aa1334c145f3fa

@PlasmaPower
Copy link
Contributor Author

I was just going to comment on the Vulkan dep, thanks @fufexan! I think we can make it a build dep only and use entirely dynamic library loading at runtime to fail gracefully on a lack of Vulkan, but at this point I think pretty much everyone has Vulkan installed on their system.

@vaxerski
Copy link
Member

oh god vulkan

@outfoxxed has multi amd thats worth testing too

@PlasmaPower
Copy link
Contributor Author

Seems like v0.8.0 might have broken multi-amd according to #162. I've asked for more details there.

It'd be good to test on this PR too, though I don't think this PR will affect it. It only takes effect when the OpenGL non-CPU codepath fails, and on multi-amd that codepath should work.

@outfoxxed
Copy link
Member

oh god vulkan

@outfoxxed has multi amd thats worth testing too

Lmk what I need to test and I can do that.

@PlasmaPower
Copy link
Contributor Author

Could you first test if the latest aquamarine release (v0.8.0) is still working with displays connected to each of your AMD GPUs, and then test if this PR breaks it if it is working?

@outfoxxed
Copy link
Member

Both 0.8.0 and this branch work as expected. Anything beyond functioning at surface level that needs to be tested?

@PlasmaPower
Copy link
Contributor Author

Nope, just that the displays show up and aren't super slow or anything.

@LuckShiba
Copy link

Any updates on this? I have an Intel + NVIDIA setup and the constant CPU high usage on the external display (that I think that is caused by the CPU copy) is very annoying, this may help but I can't really test right now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants