fix: high CPU usage (200-600%) on EVDI/DisplayLink outputs #2109
pcortellezzi wants to merge 1 commit into pop-os:master
Replace llvmpipe (CPU) rendering on EVDI/DisplayLink outputs with hardware GPU rendering using a niri-style swapchain replacement. After initialize_output(), DrmCompositor::set_format() swaps the EVDI swapchain allocator for the primary GPU's GBM using the Linear modifier. The surface thread then uses single_renderer with the primary GPU node, keeping everything on one device. The GbmFramebufferExporter detects the buffer as foreign and imports it via dmabuf into EVDI's GBM for DRM framebuffer creation, so there is no CPU-side pixel copy.

The render path change is guarded by a shared AtomicBool flag (swapchain_on_primary) that is only set to true when set_format() succeeds. If set_format() fails, the surface thread falls back to the original render path, avoiding a mismatch between the swapchain allocator and the renderer.

Key changes:
- mod.rs: Extract primary_gbm before the device loop, call set_format() on software targets after initialize_output(), and set the swapchain_on_primary flag on success
- surface/mod.rs: Thread the is_software and swapchain_on_primary flags through, and use (primary_node, primary_node) as render/target for software outputs only when the swapchain was successfully moved to the primary GPU
- device.rs: Pass is_software to Surface::new()

Note: The compositor is accessed directly through drm.compositors().get(crtc) instead of DrmOutput::with_compositor() to avoid deadlocking on the RwLock that LockedDrmOutputManager already holds as a write guard.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
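The guard described in the commit message can be sketched in plain Rust as a shared `Arc<AtomicBool>` that output setup sets only after a successful swap and that the surface thread reads before choosing its nodes. `Node`, `try_move_swapchain`, and `select_nodes` are hypothetical names for illustration, not cosmic-comp or Smithay APIs; `try_move_swapchain` merely stands in for the success or failure of `set_format()`.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for a DRM node; the real code uses its own node type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Node(&'static str);

/// Hypothetical stand-in for `DrmCompositor::set_format()`: reports whether
/// moving the swapchain to the primary GPU's GBM device succeeded.
fn try_move_swapchain(succeeds: bool) -> Result<(), &'static str> {
    if succeeds {
        Ok(())
    } else {
        Err("test commit rejected Linear buffer")
    }
}

/// Chooses (render_node, target_node) for a surface. Only a software (EVDI)
/// output whose swapchain really lives on the primary GPU renders entirely
/// there; every other case keeps the original path, so the renderer always
/// matches the allocator that produced the swapchain buffers.
fn select_nodes(
    is_software: bool,
    swapchain_on_primary: &AtomicBool,
    primary: Node,
    original: (Node, Node),
) -> (Node, Node) {
    if is_software && swapchain_on_primary.load(Ordering::Acquire) {
        (primary, primary)
    } else {
        original
    }
}

fn main() {
    let primary = Node("primary-gpu");
    let original = (Node("original-render"), Node("evdi")); // hypothetical labels
    let swapchain_on_primary = Arc::new(AtomicBool::new(false));

    // Output setup: flip the flag only after the swap succeeded.
    if try_move_swapchain(true).is_ok() {
        swapchain_on_primary.store(true, Ordering::Release);
    }

    // Surface thread: reads the shared flag before picking its render path.
    let flag = Arc::clone(&swapchain_on_primary);
    let nodes = thread::spawn(move || select_nodes(true, &flag, primary, original))
        .join()
        .expect("surface thread panicked");
    println!("{nodes:?}");
}
```

Keeping the decision in one pure function means the fallback case (flag never set) is exactly the pre-patch behavior, which is the property the commit message relies on.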
The analysis of the AI agent here is wrong/incomplete. The core issue is that the cursor is composited using llvmpipe, because the cursor planes of the userspace evdi driver show corrupted buffer contents on some systems. We are deliberately not using (any) GPU's GBM pipeline to allocate buffers, as this causes performance issues on other setups, while the copy operation does not; that copy would happen inside the driver with the GBM pipeline as well. The correct solution is to get rid of this part of the code: https://github.com/pop-os/cosmic-comp/blob/master/src/backend/kms/mod.rs#L887-L897, and then figure out what causes the corrupted cursor frames, and test and verify on various machines with different GPUs and drivers in use.
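To make the per-frame cost concrete: with cursor planes disabled, the cursor has to be blended into every new frame. The CPU loop below is only a conceptual stand-in (llvmpipe executes this kind of blend through its GL pipeline, not through code like this). `composite_cursor` is a hypothetical helper, and the premultiplied ARGB8888 cursor and XRGB8888 frame formats are assumptions.

```rust
/// Blends one premultiplied-alpha channel over an opaque destination channel:
/// out = src + dst * (255 - alpha) / 255, rounded to nearest.
fn blend_channel(src: u32, dst: u32, inv_alpha: u32) -> u32 {
    (src + (dst * inv_alpha + 127) / 255).min(255)
}

/// Composites a premultiplied ARGB8888 cursor onto an XRGB8888 frame with the
/// cursor's top-left corner at `pos`, clipping to the frame bounds.
fn composite_cursor(
    frame: &mut [u32],
    (frame_w, frame_h): (usize, usize),
    cursor: &[u32],
    (cursor_w, cursor_h): (usize, usize),
    (x, y): (i64, i64),
) {
    for cy in 0..cursor_h {
        let fy = y + cy as i64;
        if fy < 0 || fy >= frame_h as i64 {
            continue; // cursor row lies outside the frame
        }
        for cx in 0..cursor_w {
            let fx = x + cx as i64;
            if fx < 0 || fx >= frame_w as i64 {
                continue; // cursor column lies outside the frame
            }
            let src = cursor[cy * cursor_w + cx];
            let alpha = src >> 24;
            if alpha == 0 {
                continue; // fully transparent pixel leaves the frame untouched
            }
            let dst = &mut frame[fy as usize * frame_w + fx as usize];
            let inv_alpha = 255 - alpha;
            let mut out: u32 = 0xFF00_0000; // X byte of XRGB8888
            for shift in [16u32, 8, 0] {
                let s = (src >> shift) & 0xFF;
                let d = (*dst >> shift) & 0xFF;
                out |= blend_channel(s, d, inv_alpha) << shift;
            }
            *dst = out;
        }
    }
}

fn main() {
    // 2x2 black frame, 1x1 half-transparent premultiplied white cursor at (1, 1).
    let mut frame = vec![0xFF00_0000u32; 4];
    composite_cursor(&mut frame, (2, 2), &[0x8080_8080], (1, 1), (1, 1));
    println!("{:08X}", frame[3]);
}
```

The blend itself only touches the cursor's footprint; the expensive part described above is that each composited frame must then be produced and copied for the evdi driver, which a working cursor plane would avoid.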
|
Thanks for the detailed feedback. I agree that disabling cursor planes is a significant contributor to the mouse movement CPU spikes, and fixing cursor plane corruption would help. However, I'm also observing 100-120% sustained CPU usage at idle with just a few terminals and a browser open — no mouse movement at all. Any screen update (cursor blinking, web content, window redraws) goes through llvmpipe, which is inherently expensive on high-resolution displays. So it seems like there are two separate issues:
For reference, I'm currently running this patch on my daily setup (two 3440×1440 EVDI/DisplayLink monitors). Under the same conditions (terminals + browser, normal usage), CPU usage dropped from 100-120% at idle (200-600% with mouse movement) to 10-15%. It's been a game changer for usability. You mentioned that using the primary GPU's GBM pipeline causes performance issues on other setups; could you elaborate on what issues you've seen? That would help me understand whether this approach is viable or if there's a better path.
It only does so because we need to composite the cursor on top of every new frame inside the […]. If the buffer were instead allocated via GBM, the system would have to migrate the buffer into system memory so that the evdi driver can read it. This means a lot of drivers could no longer render directly into the buffer, but would internally copy from device memory into system memory, potentially with less information, because they don't know which regions changed (we already limit the copy path as much as we can). So once the cursor plane is fixed, your proposed change would eliminate neither a copy nor any costly llvmpipe render operations. The only reason it performs better at the moment is that it composites the cursor before copying.
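The point about damage information can be illustrated with a row-by-row copy that touches only changed regions, which a driver-internal migration without that information could not do. `Rect`, `copy_damage`, and the 4-bytes-per-pixel format are assumptions for illustration, not cosmic-comp code.

```rust
/// A damaged region in pixels (hypothetical helper type).
#[derive(Clone, Copy, Debug)]
struct Rect {
    x: usize,
    y: usize,
    w: usize,
    h: usize,
}

const BYTES_PER_PIXEL: usize = 4; // assumes a 32-bit format such as XRGB8888

/// Copies only the damaged rectangles from `src` into `dst`. Both images are
/// `width` x `height` pixels but may use different strides (bytes per row).
/// Returns how many bytes were copied. Overlapping rectangles are copied twice;
/// a real implementation would merge the damage first.
fn copy_damage(
    src: &[u8],
    src_stride: usize,
    dst: &mut [u8],
    dst_stride: usize,
    (width, height): (usize, usize),
    damage: &[Rect],
) -> usize {
    let mut copied = 0;
    for r in damage {
        // Clip the rectangle to the image bounds.
        let x0 = r.x.min(width);
        let y0 = r.y.min(height);
        let x1 = r.x.saturating_add(r.w).min(width);
        let y1 = r.y.saturating_add(r.h).min(height);
        if x0 >= x1 || y0 >= y1 {
            continue; // empty after clipping
        }
        let row_bytes = (x1 - x0) * BYTES_PER_PIXEL;
        for row in y0..y1 {
            let s = row * src_stride + x0 * BYTES_PER_PIXEL;
            let d = row * dst_stride + x0 * BYTES_PER_PIXEL;
            dst[d..d + row_bytes].copy_from_slice(&src[s..s + row_bytes]);
            copied += row_bytes;
        }
    }
    copied
}

fn main() {
    let (w, h) = (4, 4);
    let stride = w * BYTES_PER_PIXEL;
    let src = vec![0xAAu8; stride * h];
    let mut dst = vec![0u8; stride * h];
    let damage = [Rect { x: 1, y: 1, w: 2, h: 2 }];
    let copied = copy_damage(&src, stride, &mut dst, stride, (w, h), &damage);
    println!("copied {copied} of {} bytes", stride * h);
}
```

For scale: a 10×20 px blinking text cursor is 10 × 20 × 4 = 800 bytes of damage, versus 19,814,400 bytes for a full 3440×1440 frame at 4 bytes per pixel.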
|
Thank you for the detailed explanation; that really clarifies things.
Problem
EVDI/DisplayLink outputs use llvmpipe (software OpenGL) for rendering, causing 200-600% CPU usage with simple mouse movements on high-resolution displays (e.g. 3440×1440). This makes EVDI-connected monitors essentially unusable for desktop use.
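A back-of-the-envelope calculation shows why full repaints at this resolution are expensive when done on the CPU. The 4-bytes-per-pixel format and the 60 Hz refresh rate are assumptions, and damage tracking means real frames are often only partially redrawn.

```rust
/// Bytes in one fully repainted frame.
fn frame_bytes(width: u64, height: u64, bytes_per_pixel: u64) -> u64 {
    width * height * bytes_per_pixel
}

fn main() {
    // Assumptions: 4 bytes per pixel (XRGB8888) and a 60 Hz refresh rate.
    let per_frame = frame_bytes(3440, 1440, 4);
    let per_second = per_frame * 60;
    let two_monitors = per_second * 2;
    println!("{per_frame} bytes/frame, {per_second} bytes/s, {two_monitors} bytes/s for two monitors");
}
```

At full damage that is 19,814,400 bytes (about 19.8 MB) per frame and 1,188,864,000 bytes (about 1.19 GB) per second for one monitor, or about 2.38 GB/s for the two-monitor setup described in the discussion.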
The root cause is twofold:
- … `MultiRenderer` (cross-device path), which copies pixels back via `glReadPixels`, another expensive CPU operation on top of software rendering.

Solution
Replace the EVDI swapchain allocator with the primary (hardware) GPU's GBM device using `DrmCompositor::set_format()` with `Modifier::Linear`, then render using `single_renderer` on the primary GPU. This is similar to how niri handles display-only devices.

After `initialize_output()`, for software targets:
- `set_format()` swaps the swapchain allocator from EVDI's GBM (llvmpipe) to the primary GPU's GBM with the Linear modifier
- The surface thread uses `(primary_node, primary_node)` as `(render_node, effective_target)`, keeping everything in the `single_renderer` path
- `GbmFramebufferExporter` detects the buffer as "foreign" and imports it via dmabuf into EVDI's GBM for DRM framebuffer creation, with no CPU-side pixel copy

The render path change is guarded by a shared `AtomicBool` (`swapchain_on_primary`) that is only set to `true` when `set_format()` succeeds. If it fails, the surface thread falls back to the original render path.

Important implementation note
The compositor is accessed directly through `drm.compositors().get(crtc)` instead of `DrmOutput::with_compositor()` because `LockedDrmOutputManager` already holds a write lock on the compositor `RwLock`; calling `with_compositor()` would deadlock trying to acquire a read lock on it.

Test plan
- … (`is_software == true` AND `set_format()` succeeds)
- … `set_format()` fails (not tested; would require a setup where the DRM test commit rejects Linear buffers from the primary GPU)

Checklist