refactor: migrate notification mechanism from signal to eventfd and signalfd#78

Merged
agicy merged 1 commit into main from feat/eventfd-signalfd
Mar 4, 2026

@agicy
Contributor

@agicy agicy commented Mar 1, 2026

This PR refactors the virtio daemon's notification mechanism from the legacy sigwait to a modern architecture using eventfd and signalfd. This migration provides significant performance gains and eliminates complex kernel-version-dependent code.

Architectural Evolution & Code Maintainability

  • Simplified API: The legacy signal-based notification required complex conditional compilation to handle the breaking change in the Linux kernel where struct siginfo was replaced by struct kernel_siginfo in send_sig_info (around Kernel 4.20/5.0).
  • Stable Foundation: eventfd has provided a stable and consistent API since Linux 2.6.22, removing the need for version-specific hacks and reducing maintenance overhead.
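
As a minimal userspace sketch of the eventfd semantics this relies on: every notification adds to a 64-bit counter, and a single read returns the accumulated value and resets it. The function name `eventfd_roundtrip` is hypothetical, and the writer here is the same process rather than the kernel driver, purely for illustration.

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Post `kicks` notifications to a fresh eventfd, then drain it once.
 * Returns the counter value seen by the reader, or 0 if nothing was
 * pending or a call failed. (Hypothetical helper, not from the PR.) */
uint64_t eventfd_roundtrip(unsigned kicks)
{
    /* The flags argument requires Linux 2.6.27 or later. */
    int efd = eventfd(0, EFD_CLOEXEC | EFD_NONBLOCK);
    if (efd < 0)
        return 0;

    uint64_t one = 1;
    for (unsigned i = 0; i < kicks; i++) {
        /* Each write adds its value to the kernel-side counter. */
        if (write(efd, &one, sizeof(one)) != (ssize_t)sizeof(one)) {
            close(efd);
            return 0;
        }
    }

    uint64_t seen = 0;
    /* One read returns the accumulated count and resets it to zero.
     * With EFD_NONBLOCK, an empty counter makes read() fail with EAGAIN. */
    if (read(efd, &seen, sizeof(seen)) != (ssize_t)sizeof(seen))
        seen = 0;

    close(efd);
    return seen;
}
```

Because several notifications coalesce into one counter value, a reader that wakes up late still sees that work is pending, with no per-kernel-version structure layouts involved on the userspace side.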

Performance Benchmarks

The following tables compare the Legacy Architecture (sigwait) and the New Architecture (epoll) using fio. The device under test (/dev/vdb) is a virtio-blk device backed by a ramdisk. The test commands are listed below.

# iops
fio --name=iops_test --filename=/dev/vdb --direct=1 --rw=randread --bs=4k --ioengine=io_uring --iodepth=64 --runtime=30 --ramp_time=30 --time_based --group_reporting

# bw_test
fio --name=bw_test --filename=/dev/vdb --direct=1 --rw=read --bs=1M --ioengine=io_uring --iodepth=16 --runtime=30 --ramp_time=30 --time_based --group_reporting

# lat_test
fio --name=lat_test --filename=/dev/vdb --direct=1 --rw=read --bs=4k --ioengine=io_uring --iodepth=1 --runtime=30 --ramp_time=30 --time_based --group_reporting

Table 1: Without Busy-Polling (Pure Event-Driven)

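| Metric                 | Legacy (sigwait) | New (epoll) | Improvement |
|------------------------|------------------|-------------|-------------|
| RandRead IOPS (4k)     | 1,364            | 24,500      | +1696%      |
| Throughput (1M)        | 204 MiB/s        | 916 MiB/s   | +349%       |
| Avg Latency (4k, qd=1) | 1,343 us         | 598 us      | -55%        |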

Table 2: With Busy-Polling

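| Metric                 | Legacy (sigwait + Poll) | New (epoll + Poll) | Improvement |
|------------------------|-------------------------|--------------------|-------------|
| RandRead IOPS (4k)     | 58,100                  | 61,600             | +6%         |
| Throughput (1M)        | 1,231 MiB/s             | 1,401 MiB/s        | +13%        |
| Avg Latency (4k, qd=1) | 105 us                  | 104 us             | -1%         |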

Conclusion

The eventfd & signalfd architecture is more efficient than signals in pure event-driven scenarios and offers a much cleaner and more portable codebase across different Linux kernel versions.

References

Depends On: #77

Closes: #64

@caodg caodg requested review from ForeverYolo, li041 and liulog March 1, 2026 10:19
@agicy agicy force-pushed the refactor/build-system-cross-compile branch from 6bd7ae5 to d2c4449 Compare March 1, 2026 10:39
@agicy agicy force-pushed the feat/eventfd-signalfd branch from c9f9a61 to a3adfaa Compare March 1, 2026 10:44
@li041 li041 requested a review from Copilot March 2, 2026 01:49
Copilot AI left a comment

Pull request overview

Refactors the virtio daemon ↔ kernel notification path away from legacy signal delivery (sigwait/send_sig_info) to a file-descriptor-driven approach using eventfd (VirtIO kicks) and signalfd + epoll (termination + kicks), aligning with a more scalable event loop design.

Changes:

  • Add a new ioctl (HVISOR_SET_EVENTFD) to pass an eventfd from userspace to the kernel driver.
  • Kernel IRQ handler now notifies userspace via eventfd_signal() instead of send_sig_info().
  • Userspace virtio daemon creates an eventfd and waits on eventfd + signalfd with epoll.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 9 comments.

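| File                  | Description                                                                                   |
|-----------------------|-----------------------------------------------------------------------------------------------|
| tools/virtio/virtio.c | Creates/installs eventfd, switches request loop from sigwait to epoll over eventfd + signalfd. |
| include/hvisor.h      | Adds HVISOR_SET_EVENTFD ioctl definition.                                                     |
| driver/hvisor.c       | Stores an eventfd_ctx from userspace and signals it from the virtio IRQ handler.              |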

@liulog
Contributor

liulog commented Mar 2, 2026

Does the deadlock you mentioned still exist after this change?

@agicy
Contributor Author

agicy commented Mar 2, 2026

> Does the deadlock you mentioned still exist after this change?

This PR does not resolve the deadlock issue. The fix for the deadlock is located in the fix/lost-wakeup branch, which requires synchronized changes in both hvisor-tool and hvisor. The modifications for hvisor-tool are already complete and currently undergoing testing. This branch focuses only on the refactor of the notification mechanism.

While this PR improves the ceiling of our notification performance, the fix/lost-wakeup branch is still required to ensure system reliability and correctness under all conditions. In short: this PR makes the system faster, and the upcoming fix will make it more stable.

@li041
Contributor

li041 commented Mar 2, 2026

Thank you very much for this PR! These changes provide a significant performance boost and the interface improvements make the codebase much cleaner.

I have a few suggestions regarding the implementation of the event loop:

  • Centralized Event Handling: Since the hvisor main thread now utilizes epoll for monitoring, the dedicated epoll_loop might be redundant. To fully leverage the benefits of I/O multiplexing, it would be more efficient to integrate these tasks into the main thread's epoll instance.
  • Code Reuse: There are existing epoll interfaces within the repository. It would be great if we could reuse those standard interfaces (e.g., add_event, hvisor_event) to maintain consistency across the project.

Overall, great work on the optimization. Looking forward to your thoughts on consolidating the polling logic!
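
The consolidation suggested above could look roughly like the sketch below: one epoll instance, with each file descriptor registered together with a callback. The names `demo_event`, `demo_add_event`, `demo_dispatch_once` and `demo_on_kick` are hypothetical and are not the repository's add_event / hvisor_event interfaces, whose signatures are not shown on this page.

```c
#include <stdint.h>
#include <stdlib.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical callback type and registration record. */
typedef void (*demo_handler)(int fd, void *arg);

struct demo_event {
    int fd;
    demo_handler handler;
    void *arg;
};

/* Register `fd` on a shared epoll instance. The returned record is
 * referenced by epoll and must stay allocated while `fd` is registered. */
struct demo_event *demo_add_event(int epfd, int fd, demo_handler handler, void *arg)
{
    struct demo_event *de = malloc(sizeof(*de));
    if (!de)
        return NULL;
    de->fd = fd;
    de->handler = handler;
    de->arg = arg;

    struct epoll_event ev = { 0 };
    ev.events = EPOLLIN;
    ev.data.ptr = de;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0) {
        free(de);
        return NULL;
    }
    return de;
}

/* Wait once and run the handler of every ready descriptor.
 * Returns the number of handlers run, or -1 on error. */
int demo_dispatch_once(int epfd, int timeout_ms)
{
    struct epoll_event evs[8];
    int n = epoll_wait(epfd, evs, 8, timeout_ms);

    for (int i = 0; i < n; i++) {
        struct demo_event *de = evs[i].data.ptr;
        de->handler(de->fd, de->arg);
    }
    return n;
}

/* Example handler: drain an eventfd and add its count to a running total. */
void demo_on_kick(int fd, void *arg)
{
    uint64_t count = 0;
    if (read(fd, &count, sizeof(count)) == (ssize_t)sizeof(count))
        *(uint64_t *)arg += count;
}
```

With this shape, the VirtIO eventfd and the termination signalfd become two more registrations on the main thread's epoll instance instead of requiring a dedicated loop.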

@agicy agicy changed the base branch from refactor/build-system-cross-compile to main March 4, 2026 02:53
@liulog
Contributor

liulog commented Mar 4, 2026

As far as I know, sigwait is currently not triggered very often, because the code keeps polling after it is woken up (the busy-polling you mentioned).

Therefore, the performance improvement from switching from sigwait to epoll could theoretically be limited, especially under high load, where even the previous signal-based method doesn't sleep.

However, embracing new technologies, like epoll, is commendable! 👍
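
The busy-polling pattern described in this comment can be sketched as below, assuming a non-blocking eventfd as the notification source. The helper names are hypothetical, and `poll()` stands in for the blocking wait purely for brevity. While notifications keep arriving within the spin budget, the blocking path is never taken, which is consistent with the much smaller gap in Table 2.

```c
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h> /* for eventfd() in the caller */
#include <unistd.h>

/* Try to consume a notification from a non-blocking eventfd.
 * Returns 1 and stores the counter in *count on success, 0 otherwise. */
static int try_consume(int efd, uint64_t *count)
{
    return read(efd, count, sizeof(*count)) == (ssize_t)sizeof(*count);
}

/* Wait for a notification, spinning first and sleeping only as a fallback.
 * Returns 0 if the notification was picked up while busy-polling,
 * 1 if the caller had to block, and -1 on timeout or error. */
int wait_with_busy_poll(int efd, unsigned spin_budget, int timeout_ms, uint64_t *count)
{
    /* Phase 1: busy-poll. Under sustained load this usually succeeds,
     * so the cost of the sleep/wake mechanism is rarely paid. */
    for (unsigned i = 0; i < spin_budget; i++) {
        if (try_consume(efd, count))
            return 0;
    }

    /* Phase 2: nothing arrived within the budget, so block. */
    struct pollfd pfd = { .fd = efd, .events = POLLIN };
    if (poll(&pfd, 1, timeout_ms) <= 0)
        return -1;
    return try_consume(efd, count) ? 1 : -1;
}
```

The descriptor passed in is expected to come from `eventfd(0, EFD_NONBLOCK)`; with a blocking descriptor the first `read()` would sleep and defeat the spin phase.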

This commit introduces an eventfd-based interrupt handling mechanism to replace
the previous signal-based approach, providing more efficient and reliable
kernel-to-user communication.

Key changes:
- Added eventfd creation and configuration in virtio initialization
- Implemented epoll-based event monitoring for both signalfd and eventfd
- Replaced signal-based interrupt handling with eventfd notifications
- Enhanced termination signal handling using signalfd for graceful shutdown

The new implementation eliminates race conditions in signal handling and
provides better performance through efficient event-driven architecture. This
establishes a solid foundation for future VirtIO backend optimizations.
@agicy agicy force-pushed the feat/eventfd-signalfd branch from a3adfaa to 2681f1b Compare March 4, 2026 09:43
@agicy agicy marked this pull request as ready for review March 4, 2026 09:47
Contributor

@li041 li041 left a comment

Great work! LGTM

@agicy agicy merged commit cc41d60 into main Mar 4, 2026
1 check passed
@agicy agicy deleted the feat/eventfd-signalfd branch March 7, 2026 11:31

Development

Successfully merging this pull request may close these issues.

Replace signal-based Virtio event notification with eventfd

4 participants