Conversation

@simongdavies
Contributor

@simongdavies simongdavies commented Oct 28, 2025

This pull request introduces a new crash handler capability for Hyperlight, enabling automatic sandbox dump generation when the host crashes due to fatal signals (Linux) or exceptions (Windows). The system uses a global, lock-free registry for tracking sandboxes and installs platform-specific crash handlers. The implementation is gated behind a feature flag and is robust against initialization failures and recursive crashes.

Crash handler infrastructure:

  • Added a new crash_handler.rs module that manages sandbox registration and crash dump generation. It uses a global lock-free registry (DashMap) with lazy initialization via once_cell, keeps access to hypervisor pointers safe for use in a crash context, and tolerates initialization failures.
  • Added the once_cell and dashmap dependencies to Cargo.toml to support the lock-free global state and lazy initialization used by the crash handler subsystem.
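
As a rough illustration of the registration pattern described above, here is a minimal sketch that deliberately swaps in the standard library's `OnceLock` and a `Mutex<HashMap>` in place of once_cell and DashMap, so that it builds without extra crates. `SandboxEntry`, `register_sandbox`, `unregister_sandbox`, and `registered_ids_nonblocking` are hypothetical names and are not taken from the PR.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

// Hypothetical stand-in for whatever per-sandbox data the real registry stores.
#[derive(Clone, Debug, PartialEq)]
pub struct SandboxEntry {
    pub dump_path: String,
}

// Lazily initialized global registry (std substitute for once_cell + DashMap).
static REGISTRY: OnceLock<Mutex<HashMap<u64, SandboxEntry>>> = OnceLock::new();

fn registry() -> &'static Mutex<HashMap<u64, SandboxEntry>> {
    REGISTRY.get_or_init(|| Mutex::new(HashMap::new()))
}

// Called when a sandbox is created.
pub fn register_sandbox(id: u64, entry: SandboxEntry) {
    if let Ok(mut map) = registry().lock() {
        map.insert(id, entry);
    }
}

// Called when a sandbox is dropped; returns the entry that was removed, if any.
pub fn unregister_sandbox(id: u64) -> Option<SandboxEntry> {
    let mut map = registry().lock().ok()?;
    map.remove(&id)
}

// Intended for the crash path: uses try_lock so it never blocks, because the
// crashing thread might already hold the lock. Returns None in that case.
pub fn registered_ids_nonblocking() -> Option<Vec<u64>> {
    let map = registry().try_lock().ok()?;
    let mut ids: Vec<u64> = map.keys().copied().collect();
    ids.sort_unstable();
    Some(ids)
}
```

The `try_lock` in the crash path is the design point worth noting: a handler that blocks on a lock held by the crashing thread would hang instead of dumping.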

Linux-specific crash handling:

  • Added a new crash_handler/linux.rs module that installs signal handlers for fatal signals (e.g., SIGSEGV, SIGABRT) and chains to the previous handlers after generating sandbox dumps. It detects whether core dumps are enabled on the system, and it knowingly breaks async-signal-safety rules only inside the crash handling path, where the process is already going down.
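
One possible way to detect whether core dumps are enabled is to read the soft core-size limit from `/proc/self/limits`. The sketch below assumes that approach; the PR's actual detection logic is not shown in this conversation, and `core_limit_allows_dumps` and `core_dumps_enabled` are hypothetical names. A fuller check would probably also look at `/proc/sys/kernel/core_pattern`.

```rust
use std::fs;

// Parses the text of /proc/self/limits.
// Returns Some(false) when the soft core-size limit is 0, Some(true) for any
// other value (including "unlimited"), and None if the line is missing.
pub fn core_limit_allows_dumps(limits: &str) -> Option<bool> {
    let line = limits
        .lines()
        .find(|l| l.starts_with("Max core file size"))?;
    // Tokens: "Max" "core" "file" "size" <soft> <hard> <units>
    let soft = line.split_whitespace().nth(4)?;
    Some(soft != "0")
}

// Reads the limits of the current process; None if /proc is unavailable.
pub fn core_dumps_enabled() -> Option<bool> {
    let limits = fs::read_to_string("/proc/self/limits").ok()?;
    core_limit_allows_dumps(&limits)
}
```

Keeping the parser separate from the file read makes the detection easy to test without changing the limits of the test process.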

Safety and error handling:

  • The crash handler code is careful to avoid unsafe behavior except during crash contexts, and includes mechanisms to prevent recursive crash handling, detect initialization failures, and fail gracefully when system configuration disables core dumps.
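
A common way to prevent recursive crash handling is a one-shot atomic flag that only the first entrant can claim. The sketch below shows that pattern under the assumption that the PR does something similar; `IN_CRASH_HANDLER` and `try_enter_crash_handler` are hypothetical names.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Set once by the first thread that enters the crash handler.
static IN_CRASH_HANDLER: AtomicBool = AtomicBool::new(false);

// Returns true only for the first caller. Any later caller, including a
// second fault raised while dumps are being written, gets false and should
// skip dump generation and go straight to the previous handler.
pub fn try_enter_crash_handler() -> bool {
    IN_CRASH_HANDLER
        .compare_exchange(false, true, Ordering::SeqCst, Ordering::SeqCst)
        .is_ok()
}
```

An atomic flag is used here rather than a lock because taking a lock inside a signal handler can deadlock.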

Closes #966

@simongdavies simongdavies added the kind/enhancement For PRs adding features, improving functionality, docs, tests, etc. label Oct 28, 2025
@simongdavies simongdavies force-pushed the create-vm-crashdumps-on-process-crash branch from 429ee4f to 0666756 Compare October 28, 2025 13:08
@simongdavies simongdavies changed the title DRAFT: Create crashdumps for VMs if process is crashing and creating a dump DRAFT: Create crashdumps for active sandboxes if process is crashing and creating a crashdump Oct 28, 2025
@simongdavies simongdavies force-pushed the create-vm-crashdumps-on-process-crash branch from 0666756 to 1852abf Compare October 28, 2025 13:50
Contributor

@dblnz dblnz left a comment

This looks good and clean from my point of view. Great work!

Some tests would be awesome, so we know this keeps working as expected with every change.

let mut sa: sigaction = std::mem::zeroed();
sa.sa_sigaction = crash_signal_handler as usize;
sa.sa_flags = SA_SIGINFO | SA_RESTART;
libc::sigemptyset(&mut sa.sa_mask);
Contributor

I am not sure how this works, but is setting the signal handler before retrieving the previous one the way it works?

Contributor Author

Yes, when you set the handler you get back any previous handler, which you can then chain to.
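
To illustrate the "installing returns the previous handler" behaviour, here is a small sketch. It assumes Linux and, to stay free of external crates, uses the simpler C `signal()` call declared by hand instead of the `sigaction` call from the diff; `sigaction` follows the same idea but hands back the old handler through its third (`oldact`) argument. `old_handler`, `chaining_handler`, and `demo_chain` are hypothetical names, and SIGUSR1 is used so the demo does not need a real crash.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Declared by hand so the sketch needs no external crate. Both are standard
// C library functions; the PR itself goes through the libc crate.
unsafe extern "C" {
    fn signal(signum: i32, handler: usize) -> usize;
    fn raise(signum: i32) -> i32;
}

const SIGUSR1: i32 = 10; // assumption: Linux x86_64 / aarch64 numbering
const SIG_DFL: usize = 0;
const SIG_IGN: usize = 1;
const SIG_ERR: usize = usize::MAX;

static PREVIOUS: AtomicUsize = AtomicUsize::new(SIG_DFL);
static OLD_CALLS: AtomicUsize = AtomicUsize::new(0);
static NEW_CALLS: AtomicUsize = AtomicUsize::new(0);

// Stands in for a handler that some other component installed earlier.
extern "C" fn old_handler(_sig: i32) {
    OLD_CALLS.fetch_add(1, Ordering::SeqCst);
}

// Our handler: do our own work, then forward to whatever was there before.
extern "C" fn chaining_handler(sig: i32) {
    NEW_CALLS.fetch_add(1, Ordering::SeqCst);
    let prev = PREVIOUS.load(Ordering::SeqCst);
    if prev != SIG_DFL && prev != SIG_IGN && prev != SIG_ERR {
        // SAFETY: prev was returned by signal() and is a real handler address.
        let f: extern "C" fn(i32) = unsafe { std::mem::transmute(prev) };
        f(sig);
    }
}

// Installs both handlers, raises the signal once, and returns
// (calls to old_handler, calls to chaining_handler).
pub fn demo_chain() -> (usize, usize) {
    unsafe {
        let original = signal(SIGUSR1, old_handler as *const () as usize);
        // Installing the new handler hands back the one it replaced.
        let prev = signal(SIGUSR1, chaining_handler as *const () as usize);
        PREVIOUS.store(prev, Ordering::SeqCst);
        raise(SIGUSR1);
        // Put back whatever was installed before the demo started.
        if original != SIG_ERR {
            signal(SIGUSR1, original);
        }
    }
    (OLD_CALLS.load(Ordering::SeqCst), NEW_CALLS.load(Ordering::SeqCst))
}
```

Calling `demo_chain()` once should report one call to each handler, since the new handler runs first and then forwards to the old one.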


// Register with crash handler if dumps are enabled
#[cfg(feature = "crashdump")]
if vm.runtime_config().guest_core_dump
Contributor

Do we want this on by default?
Imagine running a huge number of sandboxes and something hangs, and you want to terminate it and it starts dumping everything.
I know this is not a production use case, but still. Let me know what you think.
Also, I am curious what happens if we're running 100+ large sandboxes of around 500 MiB each 😄

Contributor Author

Do you think we should have a configuration option that turns this on or off globally, set to false by default?

Contributor

I think for now we can leave it as is and introduce a configuration option if necessary.

Labels

kind/enhancement For PRs adding features, improving functionality, docs, tests, etc.

Development

Successfully merging this pull request may close these issues.

Crashdump feature should enable guest dump creation if host process dumps

2 participants