Since its inception Wasmtime has used the `setjmp` and `longjmp`
functions in C to implement handling of traps. While this solution was
easy to implement, relatively portable, and performant enough, there are
a number of downsides that have evolved over time to make this an
unattractive approach in the long run:
* Using `setjmp` fundamentally requires using C because Rust does not
understand a function that returns twice. It's fundamentally unsound
to invoke `setjmp` in Rust meaning that Wasmtime has forever needed a
C compiler configured and set up to build. This notably means that
`cargo check` cannot check other targets easily.
* Using `longjmp` means that Rust function frames are unwound on the
stack without running destructors. This is a dangerous operation against
which the compiler gives us no protection. Both frames
entering wasm and frames exiting wasm are all skipped. Absolutely
minimizing this has been beneficial for portability to platforms such
as Pulley.
* Currently the no_std implementation of Wasmtime requires embedders to
provide `wasmtime_{setjmp,longjmp}` which is a thorn in the side of
what is otherwise an almost entirely independent implementation of
Wasmtime.
* There is a performance floor to using `setjmp` and `longjmp`. Calling
`setjmp` requires using C but Wasmtime is otherwise written in Rust
meaning that there's a Rust->C->Rust->Wasm boundary which
fundamentally can't be inlined without cross-language LTO which is
difficult to configure.
* With the implementation of the WebAssembly exceptions proposal
Wasmtime now has two means of unwinding the stack. Ideally Wasmtime
would only have one, and the more general one is the method of
exceptions.
* Jumping out of a signal handler on Unix is tricky business. While
we've made it work, it's generally most robust if the signal handler
simply returns, which it now does.
With all of that in mind the purpose of this commit is to replace the
setjmp/longjmp mechanism of handling traps with the recently implemented
support for exceptions in Cranelift. That is intended to resolve all of
the above points in one fell swoop.
One point in particular though that's nice about setjmp/longjmp is that
unwinding the stack on a trap is an O(1) operation. For situations such
as stack overflow that's a particularly nice property to have as we can
guarantee embedders that traps are a constant time (albeit somewhat
expensive with signals) operation. Exceptions naively require unwinding
the entire stack, and although frame pointers mean we're just traversing
a linked list I wanted to preserve the O(1) property here nonetheless.
To achieve this a solution is implemented where the array-to-wasm
(host-to-wasm) trampolines set up state in `VMStoreContext` so looking up
the current trap handler frame is an O(1) operation. Namely the sp/fp/pc
values for a `Handler` are stored inline.
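As a rough illustration of the idea above, here is a minimal sketch of storing the handler's sp/fp/pc inline so lookup is a single field read. The `StoreContext` type, its methods, and the save-and-restore of the previous handler for nested entries are all assumptions made for this example; they are not the actual `VMStoreContext` layout or trampoline code.

```rust
/// Hypothetical stand-in for the register triple a trap handler resumes
/// with. Field names are illustrative only.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Handler {
    pub pc: usize,
    pub sp: usize,
    pub fp: usize,
}

/// Hypothetical stand-in for the per-store context. The real
/// `VMStoreContext` has many more fields and a different layout.
#[derive(Default)]
pub struct StoreContext {
    /// Stored inline so finding the active handler needs no stack walk.
    handler: Option<Handler>,
}

impl StoreContext {
    /// Models what a host-to-wasm trampoline would do on entry: record the
    /// handler and hand back the previous one so the caller can restore it.
    /// (Restoring on exit is an assumption of this sketch.)
    pub fn enter_wasm(&mut self, handler: Handler) -> Option<Handler> {
        self.handler.replace(handler)
    }

    /// Models leaving wasm: put back whatever handler was active before.
    pub fn exit_wasm(&mut self, previous: Option<Handler>) {
        self.handler = previous;
    }

    /// O(1) lookup: one field read regardless of stack depth.
    pub fn current_handler(&self) -> Option<Handler> {
        self.handler
    }
}
```

In this sketch a trap at any wasm stack depth calls `current_handler()` and gets the resume point immediately, which is the property the inline storage is meant to preserve.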
Implementing this feature required supporting
relocations-to-offsets-in-functions which was not previously supported
by Wasmtime. This required Cranelift refactorings such as bytecodealliance#11570, bytecodealliance#11585,
and bytecodealliance#11576. This then additionally required some more refactoring in
this commit which was difficult to split out as it otherwise wouldn't be
tested.
Apart from the relocation-related business much of this change is about
updating the platform signal handlers to use exceptions instead of
longjmp to return. For example on Unix this means updating the
`ucontext_t` with register values that the handler specifies. Windows
involves updating similar contexts, and macOS mach ports ended up not
needing too many changes.
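The signal-handler change described above can be modeled without any platform APIs. The sketch below uses a hypothetical `SavedRegisters` struct in place of the real, platform-specific `ucontext_t`, and a hypothetical `Resume` value for the registers the handler specifies; neither name comes from Wasmtime. The point it shows is that the handler only edits the saved register state and then returns normally, rather than jumping out.

```rust
/// Hypothetical, simplified model of the register state saved when a
/// signal is delivered. A real `ucontext_t` is platform-specific and
/// holds far more than this.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct SavedRegisters {
    pub pc: usize,
    pub sp: usize,
    pub fp: usize,
}

/// Hypothetical description of where execution should continue after the
/// signal handler returns.
#[derive(Clone, Copy, Debug)]
pub struct Resume {
    pub pc: usize,
    pub sp: usize,
    pub fp: usize,
}

/// Models the body of a signal handler that redirects execution: it
/// overwrites the saved registers and then simply returns. When the saved
/// state is restored, execution continues at `resume.pc` with the given
/// stack and frame pointers instead of at the faulting instruction.
pub fn redirect_to_handler(saved: &mut SavedRegisters, resume: Resume) {
    saved.pc = resume.pc;
    saved.sp = resume.sp;
    saved.fp = resume.fp;
}
```

Because the handler returns through the normal path in this model, no frames are skipped from inside the signal handler itself.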
In terms of overall performance the relevant benchmark from this
repository, compared to before this commit, is:
    sync/no-hook/core - host-to-wasm - typed - nop
        time:   [10.552 ns 10.561 ns 10.571 ns]
        change: [−7.5238% −7.4011% −7.2786%] (p = 0.00 < 0.05)
        Performance has improved.
Closes bytecodealliance#3927
cc bytecodealliance#10923
prtest:full