Replace setjmp/longjmp usage in Wasmtime #11592
This commit fixes a mistake in our inline assembly for resuming an exception on various platforms. It was detected on riscv64 during the development of #11592, but I believe it affects other platforms too.
The basic issue is that our inline assembly blocks all clobber the frame pointer, because that's what wasm uses, but nothing prevents an input to these blocks from being allocated to the frame pointer. If the destination to jump to is allocated to the frame pointer register then we'll jump to wasm's old frame pointer, not the actual destination, because the frame pointer register is clobbered before jumping. An example of this for riscv64 is on [godbolt] where the `s0` register, the frame pointer on riscv64, is clobbered and then jumped to.
The fix in this PR is to manually allocate all registers: every input operand is assigned an explicit register rather than letting the compiler pick one. This ensures no overlap with the frame pointer and fixes the test in question. Note that s390x isn't updated here as it doesn't have a frame pointer.
[godbolt]: https://godbolt.org/z/E9vWb9coq
 pub fn supports_exceptions(&self) -> bool {
     match self {
-        CallConv::Tail | CallConv::SystemV => true,
+        CallConv::Tail | CallConv::SystemV | CallConv::Winch => true,
For review this is particularly noteworthy: I had to enable this because when we compile for Winch the entry trampolines (system ABI) now call a Winch-defined function (Winch ABI). The try_call being done requires that the callee ABI (here: Winch) supports exceptions.
I don't fully understand the implications of this flag, but I updated the aarch64/x64 backends to explicitly say that exceptions + winch clobbers all registers (like tail + winch). I'm not sure if there's more that needs to be done, however.
This is only used in the verifier, so I think this is fine practically. Semantically, we more or less have defined exception throws in the Winch ABI by virtue of adding the clobber and payload definitions -- so we now "support" it, even though the Winch-the-compiler doesn't yet have lowerings for any of the opcodes.
     && record.ExceptionCode != EXCEPTION_INT_OVERFLOW
 {
-    return ExceptionContinueSearch;
+    return EXCEPTION_CONTINUE_SEARCH;
For review this is me from 3 years ago causing pain for myself. Years ago we switched from the winapi crate to the windows-sys crate. During that transition I found that the windows-sys crate, at the time, did not have EXCEPTION_CONTINUE_{SEARCH,EXECUTION} defined. It did, however, have ExceptionContinue{Search,Execution}. Being the naive little flower I am I assumed that this was some sort of mistake in the bindings and the values were all the same. Turns out this has always been wrong but it hasn't mattered since we practically never used ExceptionContinueExecution (only used for embedding-handled signals, which folks do sometimes on Linux but basically never on Windows).
Turns out that ExceptionContinue{Search,Execution} have different values than EXCEPTION_CONTINUE_{SEARCH,EXECUTION} and they're both interpreted as "continue search". It's basically a bit of a miracle this never broke before now. In this PR though we're actually using EXCEPTION_CONTINUE_EXECUTION which required this change.
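To make the mix-up concrete, here is a minimal sketch that reproduces the four constants locally, with the values from the Windows SDK headers, instead of importing them from `windows-sys`. The `resumes_execution` helper is hypothetical and only models the behavior described above, where a vectored handler's return value resumes execution only if it equals `EXCEPTION_CONTINUE_EXECUTION`.

```rust
// Members of the `EXCEPTION_DISPOSITION` enum (for frame-based handlers).
// Reproduced locally so the sketch has no dependencies.
#[allow(non_upper_case_globals)]
const ExceptionContinueExecution: i32 = 0;
#[allow(non_upper_case_globals)]
const ExceptionContinueSearch: i32 = 1;

// Return values expected from a vectored exception handler.
const EXCEPTION_CONTINUE_EXECUTION: i32 = -1;
const EXCEPTION_CONTINUE_SEARCH: i32 = 0;

/// Hypothetical model of how a vectored handler's return value is read:
/// only `EXCEPTION_CONTINUE_EXECUTION` resumes, anything else keeps searching.
fn resumes_execution(handler_return: i32) -> bool {
    handler_return == EXCEPTION_CONTINUE_EXECUTION
}

fn main() {
    // The similarly named constants do not share values.
    assert_ne!(ExceptionContinueExecution, EXCEPTION_CONTINUE_EXECUTION);
    assert_ne!(ExceptionContinueSearch, EXCEPTION_CONTINUE_SEARCH);

    // Both enum members behave like "continue search" in this model.
    assert!(!resumes_execution(ExceptionContinueExecution));
    assert!(!resumes_execution(ExceptionContinueSearch));

    // Only the real constant resumes execution.
    assert!(resumes_execution(EXCEPTION_CONTINUE_EXECUTION));
    assert!(!resumes_execution(EXCEPTION_CONTINUE_SEARCH));
}
```

This is why the old code happened to work while only "continue search" was ever needed, and why it had to change once resuming execution became necessary.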
br#"
[optimize]
opt-level = 2
regalloc-algorithm = "single-pass"
This is another consequence of this PR: the single-pass register allocator effectively no longer works with Wasmtime because it has issues with exceptions and Wasmtime unconditionally uses try_call for trampolines. This manifested in tests as crashes so I've disabled the usage in various tests.
This is an example of trampolines getting significantly larger than before. It might be worthwhile to invest some effort in making a new set of trampolines where all static entrypoints keep the normal trampoline signature they have today, but internally dispatch to a signature-specific trampoline which takes the function-to-call as a function pointer. That way we'd effectively deduplicate trampolines at least per-signature as we used to do.
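A rough sketch of that idea in plain Rust, not Cranelift-generated code: every name below (`ArrayCall`, `trampoline_i32_i32_to_i32`, `add_entry`, `mul_entry`, `call`) is hypothetical, and the "array" convention is simplified to a `&mut [u64]` slice. Each entrypoint keeps the same outer signature, while the marshalling logic lives once per signature and receives the callee as a function pointer.

```rust
/// Simplified array-style entry signature: arguments and results travel
/// through a slice of 64-bit slots. Hypothetical, for illustration only.
type ArrayCall = fn(&mut [u64]);

/// Shared trampoline for the signature `(i32, i32) -> i32`. It unpacks the
/// slots, calls the function pointer it was given, and stores the result.
fn trampoline_i32_i32_to_i32(callee: fn(i32, i32) -> i32, args: &mut [u64]) {
    let a = args[0] as i32;
    let b = args[1] as i32;
    args[0] = callee(a, b) as u32 as u64;
}

fn add(a: i32, b: i32) -> i32 {
    a.wrapping_add(b)
}

fn mul(a: i32, b: i32) -> i32 {
    a.wrapping_mul(b)
}

// Per-function entrypoints are thin stubs that only name the callee.
fn add_entry(args: &mut [u64]) {
    trampoline_i32_i32_to_i32(add, args)
}

fn mul_entry(args: &mut [u64]) {
    trampoline_i32_i32_to_i32(mul, args)
}

/// Host-side helper that packs arguments and reads back the result.
fn call(entry: ArrayCall, a: i32, b: i32) -> i32 {
    let mut slots = [a as u32 as u64, b as u32 as u64];
    entry(&mut slots);
    slots[0] as i32
}

fn main() {
    assert_eq!(call(add_entry, 2, 3), 5);
    assert_eq!(call(mul_entry, -4, 3), -12);
}
```

The trade-off in this shape is one extra indirect call per entry in exchange for sharing the larger marshalling body across all functions with the same signature.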
This test is to double-check that we don't accidentally ever run exception handlers on traps now that we're almost implementing traps with exceptions.
Hmm, yeah, if we don't adopt the Basically rather than converting the reloc to a label we would apply the reloc directly into the slice of function body. Given that we know the offset of the start of the function and the offset of the exception handler at that point we should have everything we need I think. That said, I think it's probably best I finish other backends on my branch and put up a PR for the first-class operator instead -- it's a more straightforward design (even if a slightly heavy lift to add the new instruction format). Will do that shortly!
* Cranelift: add get_exception_handler_address. This is designed to enable applications such as #11592 that use alternative unwinding mechanisms that may not necessarily want to walk a stack and look up exception tables. The idea is that whenever it would be valid to resume to an exception handler that is active on the stack, we can provide the same PC as a first-class runtime value that would be found in the exception table for the given handler edge. A "custom" resume step can then use this PC as a resume-point as long as it follows the relevant exception ABI (i.e.: restore SP, FP, any other saved registers that the exception ABI specifies, and provide appropriate payload value(s)). Handlers are associated with edges out of `try_call`s (or `try_call_indirect`s); and edges specifically, not blocks, because there could be multiple out-edges to one block. The instruction thus takes the block that contains the try-call and an immediate that indexes its exceptional edges. This CLIF instruction required a bit of infrastructure to (i) allow naming raw blocks, not just block calls, as instruction arguments, and (ii) allow getting the MachLabel for any other lowered block during lowering. But given that, the lowerings themselves are straightforward uses of MachBuffer labels to fix-up PC-relative address-loading instructions (e.g., `LEA` or `ADR` or `AUIPC`+`ADDI`). * Review feedback. * Review feedback: more tests.
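The "edges, not blocks" point can be illustrated with a small data-structure sketch. This is not Cranelift's API: `Block`, `HandlerEdge`, `HandlerTable`, and the PC values are all hypothetical, and the sketch assumes that two exceptional edges into the same block can resolve to different resume PCs. It only shows why a handler is named by the block containing the try-call plus an edge index, rather than by the destination block.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a CLIF block reference.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct Block(u32);

/// One exceptional out-edge of a try-call: the block it lands in and the
/// PC a custom resume step would jump to.
#[derive(Clone, Copy, Debug)]
struct HandlerEdge {
    target: Block,
    pc: usize,
}

/// Maps the block containing a try-call to its exceptional edges, in order.
#[derive(Default)]
struct HandlerTable {
    edges: HashMap<Block, Vec<HandlerEdge>>,
}

impl HandlerTable {
    /// Look up a handler PC by (block containing the try-call, edge index).
    fn handler_address(&self, try_call_block: Block, index: usize) -> Option<usize> {
        self.edges.get(&try_call_block)?.get(index).map(|e| e.pc)
    }

    /// Count edges that land in `target`; more than one means the target
    /// block alone cannot identify a handler.
    fn edges_into(&self, target: Block) -> usize {
        self.edges.values().flatten().filter(|e| e.target == target).count()
    }
}

/// Build a table where block 0 has two exceptional edges into block 2.
fn example_table() -> HandlerTable {
    let mut table = HandlerTable::default();
    table.edges.insert(
        Block(0),
        vec![
            HandlerEdge { target: Block(2), pc: 0x40 },
            HandlerEdge { target: Block(2), pc: 0x48 },
        ],
    );
    table
}

fn main() {
    let table = example_table();
    // Two edges share a destination block, so the block is ambiguous...
    assert_eq!(table.edges_into(Block(2)), 2);
    // ...but (try-call block, edge index) picks out exactly one handler.
    assert_eq!(table.handler_address(Block(0), 0), Some(0x40));
    assert_eq!(table.handler_address(Block(0), 1), Some(0x48));
    assert_eq!(table.handler_address(Block(0), 2), None);
}
```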
Since Wasmtime's inception it's used the `setjmp` and `longjmp`
functions in C to implement handling of traps. While this solution was
easy to implement, relatively portable, and performant enough, there are
a number of downsides that have evolved over time to make this an
unattractive approach in the long run:
* Using `setjmp` fundamentally requires using C because Rust does not
understand a function that returns twice. It's fundamentally unsound
to invoke `setjmp` in Rust meaning that Wasmtime has forever needed a
C compiler configured and set up to build. This notably means that
`cargo check` cannot check other targets easily.
* Using `longjmp` means that Rust function frames are unwound on the
stack without running destructors. This is a dangerous operation for
which we get no protection from the compiler. Both frames
entering wasm and frames exiting wasm are all skipped. Absolutely
minimizing this has been beneficial for portability to platforms such
as Pulley.
* Currently the no_std implementation of Wasmtime requires embedders to
provide `wasmtime_{setjmp,longjmp}` which is a thorn in the side of
what is otherwise a mostly independent implementation of
Wasmtime.
* There is a performance floor to using `setjmp` and `longjmp`. Calling
`setjmp` requires using C but Wasmtime is otherwise written in Rust
meaning that there's a Rust->C->Rust->Wasm boundary which
fundamentally can't be inlined without cross-language LTO which is
difficult to configure.
* With the implementation of the WebAssembly exceptions proposal
Wasmtime now has two means of unwinding the stack. Ideally Wasmtime
would only have one, and the more general one is the method of
exceptions.
* Jumping out of a signal handler on Unix is tricky business. While
we've made it work it's generally most robust if the signal handler
simply returns which it now does.
With all of that in mind the purpose of this commit is to replace the
setjmp/longjmp mechanism of handling traps with the recently implemented
support for exceptions in Cranelift. That is intended to resolve all of
the above points in one swoop.
One point in particular though that's nice about setjmp/longjmp is that
unwinding the stack on a trap is an O(1) operation. For situations such
as stack overflow that's a particularly nice property to have as we can
guarantee embedders that traps are a constant time (albeit somewhat
expensive with signals) operation. Exceptions naively require unwinding
the entire stack, and although frame pointers mean we're just traversing
a linked list I wanted to preserve the O(1) property here nonetheless.
To achieve this a solution is implemented where the array-to-wasm
(host-to-wasm) trampolines set up state in `VMStoreContext` so looking up
the current trap handler frame is an O(1) operation. Namely the sp/fp/pc
values for a `Handler` are stored inline.
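
As a rough illustration of that O(1) lookup, here is a minimal sketch. It is not Wasmtime's actual `VMStoreContext` layout: `StoreContext`, `enter`, `exit`, and `current_handler` are hypothetical names, and saving and restoring the previous handler on nested entries is an assumption of this sketch rather than something stated above.

```rust
/// The three values needed to resume at a handler.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Handler {
    sp: usize,
    fp: usize,
    pc: usize,
}

/// Hypothetical stand-in for the per-store context. The handler is stored
/// inline, so finding it never requires walking the stack.
#[derive(Default)]
struct StoreContext {
    trap_handler: Option<Handler>,
}

impl StoreContext {
    /// Called on host-to-wasm entry: install the new handler and hand back
    /// the previous one so the caller can restore it later (assumption).
    fn enter(&mut self, handler: Handler) -> Option<Handler> {
        self.trap_handler.replace(handler)
    }

    /// Called when leaving wasm: put the previous handler back.
    fn exit(&mut self, previous: Option<Handler>) {
        self.trap_handler = previous;
    }

    /// Constant-time lookup used when a trap needs somewhere to resume.
    fn current_handler(&self) -> Option<Handler> {
        self.trap_handler
    }
}

fn main() {
    let mut cx = StoreContext::default();
    let outer = Handler { sp: 0x7000, fp: 0x7040, pc: 0x1000 };
    let inner = Handler { sp: 0x6000, fp: 0x6040, pc: 0x2000 };

    let saved_outer = cx.enter(outer);
    let saved_inner = cx.enter(inner);
    // A trap at this point resumes at the innermost entry, found in O(1).
    assert_eq!(cx.current_handler(), Some(inner));

    cx.exit(saved_inner);
    assert_eq!(cx.current_handler(), Some(outer));
    cx.exit(saved_outer);
    assert_eq!(cx.current_handler(), None);
}
```

The cost of a trap in this model does not depend on how deep the wasm stack is, which is the property the paragraph above wants to keep from setjmp/longjmp.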
Implementing this feature required supporting
relocations-to-offsets-in-functions which was not previously supported
by Wasmtime. This required Cranelift refactorings such as bytecodealliance#11570, bytecodealliance#11585,
and bytecodealliance#11576. This then additionally required some more refactoring in
this commit which was difficult to split out as it otherwise wouldn't be
tested.
Apart from the relocation-related business much of this change is about
updating the platform signal handlers to use exceptions instead of
longjmp to return. For example on Unix this means updating the
`ucontext_t` with register values that the handler specifies. Windows
involves updating similar contexts, and macOS mach ports ended up not
needing too many changes.
In terms of overall performance the relevant benchmark from this
repository, compared to before this commit, is:
sync/no-hook/core - host-to-wasm - typed - nop
time: [10.552 ns 10.561 ns 10.571 ns]
change: [−7.5238% −7.4011% −7.2786%] (p = 0.00 < 0.05)
Performance has improved.
Closes bytecodealliance#3927
cc bytecodealliance#10923
prtest:full
Updated now to incorporate the work from #11629
cfallin
left a comment
Looks great -- thanks for the patience!
* Fix compatibility with the Go runtime on Windows for exceptions
  This commit fixes some fallout of #11592 where that PR resulted in Wasmtime no longer running within the context of the Go runtime (e.g. `wasmtime-go`). The reasons for this are quite Windows-specific and I've attempted to document the situation in `vectored_exceptions.go`. The TL;DR is that by returning from a vectored exception handler (which #11592 introduced) we're now subject to "continue handlers" as well, and Go's continue handlers will abort the process for non-Go exceptions. Some logic is added here to try to bypass Go's continue handlers and get back to wasm.
* Fix typos
* Review comments
This commit updates the implementation of compiling array-to-wasm trampolines for component intrinsics to reuse the exact same implementation that core wasm uses. This fixes an issue where the component trampolines were not updated as part of #11592 to have a try/catch for errors that happen during their execution. The implementation here is intended to be a small, backportable patch to the 38.0.x release branch. It does not refactor, for example, `TrampolineCompiler`, which now always uses the `Wasm` ABI as opposed to using either the wasm or array ABI. Such cleanup is left for a follow-up PR to `main` after this one. In the meantime the implementation of array-ABI component model intrinsics now looks exactly like array-to-wasm trampolines for core wasm: the array-ABI function performs a `try_call` to the wasm-ABI function and lets the wasm-ABI function do the actual work. This is a nice simplification because the definition of the trampoline is now in one function instead of duplicated across two.
This commit updates the implementation of compiling array-to-wasm trampolines for component intrinsics to reuse the exact same implementation as core wasm uses. This fixes an issue where the component trampolines were not updated as part of bytecodealliance#11592 to have a try/catch for errors that happen during their execution. The implementation here is intended to be a small, backportable, patch to the 38.0.x release branch. This does not refactor, for example, `TrampolineCompiler` which now always uses the `Wasm` ABI as opposed to using either the wasm or array ABI. Such cleanup is left for a follow-up PR to `main` after this one. In the meantime though the implementation of array-ABI component model intrinsics now looks exactly like array-to-wasm trampolines for core wasm where the array-ABI function performs a `try_call` to the wasm-ABI function, letting the wasm-ABI function doing the actual work. This is a nice simplification for trampolines where the definition of the trampoline is now just in one function instead of duplicated across two.
Since Wasmtime's inception it's used the `setjmp` and `longjmp`
functions in C to implement handling of traps. While this solution was
easy to implement, relatively portable, and performant enough, there are
a number of downsides that have evolved over time to make this an
unattractive approach in the long run:

- Using `setjmp` fundamentally requires using C because Rust does not
  understand a function that returns twice. It's fundamentally unsound
  to invoke `setjmp` in Rust meaning that Wasmtime has forever needed a
  C compiler configured and set up to build. This notably means that
  `cargo check` cannot check other targets easily.
- Using `longjmp` means that Rust function frames are unwound on the
  stack without running destructors. This is a dangerous operation that
  the compiler gives us no protection against. Both frames
  entering wasm and frames exiting wasm are all skipped. Absolutely
  minimizing this has been beneficial for portability to platforms such
  as Pulley.
- Currently the no_std implementation of Wasmtime requires embedders to
  provide `wasmtime_{setjmp,longjmp}` which is a thorn in the side of
  what is otherwise an almost entirely independent implementation of
  Wasmtime.
- There is a performance floor to using `setjmp` and `longjmp`. Calling
  `setjmp` requires using C but Wasmtime is otherwise written in Rust
  meaning that there's a Rust->C->Rust->Wasm boundary which
  fundamentally can't be inlined without cross-language LTO which is
  difficult to configure.
- With the implementation of the WebAssembly exceptions proposal
  Wasmtime now has two means of unwinding the stack. Ideally Wasmtime
  would only have one, and the more general one is the method of
  exceptions.
- Jumping out of a signal handler on Unix is tricky business. While
  we've made it work it's generally most robust if the signal handler
  simply returns, which it now does.

With all of that in mind the purpose of this commit is to replace the
setjmp/longjmp mechanism of handling traps with the recently implemented
support for exceptions in Cranelift. That is intended to resolve all of
the above points in one swoop.
One point in particular though that's nice about setjmp/longjmp is that
unwinding the stack on a trap is an O(1) operation. For situations such
as stack overflow that's a particularly nice property to have as we can
guarantee embedders that traps are a constant time (albeit somewhat
expensive with signals) operation. Exceptions naively require unwinding
the entire stack, and although frame pointers mean we're just traversing
a linked list I wanted to preserve the O(1) property here nonetheless.
To achieve this a solution is implemented where the array-to-wasm
(host-to-wasm) trampolines set up state in `VMStoreContext` so looking up
the current trap handler frame is an O(1) operation. Namely the sp/fp/pc
values for a `Handler` are stored inline.

Implementing this feature required supporting
relocations-to-offsets-in-functions which was not previously supported
by Wasmtime. This required Cranelift refactorings such as #11570, #11585,
and #11576. This then additionally required some more refactoring in
this commit which was difficult to split out as it otherwise wouldn't be
tested.
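
As a rough illustration of the O(1) lookup described above, here is a minimal sketch in plain Rust. It is a model only, not Wasmtime's code: `StoreContext` is a simplified, hypothetical stand-in for `VMStoreContext`, the `enter_wasm`, `handler_for_trap`, and `handler_by_walking` names are invented for this example, and a real trap unwinds rather than returning normally through the trampoline.

```rust
/// Hypothetical stand-in for the sp/fp/pc triple recorded for a trap handler.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Handler {
    pub sp: usize,
    pub fp: usize,
    pub pc: usize,
}

/// Hypothetical, heavily simplified stand-in for `VMStoreContext`.
#[derive(Default)]
pub struct StoreContext {
    /// Handler published by the innermost active host-to-wasm trampoline.
    pub current_handler: Option<Handler>,
}

impl StoreContext {
    /// Models a host-to-wasm trampoline: publish `handler`, run `wasm`, then
    /// restore the previously active handler so nested entries keep working.
    pub fn enter_wasm<R>(
        &mut self,
        handler: Handler,
        wasm: impl FnOnce(&mut Self) -> R,
    ) -> R {
        let previous = self.current_handler.replace(handler);
        let result = wasm(self);
        self.current_handler = previous;
        result
    }

    /// O(1): a trap reads the handler directly, however deep the stack is.
    pub fn handler_for_trap(&self) -> Option<Handler> {
        self.current_handler
    }
}

/// For contrast, the naive O(depth) search over a frame chain.
/// `frames[i]` is `(is_handler_frame, handler)`; index 0 is the youngest frame.
/// Returns how many frames were skipped along with the handler found.
pub fn handler_by_walking(frames: &[(bool, Handler)]) -> Option<(usize, Handler)> {
    frames
        .iter()
        .enumerate()
        .find(|(_, (is_handler, _))| *is_handler)
        .map(|(steps, (_, handler))| (steps, *handler))
}
```

The point of the design is that the cost of finding the handler is paid once, at wasm entry, instead of at trap time, which keeps situations such as stack overflow constant-time.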
Apart from the relocation-related business much of this change is about
updating the platform signal handlers to use exceptions instead of
longjmp to return. For example on Unix this means updating the
`ucontext_t` with register values that the handler specifies. Windows
involves updating similar contexts, and macOS mach ports ended up not
needing too many changes.
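
To make the "rewrite the context and return" idea concrete, the following is a small sketch under stated assumptions. `SavedRegisters` is a hypothetical, platform-neutral stand-in for the registers held in a `ucontext_t` (real field names and layouts differ per OS and architecture), and `handle_fault` and `Disposition` are names invented for this example rather than Wasmtime APIs.

```rust
/// Hypothetical stand-in for the registers saved in a `ucontext_t`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct SavedRegisters {
    pub pc: usize,
    pub sp: usize,
    pub fp: usize,
}

/// Hypothetical sp/fp/pc triple describing where a trap should resume.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Handler {
    pub pc: usize,
    pub sp: usize,
    pub fp: usize,
}

/// What the (modelled) signal handler decided to do with the fault.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Disposition {
    /// The saved context was rewritten; returning resumes at the handler.
    Resume,
    /// The fault did not come from wasm; leave the context untouched.
    NotOurs,
}

/// Models the body of a signal handler: instead of calling `longjmp`, it
/// overwrites the saved registers with the values the handler specifies and
/// then simply returns, letting the OS resume execution at the new pc.
pub fn handle_fault(
    regs: &mut SavedRegisters,
    faulting_pc_is_wasm: bool,
    handler: Option<Handler>,
) -> Disposition {
    match (faulting_pc_is_wasm, handler) {
        (true, Some(h)) => {
            regs.pc = h.pc;
            regs.sp = h.sp;
            regs.fp = h.fp;
            Disposition::Resume
        }
        // Not a wasm fault, or no handler installed: do not touch the context.
        _ => Disposition::NotOurs,
    }
}
```

Because the handler function returns normally in both cases, no Rust frames inside the signal handler are skipped, which is the robustness property the description attributes to the new approach.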
In terms of overall performance the relevant benchmark from this
repository, compared to before this commit, is:
Closes #3927
cc #10923