@alexcrichton
Member

Since Wasmtime's inception it's used the setjmp and longjmp
functions in C to implement handling of traps. While this solution was
easy to implement, relatively portable, and performant enough, there are
a number of downsides that have evolved over time to make this an
unattractive approach in the long run:

  • Using setjmp fundamentally requires using C because Rust does not
    understand a function that returns twice. It's fundamentally unsound
    to invoke setjmp in Rust meaning that Wasmtime has forever needed a
    C compiler configured and set up to build. This notably means that
    cargo check cannot check other targets easily.

  • Using longjmp means that Rust function frames are unwound on the
    stack without running destructors. This is a dangerous operation that
    the compiler gives us no protection against. Both frames
    entering wasm and frames exiting wasm are all skipped. Absolutely
    minimizing this has been beneficial for portability to platforms such
    as Pulley.

  • Currently the no_std implementation of Wasmtime requires embedders to
    provide wasmtime_{setjmp,longjmp} which is a thorn in the side of
    what is otherwise an almost entirely independent implementation of
    Wasmtime.

  • There is a performance floor to using setjmp and longjmp. Calling
    setjmp requires using C but Wasmtime is otherwise written in Rust
    meaning that there's a Rust->C->Rust->Wasm boundary which
    fundamentally can't be inlined without cross-language LTO which is
    difficult to configure.

  • With the implementation of the WebAssembly exceptions proposal
    Wasmtime now has two means of unwinding the stack. Ideally Wasmtime
    would only have one, and the more general one is the method of
    exceptions.

  • Jumping out of a signal handler on Unix is tricky business. While
    we've made it work it's generally most robust if the signal handler
    simply returns, which it now does.

With all of that in mind the purpose of this commit is to replace the
setjmp/longjmp mechanism of handling traps with the recently implemented
support for exceptions in Cranelift. That is intended to resolve all of
the above points in one swoop.

One point in particular though that's nice about setjmp/longjmp is that
unwinding the stack on a trap is an O(1) operation. For situations such
as stack overflow that's a particularly nice property to have as we can
guarantee embedders that traps are a constant time (albeit somewhat
expensive with signals) operation. Exceptions naively require unwinding
the entire stack, and although frame pointers mean we're just traversing
a linked list I wanted to preserve the O(1) property here nonetheless.
To achieve this a solution is implemented where the array-to-wasm
(host-to-wasm) trampolines set up state in VMStoreContext so looking up
the current trap handler frame is an O(1) operation. Namely the sp/fp/pc
values for a Handler are stored inline.
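
As a rough illustration of the O(1) lookup described above, here is a minimal sketch using simplified stand-ins rather than Wasmtime's real definitions. The `Handler` and `VMStoreContext` types below are hypothetical models that only share their names with the real ones, and `enter_wasm`, `exit_wasm`, and `current_trap_handler` are made-up names for what a host-to-wasm trampoline might do on entry and exit. Saving and restoring the previous handler for nested entries is an assumption of this sketch.

```rust
/// Hypothetical model of the register state needed to resume at a handler.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Handler {
    pub pc: usize,
    pub fp: usize,
    pub sp: usize,
}

/// Hypothetical stand-in for the per-store context. Only the inline
/// trap-handler slot is modeled here.
#[derive(Default)]
pub struct VMStoreContext {
    trap_handler: Option<Handler>,
}

impl VMStoreContext {
    /// Called on entry to wasm: record where a trap should resume and
    /// return the previously installed handler so it can be restored.
    pub fn enter_wasm(&mut self, handler: Handler) -> Option<Handler> {
        self.trap_handler.replace(handler)
    }

    /// Called on exit from wasm: put back whatever was installed before.
    pub fn exit_wasm(&mut self, previous: Option<Handler>) {
        self.trap_handler = previous;
    }

    /// Looking up the handler is a single field read, independent of how
    /// many wasm frames are on the stack.
    pub fn current_trap_handler(&self) -> Option<Handler> {
        self.trap_handler
    }
}
```

The point of the sketch is that a trap never walks frames to find its destination: the most recent host-to-wasm entry has already written it into the store context.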

Implementing this feature required supporting
relocations-to-offsets-in-functions which was not previously supported
by Wasmtime. This required Cranelift refactorings such as #11570, #11585,
and #11576. This then additionally required some more refactoring in
this commit which was difficult to split out as it otherwise wouldn't be
tested.

Apart from the relocation-related business much of this change is about
updating the platform signal handlers to use exceptions instead of
longjmp to return. For example on Unix this means updating the
ucontext_t with register values that the handler specifies. Windows
involves updating similar contexts, and macOS mach ports ended up not
needing too many changes.
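
To make the "update the context and return" idea concrete, here is a small model of it. This is a sketch under stated assumptions, not the code in this PR: `SavedRegisters` is a hypothetical stand-in for the register state inside `ucontext_t` (or the Windows equivalent), `Handler`, `Disposition`, and `handle_fault` are illustrative names, and the single `payload` register is an assumption about how a trap code might be passed along.

```rust
/// Hypothetical, simplified model of the saved machine state a signal
/// handler is given (the role `ucontext_t` plays on Unix).
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct SavedRegisters {
    pub pc: usize,
    pub fp: usize,
    pub sp: usize,
    /// Register assumed (for this sketch) to carry the exception payload.
    pub payload: usize,
}

/// Hypothetical description of where a trap should resume.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Handler {
    pub pc: usize,
    pub fp: usize,
    pub sp: usize,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Disposition {
    /// No handler is installed, so the fault is left for someone else.
    NotHandled,
    /// The saved context was rewritten; simply returning from the signal
    /// handler makes the thread continue at the handler's pc.
    Resume,
}

/// Instead of calling `longjmp` from inside the signal handler, overwrite
/// the saved registers and report that the handler can return normally.
pub fn handle_fault(
    regs: &mut SavedRegisters,
    handler: Option<Handler>,
    trap_code: usize,
) -> Disposition {
    match handler {
        None => Disposition::NotHandled,
        Some(h) => {
            regs.pc = h.pc;
            regs.fp = h.fp;
            regs.sp = h.sp;
            regs.payload = trap_code;
            Disposition::Resume
        }
    }
}
```

In this model the kernel (or the OS exception dispatcher) is what actually performs the jump, by restoring the modified context when the handler returns.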

In terms of overall performance the relevant benchmark from this
repository, compared to before this commit, is:

sync/no-hook/core - host-to-wasm - typed - nop
		    time:   [10.552 ns 10.561 ns 10.571 ns]
		    change: [−7.5238% −7.4011% −7.2786%] (p = 0.00 < 0.05)
		    Performance has improved.

Closes #3927
cc #10923

@alexcrichton alexcrichton requested review from a team as code owners September 2, 2025 19:28
@alexcrichton alexcrichton requested review from abrown and dicej and removed request for a team September 2, 2025 19:28
@alexcrichton alexcrichton marked this pull request as draft September 2, 2025 19:28
@alexcrichton
Member Author

alexcrichton commented Sep 2, 2025

Procedurally this is stacked on #11585, #11577, and #11576 at this time. I expect this to have a bit of a gauntlet on CI, however, so I wanted to get working on that sooner rather than later.

alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Sep 2, 2025
This commit fixes a mistake with our inline assembly for resumption of
an exception on various platforms. This was detected during the
development of bytecodealliance#11592 for riscv64 but I believe this affects other
platforms too. The basic issue is that our inline assembly blocks all
clobber the frame pointer, because that's what wasm uses, but we have no
constraint preventing an input to these inline assembly blocks from
being allocated into the frame pointer. This means that if the
destination to jump to is allocated to the frame pointer register then
we'll jump to wasm's old frame pointer, not the actual destination,
because the frame pointer register is clobbered before jumping. An
example of this for riscv64 is on [godbolt] where the `s0` register, the
frame pointer on riscv64, is clobbered and then jumped to.

The fix in this PR is to manually allocate all registers. All input
operands are allocated to explicit registers rather than letting the
compiler pick which register they're in. This ensures no overlap with
the frame pointer and fixes the test in question. Note that s390x isn't
updated here as it doesn't have a frame pointer.

[godbolt]: https://godbolt.org/z/E9vWb9coq
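
For illustration only (this is not the code from the commit above), here is a minimal sketch of what "allocating input operands to explicit registers" looks like with Rust's `asm!` macro. The function name `add_pinned` and the choice of `rdi`/`rsi` are arbitrary assumptions for the example; the x86-64 path pins both inputs to named registers, and other targets fall back to plain Rust so the snippet still compiles there.

```rust
/// Adds two numbers with inline assembly, pinning both inputs to named
/// registers instead of letting the compiler choose them.
#[cfg(target_arch = "x86_64")]
pub fn add_pinned(a: u64, b: u64) -> u64 {
    use std::arch::asm;

    let out: u64;
    // SAFETY: the instruction only reads `rdi` and `rsi` and writes the
    // output register; it touches neither memory nor the stack.
    unsafe {
        asm!(
            "lea {out}, [rdi + rsi]",
            out = lateout(reg) out,
            // Explicit registers: the compiler cannot place these inputs
            // in some other register (such as the frame pointer).
            in("rdi") a,
            in("rsi") b,
            options(pure, nomem, nostack)
        );
    }
    out
}

/// Fallback for other architectures so the example remains portable.
#[cfg(not(target_arch = "x86_64"))]
pub fn add_pinned(a: u64, b: u64) -> u64 {
    a.wrapping_add(b)
}
```

With a generic `in(reg)` operand the register allocator is free to pick any register it considers available, which is the situation the commit message describes going wrong once the block itself overwrites that register.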
github-merge-queue bot pushed a commit that referenced this pull request Sep 3, 2025
@github-actions github-actions bot added cranelift Issues related to the Cranelift code generator cranelift:area:machinst Issues related to instruction selection and the new MachInst backend. cranelift:area:aarch64 Issues related to AArch64 backend. cranelift:area:x64 Issues related to x64 codegen wasmtime:api Related to the API of the `wasmtime` crate itself labels Sep 3, 2025
@alexcrichton alexcrichton force-pushed the trampoline-try-call branch 3 times, most recently from 8e5f360 to ff016d8 Compare September 3, 2025 20:59
 pub fn supports_exceptions(&self) -> bool {
     match self {
-        CallConv::Tail | CallConv::SystemV => true,
+        CallConv::Tail | CallConv::SystemV | CallConv::Winch => true,
Member Author

For review this is particularly noteworthy: I had to enable this because when we compile for Winch now the entry trampolines (system ABI) are calling a Winch-defined function (winch ABI). The try_call being done requires that the callee ABI (here: Winch) supports exceptions.

I don't fully understand the implications of this flag, but I updated the aarch64/x64 backends to explicitly say that exceptions + winch clobbers all registers (like tail + winch). I'm not sure if there's more that needs to be done, however.

Member

This is only used in the verifier, so I think this is fine practically. Semantically, we more or less have defined exception throws in the Winch ABI by virtue of adding the clobber and payload definitions -- so we now "support" it, even though the Winch-the-compiler doesn't yet have lowerings for any of the opcodes.

     && record.ExceptionCode != EXCEPTION_INT_OVERFLOW
 {
-    return ExceptionContinueSearch;
+    return EXCEPTION_CONTINUE_SEARCH;
Member Author

For review this is me from 3 years ago causing pain for myself. Years ago we switched from the winapi crate to the windows-sys crate. During that transition I found that the windows-sys crate, at the time, did not have EXCEPTION_CONTINUE_{SEARCH,EXECUTION} defined. It did, however, have ExceptionContinue{Search,Execution}. Being the naive little flower I am I assumed that this was some sort of mistake in the bindings and the values were all the same. Turns out this has always been wrong but it hasn't mattered since we practically never used ExceptionContinueExecution (only used for embedding-handled signals, which folks do sometimes on Linux but basically never on Windows).

Turns out that ExceptionContinue{Search,Execution} have different values than EXCEPTION_CONTINUE_{SEARCH,EXECUTION} and they're both interpreted as "continue search". It's basically a bit of a miracle this never broke before now. In this PR though we're actually using EXCEPTION_CONTINUE_EXECUTION which required this change.
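
To spell out the mix-up numerically, here is a small sketch. The values are those I understand the Windows SDK headers to define (worth double-checking against `windows-sys`), the `DISPOSITION_*` constant names are made up here to stand for the `ExceptionContinueExecution` and `ExceptionContinueSearch` enum variants, and `resumes_execution` is a hypothetical helper that models the behavior described above, where only `EXCEPTION_CONTINUE_EXECUTION` resumes the faulting thread.

```rust
// Return values expected from a vectored exception handler.
pub const EXCEPTION_CONTINUE_EXECUTION: i32 = -1;
pub const EXCEPTION_CONTINUE_SEARCH: i32 = 0;

// Variants of the unrelated `EXCEPTION_DISPOSITION` enum, which are easy
// to confuse with the constants above because of their names.
pub const DISPOSITION_CONTINUE_EXECUTION: i32 = 0; // ExceptionContinueExecution
pub const DISPOSITION_CONTINUE_SEARCH: i32 = 1; // ExceptionContinueSearch

/// Hypothetical model of how a handler's return value is treated: only
/// `EXCEPTION_CONTINUE_EXECUTION` resumes, everything else keeps searching.
pub fn resumes_execution(handler_result: i32) -> bool {
    handler_result == EXCEPTION_CONTINUE_EXECUTION
}
```

Under this model both enum variants behave as "continue search", which matches why the old code happened to work as long as nothing needed to resume execution.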

br#"
[optimize]
opt-level = 2
regalloc-algorithm = "single-pass"
Member Author

This is another consequence of this PR: the single-pass register allocator effectively no longer works with Wasmtime because it has issues with exceptions and Wasmtime unconditionally uses try_call for trampolines. This manifested in tests as crashes so I've disabled the usage in various tests.

Member Author

This is an example of trampolines getting significantly larger than before. It might be worthwhile to invest some effort in making a new set of trampolines where all static entrypoints keep the normal trampoline signature they have today, but internally dispatch to a signature-specific trampoline which takes the function-to-call as a function pointer. That way we'd effectively deduplicate trampolines at least per-signature as we used to do.
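
A toy model of that layering, written in plain Rust rather than generated code: everything here is hypothetical (`BinaryI32`, `shared_trampoline`, `entry_add`, `entry_mul`, and the two-slot argument array are invented for the example). It only shows the shape of the idea, where the bulky logic lives once per signature and each entrypoint is a thin wrapper that passes its callee as a function pointer.

```rust
/// Hypothetical signature shared by several functions: (i32, i32) -> i32.
pub type BinaryI32 = fn(i32, i32) -> i32;

/// One shared body per signature: unpack the arguments from the slot
/// array, call the target through a function pointer, store the result.
pub fn shared_trampoline(callee: BinaryI32, slots: &mut [i64; 2]) {
    let result = callee(slots[0] as i32, slots[1] as i32);
    slots[0] = i64::from(result);
}

fn add(a: i32, b: i32) -> i32 {
    a.wrapping_add(b)
}

fn mul(a: i32, b: i32) -> i32 {
    a.wrapping_mul(b)
}

/// Per-function entrypoints keep a uniform signature but contain almost
/// no code of their own.
pub fn entry_add(slots: &mut [i64; 2]) {
    shared_trampoline(add, slots)
}

pub fn entry_mul(slots: &mut [i64; 2]) {
    shared_trampoline(mul, slots)
}
```

The trade-off in this shape is an extra indirect call per entry in exchange for not repeating the signature-specific work in every entrypoint.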

Member Author

This test is to double-check that we don't accidentally ever run exception handlers on traps now that we're almost implementing traps with exceptions.

@github-actions github-actions bot added the wasmtime:config Issues related to the configuration of Wasmtime label Sep 3, 2025
@github-actions

github-actions bot commented Sep 3, 2025

Label Messager: wasmtime:config

It looks like you are changing Wasmtime's configuration options. Make sure to
complete this check list:

  • If you added a new Config method, you wrote extensive documentation for
    it.

    Our documentation should be of the following form:

    Short, simple summary sentence.
    
    More details. These details can be multiple paragraphs. There should be
    information about not just the method, but its parameters and results as
    well.
    
    Is this method fallible? If so, when can it return an error?
    
    Can this method panic? If so, when does it panic?
    
    # Example
    
    Optional example here.
    
  • If you added a new Config method, or modified an existing one, you
    ensured that this configuration is exercised by the fuzz targets.

    For example, if you expose a new strategy for allocating the next instance
    slot inside the pooling allocator, you should ensure that at least one of our
    fuzz targets exercises that new strategy.

    Often, all that is required of you is to ensure that there is a knob for this
    configuration option in wasmtime_fuzzing::Config (or one
    of its nested structs).

    Rarely, this may require authoring a new fuzz target to specifically test this
    configuration. See our docs on fuzzing for more details.

  • If you are enabling a configuration option by default, make sure that it
    has been fuzzed for at least two weeks before turning it on by default.


To modify this label's message, edit the .github/label-messager/wasmtime-config.md file.

To add new label messages or remove existing label messages, edit the
.github/label-messager.json configuration file.

Learn more.

@alexcrichton alexcrichton marked this pull request as ready for review September 4, 2025 02:34
@cfallin
Member

cfallin commented Sep 5, 2025

Hmm, yeah, if we don't adopt the get_exception_handler_address approach from my branch above, I was imagining a more direct approach where we process the relocs directly as they come out of Cranelift -- the key is not using the deferred-label mechanism, because I am not actually sure that it is correct in all cases (it violates the invariants at least on paper, but right at the end of processing, but there are edge cases that could still possibly happen wrt island insertion).

Basically rather than converting the reloc to a label we would apply the reloc directly into the slice of function body. Given that we know the offset of the start of the function and the offset of the exception handler at that point we should have everything we need I think.

That said, I think it's probably best I finish other backends on my branch and put up a PR for the first-class operator instead -- it's a more straightforward design (even if a slightly heavy lift to add the new instruction format). Will do that shortly!
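
For reference, the "apply the reloc directly into the slice of function body" alternative mentioned above could look roughly like the sketch below. It is an assumption-laden illustration, not Cranelift or Wasmtime code: `apply_pc_rel32` is a made-up helper, it assumes a 32-bit little-endian PC-relative field, and both offsets are taken relative to the start of the same function body.

```rust
/// Patches a 32-bit PC-relative displacement into `body`.
///
/// `site` is the offset of the 4-byte field to patch, `target` is the
/// offset of the handler, and `addend` is the relocation's addend. All
/// three are hypothetical inputs for this sketch.
pub fn apply_pc_rel32(
    body: &mut [u8],
    site: usize,
    target: usize,
    addend: i64,
) -> Result<(), String> {
    let end = site
        .checked_add(4)
        .ok_or_else(|| format!("relocation offset {site} overflows"))?;
    let field = body
        .get_mut(site..end)
        .ok_or_else(|| format!("relocation at {site} is out of bounds"))?;

    // Displacement from the patched field to the target, plus the addend.
    let delta = target as i64 - site as i64 + addend;
    let value = i32::try_from(delta)
        .map_err(|_| format!("displacement {delta} does not fit in 32 bits"))?;

    field.copy_from_slice(&value.to_le_bytes());
    Ok(())
}
```

Because both offsets are known once the function is emitted, nothing here needs a deferred label; whether that is simpler overall than a first-class instruction is the design question discussed in the comment.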

cfallin added a commit to cfallin/wasmtime that referenced this pull request Sep 5, 2025
This is designed to enable applications such as bytecodealliance#11592 that use
alternative unwinding mechanisms that may not necessarily want to walk a
stack and look up exception tables. The idea is that whenever it would
be valid to resume to an exception handler that is active on the stack,
we can provide the same PC as a first-class runtime value that would be
found in the exception table for the given handler edge. A "custom"
resume step can then use this PC as a resume-point as long as it follows
the relevant exception ABI (i.e.: restore SP, FP, any other saved
registers that the exception ABI specifies, and provide appropriate
payload value(s)).

Handlers are associated with edges out of `try_call`s (or
`try_call_indirect`s); and edges specifically, not blocks, because there
could be multiple out-edges to one block. The instruction thus takes the
block that contains the try-call and an immediate that indexes its
exceptional edges.

This CLIF instruction required a bit of infrastructure to (i) allow
naming raw blocks, not just block calls, as instruction arguments, and
(ii) allow getting the MachLabel for any other lowered block during
lowering. But given that, the lowerings themselves are straightforward
uses of MachBuffer labels to fix-up PC-relative address-loading
instructions (e.g., `LEA` or `ADR` or `AUIPC`+`ADDI`).
github-merge-queue bot pushed a commit that referenced this pull request Sep 5, 2025
* Cranelift: add get_exception_handler_address.

* Review feedback.

* Review feedback: more tests.
@alexcrichton
Member Author

Updated now to incorporate the work from #11629

Member

@cfallin cfallin left a comment

Looks great -- thanks for the patience!

@github-actions github-actions bot added the wasmtime:docs Issues related to Wasmtime's documentation label Sep 8, 2025
@alexcrichton alexcrichton added this pull request to the merge queue Sep 8, 2025
Merged via the queue into bytecodealliance:main with commit 192f2fc Sep 8, 2025
168 checks passed
@alexcrichton alexcrichton deleted the trampoline-try-call branch September 8, 2025 14:57
bongjunj pushed a commit to prosyslab/wasmtime that referenced this pull request Oct 20, 2025
bongjunj pushed a commit to prosyslab/wasmtime that referenced this pull request Oct 20, 2025
* Cranelift: add get_exception_handler_address.

* Review feedback.

* Review feedback: more tests.
bongjunj pushed a commit to prosyslab/wasmtime that referenced this pull request Oct 20, 2025
Since Wasmtime's inception it's used the `setjmp` and `longjmp`
functions in C to implement handling of traps. While this solution was
easy to implement, relatively portable, and performant enough, there are
a number of downsides that have evolved over time to make this an
unattractive approach in the long run:

* Using `setjmp` fundamentally requires using C because Rust does not
  understand a function that returns twice. It's fundamentally unsound
  to invoke `setjmp` in Rust meaning that Wasmtime has forever needed a
  C compiler configured and set up to build. This notably means that
  `cargo check` cannot check other targets easily.

* Using `longjmp` means that Rust function frames are unwound on the
  stack without running destructors. This is a dangerous operation for
  which we get no protection from the compiler. Both frames entering
  wasm and frames exiting wasm are skipped. Absolutely minimizing this
  has been beneficial for portability to platforms such as Pulley.

* Currently the no_std implementation of Wasmtime requires embedders to
  provide `wasmtime_{setjmp,longjmp}` which is a thorn in the side of
  what is otherwise an almost entirely independent implementation of
  Wasmtime.

* There is a performance floor to using `setjmp` and `longjmp`. Calling
  `setjmp` requires using C but Wasmtime is otherwise written in Rust
  meaning that there's a Rust->C->Rust->Wasm boundary which
  fundamentally can't be inlined without cross-language LTO which is
  difficult to configure.

* With the implementation of the WebAssembly exceptions proposal
  Wasmtime now has two means of unwinding the stack. Ideally Wasmtime
  would only have one, and the more general one is the method of
  exceptions.

* Jumping out of a signal handler on Unix is tricky business. While
  we've made it work it's generally most robust if the signal handler
  simply returns, which it now does.

With all of that in mind the purpose of this commit is to replace the
setjmp/longjmp mechanism of handling traps with the recently implemented
support for exceptions in Cranelift. That is intended to resolve all of
the above points in one swoop.

One point in particular though that's nice about setjmp/longjmp is that
unwinding the stack on a trap is an O(1) operation. For situations such
as stack overflow that's a particularly nice property to have as we can
guarantee embedders that traps are a constant time (albeit somewhat
expensive with signals) operation. Exceptions naively require unwinding
the entire stack, and although frame pointers mean we're just traversing
a linked list I wanted to preserve the O(1) property here nonetheless.
To achieve this a solution is implemented where the array-to-wasm
(host-to-wasm) trampolines set up state in `VMStoreContext` so looking up
the current trap handler frame is an O(1) operation. Namely the sp/fp/pc
values for a `Handler` are stored inline.
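
To make the O(1) lookup concrete, here is a minimal sketch. The `StoreContext` type, its methods, and the save/restore of the previous handler for nested entries are assumptions made for illustration, not Wasmtime's actual `VMStoreContext` layout. The idea shown is only that the entry trampoline records the handler's pc/fp/sp in one fixed place, so raising a trap reads that slot instead of walking frames.

```rust
/// Hypothetical stand-in for the resume state of a trap handler.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Handler {
    pub pc: usize,
    pub fp: usize,
    pub sp: usize,
}

/// Hypothetical, much-simplified store context with the handler stored inline.
#[derive(Default)]
pub struct StoreContext {
    handler: Option<Handler>,
}

impl StoreContext {
    /// Called on host-to-wasm entry: record the handler and return the
    /// previous one so a nested entry can restore it on exit (assumption).
    pub fn enter(&mut self, handler: Handler) -> Option<Handler> {
        self.handler.replace(handler)
    }

    /// Called on exit from wasm: restore whatever was active before.
    pub fn exit(&mut self, previous: Option<Handler>) {
        self.handler = previous;
    }

    /// O(1): a single field read, independent of stack depth.
    pub fn current_handler(&self) -> Option<Handler> {
        self.handler
    }
}
```

With this shape the cost of finding the handler does not depend on how many wasm frames sit between the trap and the entry trampoline, which is the property that matters for cases like stack overflow.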

Implementing this feature required supporting
relocations-to-offsets-in-functions which was not previously supported
by Wasmtime. This required Cranelift refactorings such as bytecodealliance#11570, bytecodealliance#11585,
and bytecodealliance#11576. This then additionally required some more refactoring in
this commit which was difficult to split out as it otherwise wouldn't be
tested.

Apart from the relocation-related business much of this change is about
updating the platform signal handlers to use exceptions instead of
longjmp to return. For example on Unix this means updating the
`ucontext_t` with register values that the handler specifies. Windows
involves updating similar contexts, and macOS mach ports ended up not
needing too many changes.
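
The following is a platform-neutral sketch of what "updating the context with register values that the handler specifies" amounts to. `SavedRegs` is a hypothetical model of the saved register state, not the real `ucontext_t`, and the two payload slots stand in for whichever registers the exception ABI actually uses.

```rust
/// Hypothetical model of the registers saved in a signal/exception context.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
pub struct SavedRegs {
    pub pc: usize,
    pub sp: usize,
    pub fp: usize,
    /// Stand-ins for the exception ABI's payload registers (assumption).
    pub payload: [usize; 2],
}

/// Hypothetical resume state recorded for the active trap handler.
#[derive(Clone, Copy, Debug)]
pub struct Handler {
    pub pc: usize,
    pub sp: usize,
    pub fp: usize,
}

/// Rather than jumping out of the signal handler, rewrite the saved
/// registers. When the signal handler returns normally, execution resumes
/// at `handler.pc` with the handler's stack and frame pointers.
pub fn redirect_to_handler(regs: &mut SavedRegs, handler: Handler, payload: [usize; 2]) {
    regs.pc = handler.pc;
    regs.sp = handler.sp;
    regs.fp = handler.fp;
    regs.payload = payload;
}
```

In a real signal handler the equivalent writes would go into the platform's context structure, after which the handler simply returns.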

In terms of overall performance the relevant benchmark from this
repository, compared to before this commit, is:

    sync/no-hook/core - host-to-wasm - typed - nop
			    time:   [10.552 ns 10.561 ns 10.571 ns]
			    change: [−7.5238% −7.4011% −7.2786%] (p = 0.00 < 0.05)
			    Performance has improved.

Closes bytecodealliance#3927
cc bytecodealliance#10923

prtest:full
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Oct 20, 2025
This commit fixes some fallout of bytecodealliance#11592 where that PR resulted in
Wasmtime no longer running within the context of the Go runtime (e.g.
`wasmtime-go`). The reasons for this are quite Windows-specific and
I've attempted to document the situation in `vectored_exceptions.go`.
The basic TL;DR is that by returning from a vectored exception handler
(which bytecodealliance#11592 introduced) we're now subjecting ourselves to "continue
handlers" as well, and Go's continue handlers will abort the process for
non-Go exceptions. Some logic is added here to try to bypass Go's
continue handlers and get back to wasm.
github-merge-queue bot pushed a commit that referenced this pull request Oct 21, 2025
* Fix compatibility with the Go runtime on Windows for exceptions

* Fix typos

* Review comments
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Oct 21, 2025
…codealliance#11892)

alexcrichton added a commit that referenced this pull request Oct 21, 2025
…) (#11900)

alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Oct 24, 2025
This commit updates the implementation of compiling array-to-wasm
trampolines for component intrinsics to reuse the exact same
implementation as core wasm uses. This fixes an issue where the
component trampolines were not updated as part of bytecodealliance#11592 to have a
try/catch for errors that happen during their execution.

The implementation here is intended to be a small, backportable, patch
to the 38.0.x release branch. This does not refactor, for example,
`TrampolineCompiler` which now always uses the `Wasm` ABI as opposed to
using either the wasm or array ABI. Such cleanup is left for a follow-up
PR to `main` after this one.

In the meantime though the implementation of array-ABI component model
intrinsics now looks exactly like array-to-wasm trampolines for core
wasm where the array-ABI function performs a `try_call` to the wasm-ABI
function, letting the wasm-ABI function do the actual work. This is a
nice simplification for trampolines where the definition of the
trampoline is now just in one function instead of duplicated across two.
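
The shape described above can be modeled in plain Rust as follows. This is an analogy only: a real trap does not come back as a `Result`, and `ValRaw`, `Trap`, and both function names here are hypothetical stand-ins. The `match` plays the role of the `try_call`'s normal and exceptional edges.

```rust
/// Hypothetical stand-in for a raw value slot shared between host and wasm.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct ValRaw(pub u64);

/// Hypothetical trap description.
#[derive(Debug, PartialEq)]
pub struct Trap(pub &'static str);

/// "Wasm-ABI" function: typed parameters, does the actual work, may trap.
pub fn add_wasm_abi(a: u32, b: u32) -> Result<u32, Trap> {
    a.checked_add(b).ok_or(Trap("integer overflow"))
}

/// "Array-ABI" entry point: arguments and results travel through `vals`.
/// Returns `true` on normal return and `false` if the callee trapped.
pub fn add_array_abi(vals: &mut [ValRaw], pending: &mut Option<Trap>) -> bool {
    // Unpack the arguments from the shared array.
    let (a, b) = (vals[0].0 as u32, vals[1].0 as u32);
    match add_wasm_abi(a, b) {
        // Normal edge: write the result back into the array.
        Ok(sum) => {
            vals[0] = ValRaw(u64::from(sum));
            true
        }
        // Exceptional edge: record the trap and report failure.
        Err(trap) => {
            *pending = Some(trap);
            false
        }
    }
}
```

The array-ABI function contains no logic of its own beyond unpacking, calling, and reporting, which is why the trampoline's definition only needs to live in one place.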
alexcrichton added a commit that referenced this pull request Oct 24, 2025
github-merge-queue bot pushed a commit that referenced this pull request Oct 24, 2025

Labels

* `cranelift:area:aarch64`: Issues related to AArch64 backend.
* `cranelift:area:machinst`: Issues related to instruction selection and the new MachInst backend.
* `cranelift:area:x64`: Issues related to x64 codegen
* `cranelift`: Issues related to the Cranelift code generator
* `wasmtime:api`: Related to the API of the `wasmtime` crate itself
* `wasmtime:config`: Issues related to the configuration of Wasmtime
* `wasmtime:docs`: Issues related to Wasmtime's documentation

Development

Successfully merging this pull request may close these issues.

Fold setjmp into Cranelift-generated entry trampolines to WebAssembly

3 participants