@alexcrichton
Member

The aarch64 backend currently does not support using `symbol_value` to get the value of a function through a "near" (PC-relative) relocation. Currently it uses an `Abs8` relocation, which means that it's not suitable for use in Wasmtime, for example.

This commit refactors relocation/external name support in the aarch64 backend to support this mode of relocation. The previous `LoadExtName` was split into `LoadExtName{Got,Near,Far}`, where the "near" variant is what's new to the backend. The preexisting `symbol-value.clif`-style tests were updated to match the x64 backend, which has a more comprehensive suite of examples of what it looks like to refer to various symbols.

The goal of this commit is to enable Wasmtime to generate code which refers to a relative point elsewhere in the code (e.g. an exception handler) and loads that value into a register. The Wasmtime side isn't filled out yet, but it seemed good to fill out these missing relocations in the backend in the meantime.
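
As a rough illustration of the difference between the two relocation styles, here is a minimal sketch. It assumes the compiled code may be placed at an address that is not known at compile time; `RelocKind` and `resolve` are hypothetical names for this example and are not Cranelift's actual types.

```rust
/// Hypothetical model of two relocation styles (not Cranelift's real types).
#[derive(Clone, Copy, Debug, PartialEq)]
enum RelocKind {
    /// Patch in the full 64-bit absolute address of the target.
    Abs8,
    /// Patch in the signed distance from the relocation site to the target.
    PcRel,
}

/// Returns the value that would be written at `site_offset`, or `None` if it
/// cannot be computed yet. `image_base` is the address the code image is
/// loaded at; `None` means it is not known at this point.
fn resolve(
    kind: RelocKind,
    image_base: Option<u64>,
    site_offset: u64,
    target_offset: u64,
) -> Option<i64> {
    match kind {
        // An absolute relocation needs the final load address.
        RelocKind::Abs8 => image_base.map(|base| (base + target_offset) as i64),
        // A relative relocation only needs offsets within the image.
        RelocKind::PcRel => Some(target_offset as i64 - site_offset as i64),
    }
}
```

In this model the relative form can be resolved as soon as the layout of the code image is fixed, while the absolute form has to wait until the image is mapped.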

@alexcrichton alexcrichton requested a review from a team as a code owner August 29, 2025 18:06
@alexcrichton alexcrichton requested review from cfallin and removed request for a team August 29, 2025 18:06
Member

@cfallin cfallin left a comment

Thanks! This definitely makes sense to have.


; VCode:
; stp fp, lr, [sp, #-16]!
; mov fp, sp
Member

Sort of curious that the leaf-function optimization is no longer happening here, and we're getting a frame now -- won't matter for Wasmtime since we always force frames, but is there a reason you're aware of that this is happening?

Member

On looking a bit further, it seems that `Function::is_leaf` only looks to see if there are signatures -- so `func_addr` triggers the "not a leaf function" mode and forces the frame.

Mind filing an issue that we should probably determine leaf-ness by scanning VCode instead for any instructions that claim to be calls? It should be machine-dependent anyway since libcalls can happen even in functions without IR-level calls. (I suppose that `is_leaf` and the no-frame ABI optimization aren't even right in that case -- but we get away with it because the aarch64 backend doesn't fall back to any libcalls for FP stuff, and also because this configuration isn't exposed to Wasmtime?)
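
A minimal sketch of that idea, for illustration only: `Inst` and its variants are hypothetical stand-ins for a backend's machine instructions, not Cranelift's actual VCode types.

```rust
/// Hypothetical machine-instruction type standing in for a backend's VCode.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Inst {
    Add,
    Load,
    /// A call that came from an IR-level call instruction.
    Call,
    /// A call to a runtime library routine inserted during lowering.
    LibCall,
    Ret,
}

impl Inst {
    /// Each backend decides which of its instructions count as calls.
    fn is_call(self) -> bool {
        matches!(self, Inst::Call | Inst::LibCall)
    }
}

/// A function is a leaf if none of its lowered instructions is a call.
fn is_leaf(insts: &[Inst]) -> bool {
    !insts.iter().any(|inst| inst.is_call())
}
```

Because the scan runs over lowered instructions, a libcall introduced during lowering makes the function a non-leaf even though the IR contained no call.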

Member Author

Sure thing! #11573

And yeah, AFAIK we're ok in Wasmtime, but I'm not 100% sure in that judgment. Your reasoning sounds reasonable to me, however.

@github-actions github-actions bot added the cranelift, cranelift:area:machinst, cranelift:area:aarch64, cranelift:area:x64, and isle labels Aug 29, 2025
@github-actions

Subscribe to Label Action

cc @cfallin, @fitzgen

This issue or pull request has been labeled: "cranelift", "cranelift:area:aarch64", "cranelift:area:machinst", "cranelift:area:x64", "isle"

Thus the following users have been cc'd because of the following labels:

  • cfallin: isle
  • fitzgen: isle

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.


@alexcrichton alexcrichton enabled auto-merge August 30, 2025 00:06
@alexcrichton alexcrichton added this pull request to the merge queue Aug 30, 2025
Merged via the queue into bytecodealliance:main with commit 10d2cbc Aug 30, 2025
146 of 148 checks passed
@alexcrichton alexcrichton deleted the aarch64-near-symbol branch August 30, 2025 00:34
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Sep 2, 2025
This is the same as bytecodealliance#11570 but for the riscv64 backend. The intention is
to support "near" relocations which don't require `Abs8` relocations for
upcoming use in Wasmtime. The same design as bytecodealliance#11570 is used here.
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Sep 2, 2025
Since Wasmtime's inception it's used the `setjmp` and `longjmp`
functions in C to implement handling of traps. While this solution was
easy to implement, relatively portable, and performant enough, there are
a number of downsides that have evolved over time to make this an
unattractive approach in the long run:

* Using `setjmp` fundamentally requires using C because Rust does not
  understand a function that returns twice. It's fundamentally unsound
  to invoke `setjmp` in Rust meaning that Wasmtime has forever needed a
  C compiler configured and set up to build. This notably means that
  `cargo check` cannot check other targets easily.

* Using `longjmp` means that Rust function frames are unwound on the
  stack without running destructors. This is a dangerous operation for
  which we get no protection from the compiler. Both frames
  entering wasm and frames exiting wasm are all skipped. Absolutely
  minimizing this has been beneficial for portability to platforms such
  as Pulley.

* Currently the no_std implementation of Wasmtime requires embedders to
  provide `wasmtime_{setjmp,longjmp}` which is a thorn in the side of
  what is otherwise an almost entirely independent implementation of
  Wasmtime.

* There is a performance floor to using `setjmp` and `longjmp`. Calling
  `setjmp` requires using C but Wasmtime is otherwise written in Rust
  meaning that there's a Rust->C->Rust->Wasm boundary which
  fundamentally can't be inlined without cross-language LTO which is
  difficult to configure.

* With the implementation of the WebAssembly exceptions proposal
  Wasmtime now has two means of unwinding the stack. Ideally Wasmtime
  would only have one, and the more general one is the method of
  exceptions.

* Jumping out of a signal handler on Unix is tricky business. While
  we've made it work, it's generally most robust if the signal handler
  simply returns, which it now does.

With all of that in mind the purpose of this commit is to replace the
setjmp/longjmp mechanism of handling traps with the recently implemented
support for exceptions in Cranelift. That is intended to resolve all of
the above points in one fell swoop.

One point in particular though that's nice about setjmp/longjmp is that
unwinding the stack on a trap is an O(1) operation. For situations such
as stack overflow that's a particularly nice property to have as we can
guarantee embedders that traps are a constant time (albeit somewhat
expensive with signals) operation. Exceptions naively require unwinding
the entire stack, and although frame pointers mean we're just traversing
a linked list I wanted to preserve the O(1) property here nonetheless.
To achieve this a solution is implemented where the array-to-wasm
(host-to-wasm) trampolines set up state in `VMStoreContext` so looking up
the current trap handler frame is an O(1) operation. Namely the sp/fp/pc
values for a `Handler` are stored inline.
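
A minimal sketch of the inline-handler idea follows. The `Handler` fields mirror the sp/fp/pc values mentioned above, but `StoreContext` and its methods are hypothetical stand-ins for illustration and do not reflect the real `VMStoreContext` layout.

```rust
/// Register values needed to resume at a trap handler.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Handler {
    sp: usize,
    fp: usize,
    pc: usize,
}

/// Hypothetical stand-in for per-store state; the handler is stored inline.
#[derive(Default)]
struct StoreContext {
    trap_handler: Option<Handler>,
}

impl StoreContext {
    /// Called on entry to wasm: installs the new handler and returns the
    /// previous one so the caller can restore it on exit.
    fn enter_wasm(&mut self, handler: Handler) -> Option<Handler> {
        self.trap_handler.replace(handler)
    }

    /// Called on exit from wasm: restores the handler that was active before.
    fn exit_wasm(&mut self, previous: Option<Handler>) {
        self.trap_handler = previous;
    }

    /// O(1): a single field read, with no walk over stack frames.
    fn current_handler(&self) -> Option<Handler> {
        self.trap_handler
    }
}
```

The design choice in this sketch is that nested entries save the previous handler at the call site, so looking up the active handler never depends on stack depth.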

Implementing this feature required supporting
relocations-to-offsets-in-functions which was not previously supported
by Wasmtime. This required Cranelift refactorings such as bytecodealliance#11570, bytecodealliance#11585,
and bytecodealliance#11576. This then additionally required some more refactoring in
this commit which was difficult to split out as it otherwise wouldn't be
tested.

Apart from the relocation-related business much of this change is about
updating the platform signal handlers to use exceptions instead of
longjmp to return. For example on Unix this means updating the
`ucontext_t` with register values that the handler specifies. Windows
involves updating similar contexts, and macOS mach ports ended up not
needing too many changes.
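
The Unix case can be sketched as follows. Since the register fields of a real `ucontext_t` differ per OS and architecture, this example uses a hypothetical `MockContext` struct in its place; `Handler` and `redirect_to_handler` are likewise assumptions made for illustration.

```rust
/// Register values the trap handler wants execution to resume with.
struct Handler {
    pc: u64,
    sp: u64,
    fp: u64,
}

/// Hypothetical stand-in for the saved register state passed to a signal
/// handler (a real `ucontext_t` has platform-specific fields).
struct MockContext {
    pc: u64,
    sp: u64,
    fp: u64,
    /// Some unrelated saved register, left untouched by the redirect.
    other: u64,
}

/// Overwrites the saved registers so that, once the signal handler returns,
/// execution resumes at the trap handler rather than at the faulting
/// instruction.
fn redirect_to_handler(ctx: &mut MockContext, handler: &Handler) {
    ctx.pc = handler.pc;
    ctx.sp = handler.sp;
    ctx.fp = handler.fp;
}
```

In this model the signal handler itself returns normally; only the saved context it leaves behind has changed.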

In terms of overall performance the relevant benchmark from this
repository, compared to before this commit, is:

    sync/no-hook/core - host-to-wasm - typed - nop
			    time:   [10.552 ns 10.561 ns 10.571 ns]
			    change: [−7.5238% −7.4011% −7.2786%] (p = 0.00 < 0.05)
			    Performance has improved.

Closes bytecodealliance#3927
cc bytecodealliance#10923

prtest:full
github-merge-queue bot pushed a commit that referenced this pull request Sep 2, 2025
* riscv64: Implement near relocations

This is the same as #11570 but for the riscv64 backend. The intention is
to support "near" relocations which don't require `Abs8` relocations for
upcoming use in Wasmtime. The same design as #11570 is used here.

* Review comments
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Sep 3, 2025
Since Wasmtime's inception it's used the `setjmp` and `longjmp`
functions in C to implement handling of traps. While this solution was
easy to implement, relatively portable, and performant enough, there are
a number of downsides that have evolved over time to make this an
unattractive approach in the long run:

* Using `setjmp` fundamentally requires using C because Rust does not
  understand a function that returns twice. It's fundamentally unsound
  to invoke `setjmp` in Rust meaning that Wasmtime has forever needed a
  C compiler configured and set up to build. This notably means that
  `cargo check` cannot check other targets easily.

* Using `longjmp` means that Rust function frames are unwound on the
  stack without running destructors. This is a dangerous operation of
  which we get no protection from the compiler about. Both frames
  entering wasm and frames exiting wasm are all skipped. Absolutely
  minimizing this has been beneficial for portability to platforms such
  as Pulley.

* Currently the no_std implementation of Wasmtime requires embedders to
  provide `wasmtime_{setjmp,longjmp}` which is a thorn in the side of
  what is otherwise a mostly entirely independent implementation of
  Wasmtime.

* There is a performance floor to using `setjmp` and `longjmp`. Calling
  `setjmp` requires using C but Wasmtime is otherwise written in Rust
  meaning that there's a Rust->C->Rust->Wasm boundary which
  fundamentally can't be inlined without cross-language LTO which is
  difficult to configure.

* With the implementation of the WebAssembly exceptions proposal
  Wasmtime now has two means of unwinding the stack. Ideally Wasmtime
  would only have one, and the more general one is the method of
  exceptions.

* Jumping out of a signal handler on Unix is tricky business. While
  we've made it work it's generally most robust of the signal handler
  simply returns which it now does.

With all of that in mind the purpose of this commit is to replace the
setjmp/longjmp mechanism of handling traps with the recently implemented
support for exceptions in Cranelift. That is intended to resolve all of
the above points in one swoop.

One point in particular though that's nice about setjmp/longjmp is that
unwinding the stack on a trap is an O(1) operation. For situations such
as stack overflow that's a particularly nice property to have as we can
guarantee embedders that traps are a constant time (albeit somewhat
expensive with signals) operation. Exceptions naively require unwinding
the entire stack, and although frame pointers mean we're just traversing
a linked list I wanted to preserve the O(1) property here nonetheless.
To achieve this a solution is implemented where the array-to-wasm
(host-to-wasm) trampolines setup state in `VMStoreContext` so looking up
the current trap handler frame is an O(1) operation. Namely the sp/fp/pc
values for a `Handler` are stored inline.

Implementing this feature required supporting
relocations-to-offsets-in-functions which was not previously supported
by Wasmtime. This required Cranelift refactorings such as bytecodealliance#11570, bytecodealliance#11585,
and bytecodealliance#11576. This then additionally required some more refactoring in
this commit which was difficult to split out as it otherwise wouldn't be
tested.

Apart from the relocation-related business much of this change is about
updating the platform signal handlers to use exceptions instead of
longjmp to return. For example on Unix this means updating the
`ucontext_t` with register values that the handler specifies. Windows
involves updating similar contexts, and macOS mach ports ended up not
needing too many changes.

In terms of overall performance the relevant benchmark from this
repository, compared to before this commit, is:

    sync/no-hook/core - host-to-wasm - typed - nop
			    time:   [10.552 ns 10.561 ns 10.571 ns]
			    change: [−7.5238% −7.4011% −7.2786%] (p = 0.00 < 0.05)
			    Performance has improved.

Closes bytecodealliance#3927
cc bytecodealliance#10923

prtest:full
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Sep 3, 2025
Since Wasmtime's inception it's used the `setjmp` and `longjmp`
functions in C to implement handling of traps. While this solution was
easy to implement, relatively portable, and performant enough, there are
a number of downsides that have evolved over time to make this an
unattractive approach in the long run:

* Using `setjmp` fundamentally requires using C because Rust does not
  understand a function that returns twice. It's fundamentally unsound
  to invoke `setjmp` in Rust meaning that Wasmtime has forever needed a
  C compiler configured and set up to build. This notably means that
  `cargo check` cannot check other targets easily.

* Using `longjmp` means that Rust function frames are unwound on the
  stack without running destructors. This is a dangerous operation of
  which we get no protection from the compiler about. Both frames
  entering wasm and frames exiting wasm are all skipped. Absolutely
  minimizing this has been beneficial for portability to platforms such
  as Pulley.

* Currently the no_std implementation of Wasmtime requires embedders to
  provide `wasmtime_{setjmp,longjmp}` which is a thorn in the side of
  what is otherwise a mostly entirely independent implementation of
  Wasmtime.

* There is a performance floor to using `setjmp` and `longjmp`. Calling
  `setjmp` requires using C but Wasmtime is otherwise written in Rust
  meaning that there's a Rust->C->Rust->Wasm boundary which
  fundamentally can't be inlined without cross-language LTO which is
  difficult to configure.

* With the implementation of the WebAssembly exceptions proposal
  Wasmtime now has two means of unwinding the stack. Ideally Wasmtime
  would only have one, and the more general one is the method of
  exceptions.

* Jumping out of a signal handler on Unix is tricky business. While
  we've made it work it's generally most robust of the signal handler
  simply returns which it now does.

With all of that in mind the purpose of this commit is to replace the
setjmp/longjmp mechanism of handling traps with the recently implemented
support for exceptions in Cranelift. That is intended to resolve all of
the above points in one swoop.

One point in particular though that's nice about setjmp/longjmp is that
unwinding the stack on a trap is an O(1) operation. For situations such
as stack overflow that's a particularly nice property to have as we can
guarantee embedders that traps are a constant time (albeit somewhat
expensive with signals) operation. Exceptions naively require unwinding
the entire stack, and although frame pointers mean we're just traversing
a linked list I wanted to preserve the O(1) property here nonetheless.
To achieve this a solution is implemented where the array-to-wasm
(host-to-wasm) trampolines setup state in `VMStoreContext` so looking up
the current trap handler frame is an O(1) operation. Namely the sp/fp/pc
values for a `Handler` are stored inline.

Implementing this feature required supporting
relocations-to-offsets-in-functions which was not previously supported
by Wasmtime. This required Cranelift refactorings such as bytecodealliance#11570, bytecodealliance#11585,
and bytecodealliance#11576. This then additionally required some more refactoring in
this commit which was difficult to split out as it otherwise wouldn't be
tested.

Apart from the relocation-related business much of this change is about
updating the platform signal handlers to use exceptions instead of
longjmp to return. For example on Unix this means updating the
`ucontext_t` with register values that the handler specifies. Windows
involves updating similar contexts, and macOS mach ports ended up not
needing too many changes.

In terms of overall performance the relevant benchmark from this
repository, compared to before this commit, is:

    sync/no-hook/core - host-to-wasm - typed - nop
			    time:   [10.552 ns 10.561 ns 10.571 ns]
			    change: [−7.5238% −7.4011% −7.2786%] (p = 0.00 < 0.05)
			    Performance has improved.

Closes bytecodealliance#3927
cc bytecodealliance#10923

prtest:full
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Sep 3, 2025
Since Wasmtime's inception it's used the `setjmp` and `longjmp`
functions in C to implement handling of traps. While this solution was
easy to implement, relatively portable, and performant enough, there are
a number of downsides that have evolved over time to make this an
unattractive approach in the long run:

* Using `setjmp` fundamentally requires using C because Rust does not
  understand a function that returns twice. It's fundamentally unsound
  to invoke `setjmp` in Rust meaning that Wasmtime has forever needed a
  C compiler configured and set up to build. This notably means that
  `cargo check` cannot check other targets easily.

* Using `longjmp` means that Rust function frames are unwound on the
  stack without running destructors. This is a dangerous operation of
  which we get no protection from the compiler about. Both frames
  entering wasm and frames exiting wasm are all skipped. Absolutely
  minimizing this has been beneficial for portability to platforms such
  as Pulley.

* Currently the no_std implementation of Wasmtime requires embedders to
  provide `wasmtime_{setjmp,longjmp}` which is a thorn in the side of
  what is otherwise a mostly entirely independent implementation of
  Wasmtime.

* There is a performance floor to using `setjmp` and `longjmp`. Calling
  `setjmp` requires using C but Wasmtime is otherwise written in Rust
  meaning that there's a Rust->C->Rust->Wasm boundary which
  fundamentally can't be inlined without cross-language LTO which is
  difficult to configure.

* With the implementation of the WebAssembly exceptions proposal
  Wasmtime now has two means of unwinding the stack. Ideally Wasmtime
  would only have one, and the more general one is the method of
  exceptions.

* Jumping out of a signal handler on Unix is tricky business. While
  we've made it work it's generally most robust of the signal handler
  simply returns which it now does.

With all of that in mind the purpose of this commit is to replace the
setjmp/longjmp mechanism of handling traps with the recently implemented
support for exceptions in Cranelift. That is intended to resolve all of
the above points in one swoop.

One point in particular though that's nice about setjmp/longjmp is that
unwinding the stack on a trap is an O(1) operation. For situations such
as stack overflow that's a particularly nice property to have as we can
guarantee embedders that traps are a constant time (albeit somewhat
expensive with signals) operation. Exceptions naively require unwinding
the entire stack, and although frame pointers mean we're just traversing
a linked list I wanted to preserve the O(1) property here nonetheless.
To achieve this a solution is implemented where the array-to-wasm
(host-to-wasm) trampolines setup state in `VMStoreContext` so looking up
the current trap handler frame is an O(1) operation. Namely the sp/fp/pc
values for a `Handler` are stored inline.

Implementing this feature required supporting
relocations-to-offsets-in-functions which was not previously supported
by Wasmtime. This required Cranelift refactorings such as bytecodealliance#11570, bytecodealliance#11585,
and bytecodealliance#11576. This then additionally required some more refactoring in
this commit which was difficult to split out as it otherwise wouldn't be
tested.

Apart from the relocation-related business much of this change is about
updating the platform signal handlers to use exceptions instead of
longjmp to return. For example on Unix this means updating the
`ucontext_t` with register values that the handler specifies. Windows
involves updating similar contexts, and macOS mach ports ended up not
needing too many changes.

In terms of overall performance the relevant benchmark from this
repository, compared to before this commit, is:

    sync/no-hook/core - host-to-wasm - typed - nop
			    time:   [10.552 ns 10.561 ns 10.571 ns]
			    change: [−7.5238% −7.4011% −7.2786%] (p = 0.00 < 0.05)
			    Performance has improved.

Closes bytecodealliance#3927
cc bytecodealliance#10923

prtest:full
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Sep 4, 2025
Since Wasmtime's inception it's used the `setjmp` and `longjmp`
functions in C to implement handling of traps. While this solution was
easy to implement, relatively portable, and performant enough, there are
a number of downsides that have evolved over time to make this an
unattractive approach in the long run:

* Using `setjmp` fundamentally requires using C because Rust does not
  understand a function that returns twice. It's fundamentally unsound
  to invoke `setjmp` in Rust meaning that Wasmtime has forever needed a
  C compiler configured and set up to build. This notably means that
  `cargo check` cannot check other targets easily.

* Using `longjmp` means that Rust function frames are unwound on the
  stack without running destructors. This is a dangerous operation
  that the compiler gives us no protection against. Both frames
  entering wasm and frames exiting wasm are all skipped. Absolutely
  minimizing this has been beneficial for portability to platforms such
  as Pulley.

* Currently the no_std implementation of Wasmtime requires embedders to
  provide `wasmtime_{setjmp,longjmp}` which is a thorn in the side of
  what is otherwise an almost entirely independent implementation of
  Wasmtime.

* There is a performance floor to using `setjmp` and `longjmp`. Calling
  `setjmp` requires using C but Wasmtime is otherwise written in Rust
  meaning that there's a Rust->C->Rust->Wasm boundary which
  fundamentally can't be inlined without cross-language LTO which is
  difficult to configure.

* With the implementation of the WebAssembly exceptions proposal
  Wasmtime now has two means of unwinding the stack. Ideally Wasmtime
  would only have one, and the more general one is the method of
  exceptions.

* Jumping out of a signal handler on Unix is tricky business. While
  we've made it work it's generally most robust if the signal handler
  simply returns, which it now does.

With all of that in mind the purpose of this commit is to replace the
setjmp/longjmp mechanism of handling traps with the recently implemented
support for exceptions in Cranelift. That is intended to resolve all of
the above points in one fell swoop.

One point in particular though that's nice about setjmp/longjmp is that
unwinding the stack on a trap is an O(1) operation. For situations such
as stack overflow that's a particularly nice property to have as we can
guarantee embedders that traps are a constant time (albeit somewhat
expensive with signals) operation. Exceptions naively require unwinding
the entire stack, and although frame pointers mean we're just traversing
a linked list I wanted to preserve the O(1) property here nonetheless.
To achieve this a solution is implemented where the array-to-wasm
(host-to-wasm) trampolines set up state in `VMStoreContext` so looking up
the current trap handler frame is an O(1) operation. Namely the sp/fp/pc
values for a `Handler` are stored inline.
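
To make the O(1) lookup concrete, here is a minimal sketch in Rust. It is
not Wasmtime's actual code: `StoreContext`, its `trap_handler` field, and
the `enter_wasm`/`exit_wasm` methods are hypothetical names, and only the
idea of storing the sp/fp/pc of a `Handler` inline comes from the text
above.

```rust
/// Hypothetical stand-in for the resume state of one trap handler.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Handler {
    pub pc: usize,
    pub sp: usize,
    pub fp: usize,
}

/// Hypothetical, heavily simplified store context. The handler is stored
/// inline, so finding it never requires walking stack frames.
#[derive(Default)]
pub struct StoreContext {
    trap_handler: Option<Handler>,
}

impl StoreContext {
    /// What a host-to-wasm trampoline might do on entry: install its own
    /// handler and return the previous one so it can be restored on exit.
    pub fn enter_wasm(&mut self, handler: Handler) -> Option<Handler> {
        self.trap_handler.replace(handler)
    }

    /// What the trampoline might do on exit: restore the saved handler.
    pub fn exit_wasm(&mut self, previous: Option<Handler>) {
        self.trap_handler = previous;
    }

    /// Constant-time lookup of the handler a trap should resume at.
    pub fn current_trap_handler(&self) -> Option<Handler> {
        self.trap_handler
    }
}
```

In this sketch nested host-to-wasm entries save and restore the previous
handler, so the innermost entry is always the one found by the lookup.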

Implementing this feature required supporting
relocations-to-offsets-in-functions which was not previously supported
by Wasmtime. This required Cranelift refactorings such as bytecodealliance#11570, bytecodealliance#11585,
and bytecodealliance#11576. This then additionally required some more refactoring in
this commit which was difficult to split out as it otherwise wouldn't be
tested.

Apart from the relocation-related business much of this change is about
updating the platform signal handlers to use exceptions instead of
longjmp to return. For example on Unix this means updating the
`ucontext_t` with register values that the handler specifies. Windows
involves updating similar contexts, and macOS mach ports ended up not
needing too many changes.
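
As a rough illustration of "returning" from a signal handler into an
exception handler, the sketch below rewrites a saved register set rather
than jumping out of the handler. `SavedRegisters` is a hypothetical
stand-in for the platform context (for example the registers inside
`ucontext_t`), and the two payload registers are an assumption made for
the example, not something this commit specifies.

```rust
/// Hypothetical resume state that a handler specifies.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Handler {
    pub pc: usize,
    pub sp: usize,
    pub fp: usize,
}

/// Hypothetical stand-in for the machine context that the OS saved when
/// the signal was delivered and will restore once the handler returns.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct SavedRegisters {
    pub pc: usize,
    pub sp: usize,
    pub fp: usize,
    pub payload0: usize,
    pub payload1: usize,
}

/// Overwrite the saved context so that, when the signal handler simply
/// returns, execution continues at the handler instead of at the
/// faulting instruction.
pub fn redirect_to_handler(
    regs: &mut SavedRegisters,
    handler: &Handler,
    payload: (usize, usize),
) {
    regs.pc = handler.pc;
    regs.sp = handler.sp;
    regs.fp = handler.fp;
    regs.payload0 = payload.0;
    regs.payload1 = payload.1;
}
```

The signal handler itself returns normally in this model. The change of
control flow happens when the OS restores the modified context.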

In terms of overall performance the relevant benchmark from this
repository, compared to before this commit, is:

    sync/no-hook/core - host-to-wasm - typed - nop
			    time:   [10.552 ns 10.561 ns 10.571 ns]
			    change: [−7.5238% −7.4011% −7.2786%] (p = 0.00 < 0.05)
			    Performance has improved.

Closes bytecodealliance#3927
cc bytecodealliance#10923

prtest:full
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Sep 4, 2025
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Sep 7, 2025
github-merge-queue bot pushed a commit that referenced this pull request Sep 8, 2025
bongjunj pushed a commit to prosyslab/wasmtime that referenced this pull request Oct 20, 2025
)

* aarch64: Add support for "near" in `LoadExtName`

The current aarch64 backend does not support `symbol_value` to get the
value of a function, for example, with a "near" relocation using a
relative relocation. Currently it uses an `Abs8` relocation which means
that it's not suitable in Wasmtime, for example.

This commit refactors relocation/external name support in the aarch64
backend to support this mode of relocation. The previous `LoadExtName`
was split into `LoadExtName{Got,Near,Far}` where the "near" bit is
what's new to the backend. The preexisting `symbol-value.clif`-style
tests were updated to match the x64 backend which has a more
comprehensive suite of examples of what it looks like to refer to
various symbols.

The goal of this commit is to enable Wasmtime to generate code which
refers to a relative point elsewhere in the code (e.g. an exception
handler) and load the value into a register. This part isn't filled out
yet, but it seemed good to at least in the meantime fill out these
missing relocations in the backend.

* Fix clippy warning

* Add support for new relocations to cranelift-jit

Needed for filetests
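
To illustrate the difference between an absolute relocation such as
`Abs8` and a "near" (relative) one, here is a small Rust sketch. The
function names and the `bits` parameter are hypothetical and this is not
Cranelift's relocation code. It shows only the arithmetic: an absolute
relocation stores the final address, whereas a relative one stores the
distance from the patched location to the target, which presumably is
what makes it usable for code that can be loaded at any address.

```rust
/// Value patched in for an absolute 8-byte relocation: the final address
/// itself, which is only known once the load address is known.
pub fn abs8_value(symbol_addr: u64, addend: i64) -> u64 {
    symbol_addr.wrapping_add(addend as u64)
}

/// Value patched in for a PC-relative relocation: the signed distance
/// from the patched location (`place`) to the target. Returns `None` if
/// the distance does not fit in a signed field of `bits` bits.
pub fn pc_relative_value(
    symbol_addr: u64,
    addend: i64,
    place: u64,
    bits: u32,
) -> Option<i64> {
    assert!((1..=63).contains(&bits), "field width must be 1..=63 bits");
    let target = symbol_addr.wrapping_add(addend as u64);
    let delta = target.wrapping_sub(place) as i64;
    let min = -(1i64 << (bits - 1));
    let max = (1i64 << (bits - 1)) - 1;
    (min..=max).contains(&delta).then_some(delta)
}
```

Moving both the symbol and the patched location by the same base address
leaves the relative value unchanged, while the absolute value changes
with the base. The field width passed as `bits` depends on the
instruction being patched and is left as a parameter here.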
bongjunj pushed a commit to prosyslab/wasmtime that referenced this pull request Oct 20, 2025
* riscv64: Implement near relocations

This is the same as bytecodealliance#11570 but for the riscv64 backend. The intention is
to support "near" relocations which don't require `Abs8` relocations for
upcoming use in Wasmtime. The same design as bytecodealliance#11570 is used here.

* Review comments
bongjunj pushed a commit to prosyslab/wasmtime that referenced this pull request Oct 20, 2025
Labels

cranelift:area:aarch64 Issues related to AArch64 backend. cranelift:area:machinst Issues related to instruction selection and the new MachInst backend. cranelift:area:x64 Issues related to x64 codegen cranelift Issues related to the Cranelift code generator isle Related to the ISLE domain-specific language