Fix a couple of issues that prevent wasmtime from compiling/running on arm64_32 (Apple Watch) #13259
|
Wanted to say again thanks for the porting work here and even the benchmark work as well, it's much appreciated! |
7ab1b5f to
48096e4
`crates/unwinder/src/arch/aarch64.rs` has inline-asm operands that take
register-width values. They were typed `usize`, which works on the usual
`aarch64-*` LP64 targets where `usize` is `u64` and the operand class is
unambiguously the 64-bit GPR view. On `arm64_32-apple-watchos` (ILP32
ABI: 64-bit registers, 32-bit pointers) `usize` is `u32`, which makes
the same operands ambiguous between the `w<N>` (32-bit lane) and `x<N>`
(64-bit GPR) views — exactly what rustc's `asm_sub_register` lint flags.
Relying on the ISA-side zero-extend that aarch64 happens to perform on
`mov w<N>, ...` would also be relying on a property the language
doesn't promise: the Rust Reference is explicit that the upper bits of
a register holding a sub-register-width input are *undefined*[0].
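
To illustrate the operand-width point, here is a minimal sketch. It is not the code in `aarch64.rs`, and the name `frame_pointer` is hypothetical. It binds an inline-asm output to a `u64` local, so the operand is always formatted as the 64-bit `x<N>` view regardless of pointer width. On non-aarch64 hosts the sketch simply returns `None`.

```rust
/// Hypothetical helper (not part of the unwinder crate): reads the
/// frame-pointer register x29 on aarch64 as a full 64-bit value.
#[cfg(target_arch = "aarch64")]
pub fn frame_pointer() -> Option<u64> {
    // Typing the local as `u64` rather than `usize` pins the operand to
    // the 64-bit `x<N>` register view, even where `usize` is 32 bits.
    let fp: u64;
    unsafe {
        core::arch::asm!(
            "mov {fp}, x29",
            fp = out(reg) fp,
            options(nomem, nostack, preserves_flags),
        );
    }
    Some(fp)
}

/// Fallback so the sketch builds on every other architecture.
#[cfg(not(target_arch = "aarch64"))]
pub fn frame_pointer() -> Option<u64> {
    None
}
```

With a `usize` local on an ILP32 target, the same template would be the case the `asm_sub_register` lint warns about.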
Rather than leak `u64` into the public surface (the `Unwind` trait, the
shared `arch/mod.rs` dispatch, and the per-arch backends in `x86.rs`,
`riscv64.rs`, `s390x.rs`), keep the public function signatures `usize`
— that's the existing convention shared with the other backends, and
the `u64`-vs-pointer-width split is unique to aarch64-on-ILP32. Inside
this module, type any register-bearing local that participates in
inline asm as `u64`, and cast at the boundaries:
- `u64::try_from(v).unwrap()` widens `usize` → `u64` (infallible on
every supported Rust target; the `.unwrap()` documents that any
failure would be a target-property issue).
- `as usize` narrows `u64` → `usize` at the return — truncates on
`arm64_32` by design (the saved PC/SP there is a 32-bit host
pointer that fits exactly in the low 32 bits) and is the identity
on aarch64 LP64.
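
The two boundary casts can be sketched as a pair of helpers. The names `widen` and `narrow` are made up for illustration; the commit applies the casts inline rather than through functions like these.

```rust
/// Widen a pointer-width value to the full 64-bit register width.
/// The `unwrap` cannot fail on targets where `usize` is at most 64 bits.
pub fn widen(v: usize) -> u64 {
    u64::try_from(v).unwrap()
}

/// Narrow a 64-bit register value back to pointer width.
/// Keeps the low 32 bits where `usize` is 32 bits wide; identity on LP64.
pub fn narrow(v: u64) -> usize {
    v as usize
}
```

Round-tripping any `usize` through `widen` and then `narrow` returns the original value on both pointer widths, which is why the public signatures can stay `usize`.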
Also switch the saved-LR load from `*(fp as *mut usize).offset(1)` to
`*(fp as *mut u64).offset(1)`. AAPCS64 reserves two 64-bit slots for
the frame record on every aarch64 ABI variant — including `arm64_32` —
so an 8-byte stride is correct regardless of pointer width. With
`*mut usize` on `arm64_32`, `.offset(1)` would advance by only 4 bytes
and read the upper half of the saved-FP slot. This is a latent
correctness fix; today the unwinder isn't exercised on `arm64_32`
(which runs Pulley, not Cranelift-compiled native code), but the
corrected form is the right one to land alongside the type change.
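
The stride difference can be modelled on any host with a simulated frame record. This is a sketch under assumptions: `FrameRecord`, `saved_lr`, and `load_with_4_byte_stride` are hypothetical names, and a `u32` pointer stands in for `*mut usize` on an ILP32 target.

```rust
/// Simulated AAPCS64 frame record: two 64-bit slots, saved FP then saved LR.
#[repr(C)]
pub struct FrameRecord {
    pub saved_fp: u64,
    pub saved_lr: u64,
}

/// Load the saved LR with an 8-byte stride, as in the corrected form.
///
/// # Safety
/// `fp` must be the address of a valid, 8-byte-aligned frame record.
pub unsafe fn saved_lr(fp: usize) -> u64 {
    unsafe { *(fp as *const u64).offset(1) }
}

/// What a 4-byte stride loads instead (models `*mut usize` on ILP32).
///
/// # Safety
/// Same requirements as `saved_lr`.
pub unsafe fn load_with_4_byte_stride(fp: usize) -> u32 {
    unsafe { *(fp as *const u32).offset(1) }
}
```

On a little-endian host, the 4-byte-stride load returns the upper 32 bits of `saved_fp` rather than any part of `saved_lr`, which is the misread the commit message describes.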
No behaviour change on existing aarch64 LP64 targets. Silences two
`asm_sub_register` warnings on a future `arm64_32-apple-watchos` build
of this crate.
[0]: https://doc.rust-lang.org/reference/inline-assembly.html#r-asm.register-operands.smaller-value
mach2 v0.4.2 emits `compile_error!("mach requires macOS or iOS")` on any
target where neither `target_os = "macos"` nor `target_os = "ios"` matches.
That blocks every Apple non-iOS-non-macOS platform — most pressingly
arm64_32-apple-watchos for embedders shipping wasmtime on Apple Watch.
The fix has been upstream in mach2 since 0.6.0 (commit `538ce75`,
2025-08-16, "Add support for tvOS, watchOS and visionOS"), which widens
the cfg gate from `cfg(any(macos, ios))` to `cfg(target_vendor = "apple")`
on both the `compile_error!` and the `libc` build-dep, with no public-API
changes in the modules wasmtime imports
(`exc`, `exception_types`, `kern_return`, `mach_init`, `mach_port`,
`message`, `ndr`, `port`, `thread_act`, `thread_status`).
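
The effect of widening the gate can be sketched with two predicates. This paraphrases the gates as described above and is not copied from mach2's source; the function names are hypothetical.

```rust
/// Narrow gate as described for mach2 0.4.x: macOS and iOS only.
pub const fn old_gate_accepts() -> bool {
    cfg!(any(target_os = "macos", target_os = "ios"))
}

/// Widened gate as described for mach2 0.6.0: any Apple-vendor target,
/// which also covers watchOS, tvOS, and visionOS.
pub const fn new_gate_accepts() -> bool {
    cfg!(target_vendor = "apple")
}
```

Every target the old gate accepts is also accepted by the new one, so the bump only adds platforms.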
Verified by building wasmtime as a `staticlib` for `arm64_32-apple-watchos`
under `nightly-2026-01-25 + -Z build-std=std,panic_abort` with
`--features pulley,runtime,std,cranelift,anyhow` — no other changes
needed in `crates/wasmtime/src/runtime/vm/sys/unix/machports.rs`.
The dev-only path (`cranelift-jit -> region -> mach2 0.4.x`) keeps an
older mach2 in the lockfile for cranelift-jit's own host tests; that
path is not part of any production embedder build and stays unchanged.
Closes the watchOS port story without needing a separate mach2 release.
48096e4 to
f6d629b
|
For the vets I typically push directly to a PR, which works by default most of the time, but I think the origin of this fork, the rebeckerspecialties organization, doesn't allow that. In lieu of that, @matthargett could you cherry-pick alexcrichton@4c193dd into this PR, and then I can approve-and-merge? |
|
Done — cherry-picked your Verified locally:
Ready for approve-and-merge whenever you have a moment. Thanks for the offer to push directly — the rebeckerspecialties org's branch protections do block third-party pushes, so the cherry-pick path is the cleanest workaround. |
Two-commit series enabling `wasmtime` to build for `arm64_32-apple-watchos` (Apple Watch Series 4+ ILP32 ABI). Verified end-to-end on Apple Watch SE 2 (S8 SoC, watchOS 11) and iPhone XS (A12, iOS 18) running an 11-workload Pulley benchmark, with WAMR fast-interp as a side-by-side comparison runtime.
Commit 1: `unwinder: type aarch64 register-bearing locals as u64`

`crates/unwinder/src/arch/aarch64.rs` has inline-asm operands that take register-width values. They were typed `usize`, which works on the usual `aarch64-*` LP64 targets where `usize` is `u64` and the operand class is unambiguously the 64-bit GPR view. On `arm64_32-apple-watchos` (ILP32 ABI: 64-bit registers, 32-bit pointers) `usize` is `u32`, which makes the same operands ambiguous between the `w<N>` (32-bit lane) and `x<N>` (64-bit GPR) views; this is exactly what rustc's `asm_sub_register` lint flags.

Relying on the ISA-side zero-extend that aarch64 happens to perform on `mov w<N>, ...` would also be relying on a property the language doesn't promise: the Rust Reference is explicit that the upper bits of a register holding a sub-register-width input are undefined (see
https://doc.rust-lang.org/reference/inline-assembly.html#r-asm.register-operands.smaller-value).

Rather than leak `u64` into the public surface (the `Unwind` trait, the shared `arch/mod.rs` dispatch, and the per-arch backends in `x86.rs` / `riscv64.rs` / `s390x.rs`), keep the public function signatures `usize`: that's the existing convention shared with the other backends, and the `u64`-vs-pointer-width split is unique to aarch64-on-ILP32. Inside `aarch64.rs` only, type any register-bearing local that participates in inline asm as `u64`, and cast at the boundaries:

- `u64::try_from(v).unwrap()` widens `usize` → `u64` (infallible on every supported Rust target; the `.unwrap()` documents that any failure would be a target-property issue rather than a runtime one).
- `as usize` narrows `u64` → `usize` at the return. It truncates on `arm64_32` by design (the saved PC/SP there is a 32-bit host pointer that fits exactly in the low 32 bits) and is the identity on aarch64 LP64.

Also switch the saved-LR load from `*(fp as *mut usize).offset(1)` to `*(fp as *mut u64).offset(1)`. AAPCS64 reserves two 64-bit slots for the frame record on every aarch64 ABI variant, including `arm64_32`, so an 8-byte stride is correct regardless of pointer width. With `*mut usize` on `arm64_32`, `.offset(1)` would advance by only 4 bytes and read the upper half of the saved-FP slot. This is a latent correctness fix; today the unwinder isn't exercised on `arm64_32` (which runs Pulley, not Cranelift-compiled native code), but the corrected form is the right one to land alongside the type change.

Diff is one file (`crates/unwinder/src/arch/aarch64.rs`, +69 / -4). No behaviour change on existing aarch64 LP64 targets; silences two `asm_sub_register` warnings on a future `arm64_32-apple-watchos` build.

Commit 2: `Bump mach2 dep from 0.4.2 to 0.6`

`mach2 v0.4.2` emits `compile_error!("mach requires macOS or iOS")` on any target where neither `target_os = "macos"` nor `ios` matches, plus a matching narrow `target_vendor` gate on its `libc` build-dep. That blocks Apple watchOS / tvOS / visionOS targets: wasmtime's `runtime` feature pulls `mach2` in unconditionally, so the build fails with both `error: mach requires macOS or iOS` and `error[E0463]: can't find crate for libc`.

The fix has been upstream in mach2 since 0.6.0 (commit `538ce75`, 2025-08-16, "Add support for tvOS, watchOS and visionOS"): both gates widen to `cfg(target_vendor = "apple")`. The mach2 module API wasmtime imports (`exc`, `exception_types`, `kern_return`, `mach_init`, `mach_port`, `message`, `ndr`, `port`, `thread_act`, `thread_status`) is unchanged between 0.4.2 and 0.6.0; only internal libc/core::ffi type-plumbing differs. Bumping the workspace dep is sufficient, with no changes in `machports.rs`.

Verified by building `wasmtime` as a `staticlib` for `arm64_32-apple-watchos` under `nightly-2026-01-25 + -Z build-std=std,panic_abort` with `--features pulley,runtime,std,cranelift,anyhow`. The dev-only path (`cranelift-jit -> region -> mach2 0.4.x`) keeps an older mach2 in the lockfile for cranelift-jit's own host tests; that path is not part of any production embedder build and stays unchanged. `cargo deny` flags the resulting two `mach2` versions but `region` is already in `skip-tree`, so no `deny.toml` change is needed; the right long-term fix is for `region` to update. @alexcrichton is preparing a `cargo vet` audit update for the new mach2 0.6.0 separately.

End-to-end verification

This 2-commit stack + the companion `target-lexicon` `Arm64_32` patch (submitted separately to bytecodealliance/target-lexicon) is enough to build a Pulley-only static library for arm64_32-apple-watchos and link it into a watchOS app. On real hardware:

- Apple Watch SE 2 (S8 SoC, watchOS 11, arm64_32-apple-watchos)
- iPhone XS (A12, iOS 18, aarch64-apple-ios)

All results match the host-Rust reference function byte-for-byte across both runtimes.