Conversation
|
Just some background on this submission. I created this backend for some work on the PortMaster project, which runs classic PC games on cheap ARM Linux handhelds. It was developed entirely with Claude Code, so if you have an aversion to LLM development, this is severely tainted. It has been tested on Haxe's unit.hl as well as by running Dead Cells on my M2 Asahi Linux machine. It's not super fancy; the register allocation scheme is pretty basic. We do try to utilize callee-saved and caller-saved registers when we can. Still, it runs well enough for my needs. |
tobil4sk
left a comment
Hi, this looks quite exciting! :) Just a few general comments, which will hopefully help clean things up a bit. I haven't looked at the actual jit implementation, hopefully someone else might be able to.
|
Interesting, but a lot of changes. One question: does your implementation pass the Haxe compiler unit tests? That would be a requirement before any merge. |
Yes, all but three, which are related to stack trace formatting. |
4a2a412 to 15cd29d
tobil4sk
left a comment
There is also this place in CI that should be updated for arm64 support:
hashlink/.github/workflows/build.yml
Line 329 in e83fc71
Also wonder if jit_common.h and jit_shared.c make more sense to be named the same thing?
|
Since call stacks were mentioned and we're currently investigating a platform-specific HL problem related to that, could you try running https://github.com/HaxeFoundation/hxcoro/blob/master/callstack-tests/build-hl.hxml and let me know how that goes? |
|
|
Nice! That means #892 really is a problem with the x86 jit. |
|
There are some conflicts that need to be resolved (probably mainly in the Makefile; I can help with those if needed). The Haxe test suite is also now part of CI, so if this branch is updated and enables JIT tests for arm64, then we can verify that the Haxe test suite passes on arm64 Mac/Linux. |
Enable HashLink VM on AArch64 (Apple Silicon, ARM Linux servers, etc.) by adding a new JIT backend alongside the existing x86/x64 one.

- Rename jit.c to jit_x86.c, extract shared code into jit_common.h/jit_shared.c
- Add jit_aarch64.c/jit_aarch64_emit.c for ARM64 instruction selection and encoding
- Add jit_elf.c for GDB JIT debug interface
- Architecture-aware JIT selection in Makefile and CMakeLists.txt
- Add aarch64 support in hl.h, profile.c, hlmodule.h, module.c
- Exception type filtering: OTrap now looks ahead at catch handler opcodes to set tcheck for typed exception catches, matching x86
- hl_jit_free: properly clean up all allocator state and support can_reset for hot reload, fixing memory leaks
- OAssert: use correct LDR+BLR+B+literal pool pattern instead of broken literal+BL sequence that was never patched
- OSwitch: replace O(n) linear CMP/B.EQ scan with O(1) branch table using ADR+ADD+BR
- Size encoding: large-offset paths in op_get_mem/op_set_mem now correctly handle 1-byte and 2-byte access sizes

Inspired by review of HaxeFoundation#857.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
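For readers unfamiliar with the ADR+ADD+BR pattern named in the OSwitch item, here is a minimal sketch of how such a dispatch sequence could be encoded. It is not taken from jit_aarch64.c: the function names (`enc_adr`, `enc_add_lsl`, `enc_br`, `enc_b`, `emit_switch`) are hypothetical, and the table layout (one 4-byte `B` instruction per case, placed directly after the `BR`) is an assumption, since the commit message does not say how the table entries are stored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ADR Xd, pc+offset (signed byte offset, range +/-1 MiB). */
static uint32_t enc_adr(unsigned rd, int32_t offset) {
    assert(rd < 32 && offset >= -(1 << 20) && offset < (1 << 20));
    uint32_t imm = (uint32_t)offset & 0x1FFFFFu; /* 21-bit two's complement */
    uint32_t immlo = imm & 0x3u;                 /* low 2 bits -> bits 30:29 */
    uint32_t immhi = imm >> 2;                   /* high 19 bits -> bits 23:5 */
    return 0x10000000u | (immlo << 29) | (immhi << 5) | rd;
}

/* ADD Xd, Xn, Xm, LSL #shift (64-bit shifted-register form). */
static uint32_t enc_add_lsl(unsigned rd, unsigned rn, unsigned rm, unsigned shift) {
    assert(rd < 32 && rn < 32 && rm < 32 && shift < 64);
    return 0x8B000000u | (rm << 16) | (shift << 10) | (rn << 5) | rd;
}

/* BR Xn: indirect branch to the address held in Xn. */
static uint32_t enc_br(unsigned rn) {
    assert(rn < 32);
    return 0xD61F0000u | (rn << 5);
}

/* B pc+offset (offset is a multiple of 4, range +/-128 MiB). */
static uint32_t enc_b(int32_t offset) {
    assert((offset & 3) == 0 && offset >= -(1 << 27) && offset < (1 << 27));
    return 0x14000000u | (((uint32_t)offset >> 2) & 0x03FFFFFFu);
}

/*
 * Emit the dispatch sequence followed by the branch table.
 * targets[i] is the byte offset of case i, relative to the first
 * emitted instruction. Returns the number of 32-bit words written.
 * Assumption: idx_reg has already been range-checked by the caller.
 */
static size_t emit_switch(uint32_t *out, unsigned idx_reg, unsigned tmp_reg,
                          const int32_t *targets, size_t ncases) {
    size_t n = 0;
    out[n++] = enc_adr(tmp_reg, 12);                      /* table starts 3 words after ADR */
    out[n++] = enc_add_lsl(tmp_reg, tmp_reg, idx_reg, 2); /* 4 bytes per table entry */
    out[n++] = enc_br(tmp_reg);                           /* jump into the table */
    for (size_t i = 0; i < ncases; i++) {
        int32_t entry_pos = (int32_t)(n * 4);             /* byte position of this entry */
        out[n++] = enc_b(targets[i] - entry_pos);         /* B is PC-relative */
    }
    return n;
}
```

Dispatch costs three instructions plus one table branch regardless of the number of cases, which is where the O(1) behaviour comes from. A real backend also has to range-check the index before the `ADR`, falling through to the default case when it is out of bounds; that check is omitted here.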
…hared

- Remove --export-dynamic from Makefile LFLAGS (was accidentally reintroduced during merge; removed upstream in fec624c)
- Restore BOM and no-trailing-newline in hl.vcxproj.filters for Visual Studio compatibility
- Rename jit_shared.c to jit_common.c to match jit_common.h

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Rebased and pushed. I also enabled the CI test on ARM, so hopefully it'll pass. |
|
Looks like there is some |
|
There's a real bug in the SHA code tests so I'm working on a fix |
X16 and X17 were included in RCPU_SCRATCH_REGS, making them allocatable for vreg storage. However:

1. X17 is used for opcode debug markers (MOV W17, #marker) emitted before every opcode, clobbering any vreg value in W17
2. X16 is used as RTMP throughout the JIT for multi-instruction sequences (large stack offsets, address calculations, etc.), clobbering any vreg value in W16

Under low register pressure, the allocator would pick X0-X15 first and never use X16/X17. But with 30+ vregs (like SHA256's computation), the allocator would spill into X16/X17, causing silent data corruption.

Fix: Removed X16 and X17 from RCPU_SCRATCH_REGS and reduced RCPU_SCRATCH_COUNT from 18 to 16. These registers are reserved for their intended scratch/temporary purposes.
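The invariant behind this fix can be illustrated with a small sketch: registers the code generator clobbers implicitly must never appear in the allocatable pool, so the allocator reports exhaustion (and the caller spills to the stack) rather than handing them out. Everything below is hypothetical illustration, not the code in jit_aarch64.c: `reg_alloc`, `alloc_scratch`, `free_scratch`, `is_reserved`, `scratch_regs` and `NO_REG` are made-up names, and the pool is assumed to be exactly X0-X15, as the description above implies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NO_REG (-1)

/* Assumed pool after the fix: X0..X15 only (16 entries). */
static const int scratch_regs[] = {
    0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
};
#define SCRATCH_COUNT ((int)(sizeof(scratch_regs) / sizeof(scratch_regs[0])))

/* X16 (temporary) and X17 (debug marker) are clobbered by emitted code. */
static bool is_reserved(int reg) {
    return reg == 16 || reg == 17;
}

/* One bit per physical register; a set bit means "holds a live vreg". */
typedef struct {
    uint32_t in_use;
} reg_alloc;

/* Return a free scratch register, or NO_REG when the caller must spill. */
static int alloc_scratch(reg_alloc *ra) {
    for (int i = 0; i < SCRATCH_COUNT; i++) {
        int r = scratch_regs[i];
        if (!(ra->in_use & (1u << r))) {
            ra->in_use |= 1u << r;
            return r;
        }
    }
    return NO_REG;
}

/* Release a register previously returned by alloc_scratch. */
static void free_scratch(reg_alloc *ra, int reg) {
    assert(reg >= 0 && reg < 32 && !is_reserved(reg));
    ra->in_use &= ~(1u << reg);
}
```

With this shape, the 17th simultaneous request returns `NO_REG` instead of X16, so high register pressure shows up as an explicit spill rather than as silent corruption.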
|
Looks good! The last thing to check that I have is the hxcoro tests: https://github.com/HaxeFoundation/hxcoro/blob/master/tests/build-hl.hxml (Which we should probably also run as part of the CI here because Haxe itself doesn't do that for HL.) |
|
Enable HashLink VM on AArch64 (Apple Silicon, ARM Linux servers, etc.) by adding a new JIT backend alongside the existing x86/x64 one.