Conversation

Note: The number of changes in this pull request is too large for Gemini Code Assist to generate a summary.
Force-pushed 387a4f1 to b39916b.
Code Review Summary

This PR implements a well-designed register-based closure context mechanism. The architecture is sound, with appropriate register selection (callee-saved registers), proper save/restore semantics, and comprehensive test coverage.

Strengths:

Minor Issues to Address:

Overall, this is a solid implementation, ready for merge after the minor issues are addressed.
Force-pushed 26e9b99 to b866d12.
Codecov Report

✅ All modified and coverable lines are covered by tests.

```diff
@@ Coverage Diff @@
##             main    #1568      +/-   ##
==========================================
- Coverage   91.01%   90.01%   -1.01%
==========================================
  Files          45       47       +2
  Lines       11971    12328     +357
==========================================
+ Hits        10896    11097     +201
- Misses        899     1036     +137
- Partials      176      195      +19
```
Force-pushed 314b808 to 9c9df88.
Force-pushed eccba46 to 1107d49.
internal/ctxreg/ctxreg.go (outdated)

```go
if info.Name == "" {
	return nil
}
return []string{"-mllvm", "--reserve-regs-for-regalloc=" + info.Name}
```
This option is AArch64-only; it will not work for other archs (see here). The LLVM option for reserving a register is not the same across archs. For AArch64 and RISC-V 64, you should use the option `-mattr=+reserve-x26` (see here and here), or set the `target-features` attribute on the function, like:

```llvm
define void @reserve_x26() "target-features"="+neon,+reserve-x26"
```

Unfortunately, as far as I know, no similar feature is supported for x86.
Thanks for the note. I rechecked this across targets with both clang and llc.

- `-mllvm --reserve-regs-for-regalloc=` works across all tested platforms; the Ubuntu amd64 CI already passes with it.
- `-mattr=+reserve-*` is llc-only; clang does not accept `-mattr` (it always reports "unknown argument").
- `-ffixed-*` and `+reserve-*` are not portable: some targets reject them outright, others ignore the feature.

So the most portable path is still `-mllvm --reserve-regs-for-regalloc=...`. `-mattr` only makes sense when driving llc directly, and `-ffixed`/`+reserve` are target-specific.
Evidence (x86_64/i386):

x86_64, `-ffixed` (clang):

```shell
$ clang -target x86_64-unknown-linux-gnu -x ir -c x86_readwrite.ll -o /tmp/a.o -ffixed-r12
clang: error: unknown argument '-ffixed-r12'; did you mean '-ffixed-r19'?
```

x86_64, `+reserve` (clang):

```shell
$ clang -target x86_64-unknown-linux-gnu -x ir -c x86_readwrite.ll -o /tmp/a.o \
    -Xclang -target-feature -Xclang +reserve-r12
'+reserve-r12' is not a recognized feature for this target (ignoring feature)
fatal error: error in backend: Invalid register name global variable
```

i386, `-ffixed` (clang):

```shell
$ clang -target i386-unknown-linux-gnu -x ir -c i386_readwrite.ll -o /tmp/a.o -ffixed-esi
clang: error: unknown argument: '-ffixed-esi'
```

i386, `+reserve` (clang):

```shell
$ clang -target i386-unknown-linux-gnu -x ir -c i386_readwrite.ll -o /tmp/a.o \
    -Xclang -target-feature -Xclang +reserve-esi
'+reserve-esi' is not a recognized feature for this target (ignoring feature)
```

x86_64, `--reserve-regs-for-regalloc` (clang, simple IR):

```shell
$ clang -target x86_64-unknown-linux-gnu -x ir -c simple.ll -o /tmp/a.o \
    -mllvm --reserve-regs-for-regalloc=r12
(no error)
```

Extra note on `-mattr`: `-mattr=+reserve-*` is accepted by llc (e.g. AArch64/RISC-V), but clang does not accept `-mattr` at all:

```shell
clang: error: unknown argument: '-mattr=+reserve-x26'
```
```go
}{
	"amd64": {writeFmt: "mov \\$0, %%%s", readFmt: "mov %%%s, \\$0"},
	"386":   {writeFmt: "mov \\$0, %%%s", readFmt: "mov %%%s, \\$0"},
	"arm64": {writeFmt: "mov %s, \\$0", readFmt: "mov \\$0, %s"},
```
Referring to the previous comment: if we ignore the support issue for amd64 and use `+reserve-x26` to reserve the register, then using an LLVM intrinsic to access the register is better. See `llvm.read_register` and `llvm.write_register`. That avoids the redundant move instruction and the side effect.
- `llvm.read_register` / `llvm.write_register` support:
  - x86_64: not supported (fails with "Invalid register name global variable")
  - i386: works (tested with esi)
  - AArch64: works (tested with x26)
  - RISC-V 32/64: works (tested with x27)
  - Other targets (ARMv7/MIPS/PPC/S390x/WASM/AVR/Xtensa, etc.) are untested; the LangRef warns that support for allocatable GPRs is limited.
- IR becomes longer and needs a memory barrier:
  - Inline asm has `~{memory}` baked in, which is a compiler-level barrier.
  - If we switch to intrinsics, we must add an explicit memory clobber to keep the same ordering.
  - The minimal equivalent is 2 IR instructions per read/write:

    ```llvm
    call void asm sideeffect "", "~{memory}"()
    %v = call i64 @llvm.read_volatile_register.i64(metadata !0)

    call void asm sideeffect "", "~{memory}"()
    call void @llvm.write_register.i64(metadata !0, i64 %val)
    ```

    (So read+write becomes 4 IR instructions instead of 2.)

Because x86_64 does not support the intrinsics, and the intrinsic path adds extra IR (it requires memory barriers), it is safer and simpler to keep the current inline asm approach rather than switching.
- Support on x86 should be a separate issue; for now only these registers are supported. For AArch64/RISC-V, if you use a target-feature to reserve the register, `llvm.read_register` / `llvm.write_register` should work.
- Why do we need the memory barrier? IMO the final assembly instructions may be scheduled in a different order from the IR, but that should not break the semantics. Or is there some other reason?

Also, the instruction count in IR is not the same as in the final assembly. With inline asm, the asm code in the inline-asm expression is kept in the final assembly. With the intrinsic, no extra instruction is generated in the asm; the register is just used directly.

Which means:

Inline asm:

```llvm
%1 = call ptr asm sideeffect "mov $0, x26", "=r,~{memory}"()
%2 = load { ptr }, ptr %1, align 8
```

==>

```asm
mov x0, x26  // x0 is selected by llvm (maybe another register), but this mov will not be eliminated
ldr x1, [x0]
```

Intrinsic:

```llvm
%1 = call i64 @llvm.read_register.i64(metadata !7)
%2 = load { ptr }, ptr %1, align 8
!7 = !{!"x26"}
```

==>

```asm
ldr x1, [x26]  // x1 is selected by llvm (maybe another register); there is no mov, the reserved register is accessed directly
```
Updated. For caller-saved x86, inline asm uses a memory clobber; callee-saved targets do not.
```go
	dialect: llvm.InlineAsmDialectATT,
},
"arm64": {
	write: "mov %s, $0",
```
Force-pushed c6d3e64 to 247db3e.
Summary

This PR updates LLGo's closure ABI and calling convention: use a reserved register for ctx when available; otherwise pass ctx as an implicit first parameter with a conditional call; represent closures as pointers to `funcval` with inline env.

ABI / Representation

- `type funcval struct { fn *func; hasCtx uintptr; env ... }` and `type closure = *funcval`; the ABI only sees the pointer.
- `hasCtx` keeps the header size fixed and supports conditional calls on no-reg targets.
- `env` is inline after the header; plain functions have no env (object size = 2 pointers).
- `funcval` can point directly at a C symbol, so the call site can use the real address without wrapper stubs.

Calling Convention

- With a ctx register: `write_ctx(env_ptr)`, then call `fn(args...)`; `env_ptr = closure_ptr + 2*ptrSize`.
- Without one: `fn(ctx, args...)` vs `fn(args...)`, based on `hasCtx`.

getClosurePtr

- `getClosurePtr` returns `&env[0]` (a pointer to the first env slot).

Context Register Mapping

- x86 (amd64/386): `-msse2` to free MMX.
- Native builds reserve the ctx reg via the clang target-feature `+reserve-<reg>` (arm64/riscv64).
- For caller-saved x86, inline asm uses a memory clobber; callee-saved targets do not.
Example IR (closure + C func)

Example Go code:

With ctx register (arm64/riscv64/x86*):

- Caller (`main`) writes the ctx register and calls the real function symbol.
- Closure body (`main$1`) reads the ctx register at entry.
- The C function remains a normal symbol.

Without ctx register (wasm/arm):

- Caller (`main`) branches on `hasCtx` to pick the correct signature.
- Closure body (`main$1`) takes an explicit ctx parameter.

`__llgo_closure_const$...`: these are constant closure objects in read-only data. They:
Discussion: Alternative Layout (difference only)

Alternative split layout. Differences vs `*funcval` (no value judgment):

- The `data` object contains `hasCtx` + env.
- An extra indirection (through `data`) to reach `env`.
- Constants need two objects (`closure` + `data`), instead of a single `funcval` constant.
- The `data` object may be heap allocated or embedded in other objects depending on escape/placement.

Covered Scenarios

Plain funcs, captured closures, method values/expressions, interface method values, varargs, `go`/`defer`, C callbacks.