
Implement register-based closure ctx #1568

Closed
cpunion wants to merge 5 commits into goplus:main from cpunion:closure-ctxreg

Conversation


cpunion (Collaborator) commented Jan 16, 2026

Summary

This PR updates LLGo’s closure ABI and calling convention: use a reserved register for ctx when available; otherwise pass ctx as an implicit first parameter with a conditional call; represent closures as pointers to funcval with inline env.

ABI / Representation

  • Closures are represented as type funcval struct { fn *func; hasCtx uintptr; env ... } and type closure = *funcval; the ABI only sees the pointer.
  • hasCtx keeps the header size fixed and supports conditional calls on no‑reg targets.
  • env is inline after the header; plain functions have no env (object size = 2 pointers).
  • C function pointers are first‑class: a funcval can point directly at a C symbol, so the call site can use the real address without wrapper stubs.

Calling Convention

  • With a ctx register: write_ctx(env_ptr) then call fn(args...); env_ptr = closure_ptr + 2*ptrSize.
  • Without a ctx register: conditionally call fn(ctx, args...) vs fn(args...) based on hasCtx.

getClosurePtr

  • getClosurePtr returns &env[0] (pointer to the first env slot).
  • On ctx‑register targets it reads the ctx register to get the env base.
  • On no‑reg targets it uses the explicit ctx parameter as the env base.

Context Register Mapping

GOARCH    Register  Notes
amd64     mm0       use -msse2 to free MMX
386       mm0       use -msse2 to free MMX
arm64     x26       reserved via clang target-feature
riscv64   x27       reserved via clang target-feature
wasm      -         conditional ctx param
arm       -         conditional ctx param

Native builds reserve the ctx reg via clang target-feature +reserve-<reg> (arm64/riscv64).
For caller-saved x86, inline asm uses a memory clobber; callee-saved targets do not.

Example IR (closure + C func)

Example Go code:

func cfunc(i int64)

func main() {
  var fn func(i int64)
  fn = cfunc
  fn(0)

  var i int64 = 0
  fn = func(v int64) { i = v }
  fn(0)
}

With ctx register (arm64/riscv64/x86*)

Caller (main) writes ctx register and calls the real function symbol:

; fn = cfunc
store ptr @__llgo_closure_const$cfunc, ptr %fn_slot
; fn(0)
%fv = load ptr, ptr %fn_slot
%fnptr = load ptr, ptr %fv
%env_base = getelementptr i8, ptr %fv, i64 16  ; &env[0]
%env_i64 = ptrtoint ptr %env_base to i64
call void @llvm.write_register.i64(metadata !"x26", i64 %env_i64)
call void %fnptr(i64 0)

; fn = closure
%fv2 = call ptr @AllocU(i64 24)          ; {fn,hasCtx,env0}
store ptr @main$1, ptr %fv2
%hasctx_p = getelementptr i8, ptr %fv2, i64 8
store i64 1, ptr %hasctx_p                     ; hasCtx = 1
%env0_p = getelementptr i8, ptr %fv2, i64 16
store ptr %i, ptr %env0_p                      ; env[0] = &i
store ptr %fv2, ptr %fn_slot
; fn(0) ... same pattern ...

Closure body (main$1) reads ctx register at entry:

define void @main$1(i64 %v) {
entry:
  %env_base = call i64 @llvm.read_register.i64(metadata !"x26")
  %env0p = inttoptr i64 %env_base to ptr
  %i_ptr = load ptr, ptr %env0p
  store i64 %v, ptr %i_ptr
  ret void
}

C function remains a normal symbol:

declare void @cfunc(i64)

Without ctx register (wasm/arm)

Caller (main) branches on hasCtx to pick the correct signature:

%hasctx_p = getelementptr i8, ptr %fv, i64 8
%has = load i64, ptr %hasctx_p
%has_i1 = icmp ne i64 %has, 0
br i1 %has_i1, label %with, label %plain

with:
  %fnptr = load ptr, ptr %fv
  %env_base = getelementptr i8, ptr %fv, i64 16
  call void %fnptr(ptr %env_base, i64 0)   ; fn(ctx, args...)
  br label %done

plain:
  %fnptr2 = load ptr, ptr %fv
  call void %fnptr2(i64 0)                 ; fn(args...)
  br label %done

Closure body (main$1) takes an explicit ctx parameter:

define void @main$1(ptr %env_base, i64 %v) {
entry:
  %i_ptr = load ptr, ptr %env_base
  store i64 %v, ptr %i_ptr
  ret void
}

__llgo_closure_const$...

These are constant closure objects in read‑only data. They:

  1. carry env without heap allocation, and
  2. provide stable, deduplicated closure identities for type metadata (map hash/eq helpers, etc.).

Discussion: Alternative Layout (difference only)

Alternative split layout:

type closure struct {
  fn ptr
  data *struct { hasCtx bool; env ... }
}

Differences vs *funcval (no value judgment):

  • The closure value is always two words, and the data object contains hasCtx + env.
  • Call sites need one extra indirection (data) to reach env.
  • Constant closures can be represented as two constants (closure + data), instead of a single funcval constant.
  • The data object may be heap allocated or embedded in other objects depending on escape/placement.

Covered Scenarios

Plain funcs, captured closures, method values/expressions, interface method values, varargs, go/defer, C callbacks.

@gemini-code-assist

Note

The number of changes in this pull request is too large for Gemini Code Assist to generate a summary.


xgopilot bot commented Jan 16, 2026

Code Review Summary

This PR implements a well-designed register-based closure context mechanism. The architecture is sound with appropriate register selection (callee-saved registers), proper save/restore semantics, and comprehensive test coverage.

Strengths:

  • Comprehensive test coverage across all closure patterns
  • Register pollution tests for both C interop and nested closures
  • Proper thread safety through per-goroutine register isolation
  • Efficient wrapper generation with tail call optimization

Minor Issues to Address:

  • Unused noop() function in regpollute/in.go
  • Missing direct method call tests in go/in.go vs defer/in.go
  • Documentation could clarify wrapper naming variations

Overall, this is a solid implementation ready for merge after addressing minor issues.

cpunion force-pushed the closure-ctxreg branch 3 times, most recently from 26e9b99 to b866d12 on January 16, 2026 at 00:23

codecov bot commented Jan 16, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.01%. Comparing base (5899edf) to head (cfdce00).
⚠️ Report is 4 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1568      +/-   ##
==========================================
- Coverage   91.01%   90.01%   -1.01%     
==========================================
  Files          45       47       +2     
  Lines       11971    12328     +357     
==========================================
+ Hits        10896    11097     +201     
- Misses        899     1036     +137     
- Partials      176      195      +19     


cpunion force-pushed the closure-ctxreg branch 3 times, most recently from 314b808 to 9c9df88 on January 17, 2026 at 11:06
cpunion force-pushed the closure-ctxreg branch 3 times, most recently from eccba46 to 1107d49 on January 22, 2026 at 01:12
if info.Name == "" {
return nil
}
return []string{"-mllvm", "--reserve-regs-for-regalloc=" + info.Name}

@zhouguangyuan0718 zhouguangyuan0718 Jan 30, 2026


This option is AArch64-only; it will not work for other archs (reference here). The LLVM option to reserve a register is not the same across archs. For AArch64 and RISC-V 64, use the command option "-mattr=+reserve-x26" (references here and here), or set the "target-features" attribute on the function, like:

define void @reserve_x26() "target-features"="+neon,+reserve-x26"

Unfortunately, as far as I know, a similar feature is not supported for x86.

Collaborator Author


Thanks for the note — I rechecked this across targets with both clang and llc.

  • -mllvm --reserve-regs-for-regalloc= works across all tested platforms. Ubuntu amd64 CI already passes with this.
  • -mattr=+reserve- is llc-only. clang does not accept -mattr (it always reports “unknown argument”).
  • -ffixed-* and +reserve-* are not portable: some targets reject them outright, others ignore the feature.

So the most portable path is still -mllvm --reserve-regs-for-regalloc=.... -mattr only makes sense when driving llc
directly, and -ffixed/+reserve are target‑specific.

Evidence (x86_64/i386):

x86_64: -ffixed (clang)

clang -target x86_64-unknown-linux-gnu -x ir -c x86_readwrite.ll -o /tmp/a.o -ffixed-r12

clang: error: unknown argument '-ffixed-r12'; did you mean '-ffixed-r19'?

x86_64: +reserve (clang)

clang -target x86_64-unknown-linux-gnu -x ir -c x86_readwrite.ll -o /tmp/a.o
-Xclang -target-feature -Xclang +reserve-r12

'+reserve-r12' is not a recognized feature for this target (ignoring feature)
fatal error: error in backend: Invalid register name global variable

i386: -ffixed (clang)

clang -target i386-unknown-linux-gnu -x ir -c i386_readwrite.ll -o /tmp/a.o -ffixed-esi

clang: error: unknown argument: '-ffixed-esi'

i386: +reserve (clang)

clang -target i386-unknown-linux-gnu -x ir -c i386_readwrite.ll -o /tmp/a.o
-Xclang -target-feature -Xclang +reserve-esi

'+reserve-esi' is not a recognized feature for this target (ignoring feature)

x86_64: --reserve-regs-for-regalloc (clang, simple IR)

clang -target x86_64-unknown-linux-gnu -x ir -c simple.ll -o /tmp/a.o
-mllvm --reserve-regs-for-regalloc=r12

(no error)

Extra note on -mattr:
-mattr=+reserve-* is accepted by llc (e.g. AArch64/RISC‑V), but clang does not accept -mattr at all:

clang: error: unknown argument: '-mattr=+reserve-x26'

}{
"amd64": {writeFmt: "mov \\$0, %%%s", readFmt: "mov %%%s, \\$0"},
"386": {writeFmt: "mov \\$0, %%%s", readFmt: "mov %%%s, \\$0"},
"arm64": {writeFmt: "mov %s, \\$0", readFmt: "mov \\$0, %s"},

@zhouguangyuan0718 zhouguangyuan0718 Jan 30, 2026


See the previous comment. If we set aside the support issue on amd64 and use "+reserve-x26" to reserve the register, then using an LLVM intrinsic to access the register is better; see llvm.read_register and llvm.write_register. It avoids the redundant move instruction and the side effect.

Collaborator Author


  1. llvm.read_register / llvm.write_register support
  • x86_64: not supported (fails with Invalid register name global variable)
  • i386: works (tested with esi)
  • AArch64: works (tested with x26)
  • RISC‑V 32/64: works (tested with x27)
  • Other targets (ARMv7/MIPS/PPC/S390x/WASM/AVR/Xtensa, etc.) are untested; LangRef warns allocatable GPR support is limited.
  2. IR becomes longer + needs a memory barrier
  • Inline asm has ~{memory} baked in, which is a compiler‑level barrier.
  • If we switch to intrinsics, we must add explicit memory clobber to keep the same ordering.
  • Minimal equivalent is 2 IR instructions per read/write:
  call void asm sideeffect "", "~{memory}"()
  %v = call i64 @llvm.read_volatile_register.i64(metadata !0)

  call void asm sideeffect "", "~{memory}"()
  call void @llvm.write_register.i64(metadata !0, i64 %val)
  (So read+write becomes 4 IR instructions instead of 2.)

Because x86_64 does not support the intrinsics, and the intrinsic path adds extra IR (requires memory barriers), it is safer
and simpler to keep the current inline asm approach rather than switching.


@zhouguangyuan0718 zhouguangyuan0718 Feb 1, 2026


  1. Support on x86 should be a separate issue; for now only these registers are supported. For AArch64/RISC-V, if target-feature is used to reserve the register, llvm.read_register / llvm.write_register should work.
  2. Why do we need the memory barrier? IMO the final assembly instruction order may be scheduled differently from the IR order, but that should not break the semantics. Or is there some other reason?
    Also, the IR instruction count is not the same as the assembly instruction count. With inline asm, the asm code in the inline-asm expression is kept in the final assembly; with the intrinsic, no extra instruction is generated in the assembly, and the reserved register is used directly.
    Which means:

InlineASM:

  %1 = call ptr asm sideeffect "mov $0, x26", "=r,~{memory}"()
  %2 = load { ptr }, ptr %1, align 8

==>

mov x0, x26 // x0 is selected by llvm, maybe other, but this mov instruction will not be eliminated
ldr x1, [x0]

intrinsic:

  %1 = call i64 @llvm.read_register.i64(metadata !7)
  %2 = load { ptr }, ptr %1, align 8

  !7 = !{!"x26"}

==>

ldr x1, [x26]   // x1 is selected by llvm, maybe other, but there is no mov instruction, the reserved register can be accessed directly.

Collaborator Author


Updated. For caller-saved x86, inline asm uses a memory clobber; callee-saved targets do not.

dialect: llvm.InlineAsmDialectATT,
},
"arm64": {
write: "mov %s, $0",

@zhouguangyuan0718 zhouguangyuan0718 Jan 30, 2026


Same as #1568 (comment).

@cpunion cpunion marked this pull request as draft February 2, 2026 16:14
@cpunion cpunion closed this Feb 3, 2026


3 participants