Skip to content

Implement the semi-space copying garbage collector#13107

Merged
fitzgen merged 14 commits intobytecodealliance:mainfrom
fitzgen:initial-impl-copying-collector
Apr 29, 2026
Merged

Implement the semi-space copying garbage collector#13107
fitzgen merged 14 commits intobytecodealliance:mainfrom
fitzgen:initial-impl-copying-collector

Conversation

@fitzgen
Copy link
Copy Markdown
Member

@fitzgen fitzgen commented Apr 15, 2026

This is a classic semi-space copying collector that uses bump allocation and Cheney-style worklists to avoid an explicit stack for grey (relocated but not yet scanned) objects outside the GC heap. Forwarding "pointers" (really VMGcRef indices) are stored inline in objects in the old semi-space, which again avoids explicit data structures outside of the GC heap. Furthermore, an intrusive linked-list of all externrefs is maintained to allow for efficiently sweeping their associated host data after collection (and, once again, avoiding additional data structures outside the GC heap).

Allocation in compiled Wasm code always happens by calling out to the gc_alloc_raw libcall currently. This is expected to become inline bump allocation, very similar to the null collector, in a follow-up commit shortly after this one. It is delayed to make review easier.

Collection is not incremental (in the Wasmtime sense, not the GC literature sense) yet and the CopyingCollection::collect_increment implementation only has a single increment. This is delayed to make review easier and is also expected to come shortly in follow-up commits.

I've also added new disas tests for the copying collector and also enabled running wast tests which need a GC heap with the copying collector.

Finally, fuzz config generation can now generate configurations that enable the copying collector.

Fixes #10329

@fitzgen fitzgen requested review from a team as code owners April 15, 2026 14:28
@fitzgen fitzgen requested review from alexcrichton and removed request for a team April 15, 2026 14:28
@github-actions github-actions Bot added fuzzing Issues related to our fuzzing infrastructure wasmtime:api Related to the API of the `wasmtime` crate itself wasmtime:ref-types Issues related to reference types and GC in Wasmtime labels Apr 15, 2026
@github-actions
Copy link
Copy Markdown

Subscribe to Label Action

cc @fitzgen

Details This issue or pull request has been labeled: "fuzzing", "wasmtime:api", "wasmtime:ref-types"

Thus the following users have been cc'd because of the following labels:

  • fitzgen: fuzzing, wasmtime:ref-types

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

Learn more.

Copy link
Copy Markdown
Member

@alexcrichton alexcrichton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I think this looks fantastic, thanks so much!

I've got some minor comments below, but I'm also happy to defer anything to issues as you see fit. My rough assumption as well is that long-term we'll want nurseries as well, right? If that's the case would it make sense to document that in an issue as an open future work item?

Comment thread tests/disas/gc/copying/v128-fields.wat
Comment thread crates/cranelift/src/func_environ/gc/enabled/copying.rs
Comment thread crates/cranelift/src/func_environ/gc/enabled/copying.rs
Comment thread crates/cranelift/src/func_environ/gc/enabled/copying.rs
Comment thread crates/cranelift/src/func_environ/gc/enabled/copying.rs
Comment thread crates/wasmtime/src/runtime/vm/gc/enabled/copying.rs Outdated
Comment thread crates/wasmtime/src/runtime/vm/gc/enabled/copying.rs Outdated
Comment thread crates/wasmtime/src/runtime/vm/gc/enabled/copying.rs Outdated
Comment thread crates/wasmtime/src/runtime/vm/gc/enabled/copying.rs Outdated
Comment thread crates/wasmtime/src/runtime/vm/gc/enabled/copying.rs Outdated
@alexcrichton
Copy link
Copy Markdown
Member

Oh, also, to kick the tire some more I might recommend locally making the copying collector the default (instead of drc) and running the full test suite. Iunno if that'll turn up much but might be good as a first pass

@fitzgen
Copy link
Copy Markdown
Member Author

fitzgen commented Apr 15, 2026

My rough assumption as well is that long-term we'll want nurseries as well, right? If that's the case would it make sense to document that in an issue as an open future work item?

Not totally clear to me right now. A nursery probably isn't worth it for our typical short-running programs, since we can mostly just avoid collecting in those cases, and would only slow down wasm execution via the write barriers it would require. (Also, our amortized GC heap growth algorithm effectively means we have N nursery collections on the way to the full heap size, if you squint hard.)

But someone with long-running wasm instance use cases may indeed want a nursery. But then we also wouldn't want to force a nursery on the short-running use cases that don't need it.

Seems like the kind of bridge we can cross when we get to it. I wouldn't go so far as to say that we should expect it to happen with enough certainty that a tracking issue is required at this time.

@fitzgen fitzgen requested a review from a team as a code owner April 15, 2026 21:30
@fitzgen fitzgen requested review from uweigand and removed request for a team April 15, 2026 21:30
@fitzgen fitzgen enabled auto-merge April 15, 2026 21:31
@fitzgen
Copy link
Copy Markdown
Member Author

fitzgen commented Apr 15, 2026

Adding this to the merge queue now, but we can definitely continue with some of these other bits in follow ups.

@fitzgen fitzgen added this pull request to the merge queue Apr 15, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Apr 15, 2026
fitzgen added a commit to fitzgen/wasmtime that referenced this pull request Apr 29, 2026
Passive element segments were registered as GC roots by converting a `*mut
ValRaw` to a `*mut VMGcRef`. Because we always store GC refs as little endian
inside `ValRaw`, regardless of the target architecture's endianness, this cast
is only valid on little endian systems. This bug was not exposed until the
introduction of the copying collector in
bytecodealliance#13107

The fix that this commit makes is to add another kind of GC root for `ValRaw`,
where we can get and set the GC root using `ValRaw` internally, to ensure that
the endianness (and incidentally also the GC ref offsets within the `ValRaw`)
are matched up correctly between the GC root's definition and the collector's
use of it. This brings us to three kinds of GC roots: Wasm stack roots,
`VMGcRef` roots, and `ValRaw` roots.

FWIW, I initially tried to make `VMGcRef` also always store its data as little
endian, but this was a larger, more-invasive change and with feedback like
bytecodealliance#13193 (comment)
suggesting the use of `[u8; 4]` instead of `u32` to make the byte ordering
explicit, we break `rustc`'s niche type optimizations (since `VMGcRef` is
non-zero right now). I also investigated making `PassiveElementSegment` an
`enum` or either funcrefs or externrefs, similar to what we do for
`wasmtime::runtime::vm::Table`. This also led to an outsized amount of code
churn and didn't feel like it was paying for itself. Ultimately, I abandoned
these approaches, preferring the one taken in this commit instead.
@fitzgen
Copy link
Copy Markdown
Member Author

fitzgen commented Apr 29, 2026

My rough assumption as well is that long-term we'll want nurseries as well, right? If that's the case would it make sense to document that in an issue as an open future work item?

Not totally clear to me right now. A nursery probably isn't worth it for our typical short-running programs, since we can mostly just avoid collecting in those cases, and would only slow down wasm execution via the write barriers it would require. (Also, our amortized GC heap growth algorithm effectively means we have N nursery collections on the way to the full heap size, if you squint hard.)

But someone with long-running wasm instance use cases may indeed want a nursery. But then we also wouldn't want to force a nursery on the short-running use cases that don't need it.

Seems like the kind of bridge we can cross when we get to it. I wouldn't go so far as to say that we should expect it to happen with enough certainty that a tracking issue is required at this time.

FWIW, I just spun out this issue with some forward thinking ideas/complications related to generational GC and our desire to generally only have O(1)-sized host allocations outside the GC heap: #13231

fitzgen added a commit to fitzgen/wasmtime that referenced this pull request Apr 29, 2026
The `rewrite_use` method of the safepoint spiller was not checking for value
aliases, and therefore some uses of needs-stack-map values would not be reloaded
from their associated stack slot. Note that, because the liveness
analysis *does* correctly analyze alias values and will always correctly spill
them at safepoints, this could not result in any bug with non-moving
collectors (where reloading after safepoints is unnecessary), like those that
Wasmtime has today.

However, with the introduction of a moving collector in
bytecodealliance#13107, this lack-of-reload bug
in the safepoint spiller does trigger bugs in Wasmtime (and, reassuringly, our
testing and fuzzing infrastructure finds it ~immediately). Uses of a
non-reloaded GC reference are stale because the object they previously pointed
to moved but the non-reloaded GC reference was not updated to point to the
object's new location, resulting in use-after-move bugs.

The fix for the safepoint spiller is simple: resolve aliases before rewriting
uses. This commit additionally sprinkles some debug assertions throughout all
the safepoint spiller code to double check that we have already resolved aliases
and are no longer dealing with alias values in, e.g., our current set of live
values in the liveness analysis.
@fitzgen fitzgen force-pushed the initial-impl-copying-collector branch from ce9c44e to 896b0b0 Compare April 29, 2026 14:09
@fitzgen
Copy link
Copy Markdown
Member Author

fitzgen commented Apr 29, 2026

Rebased on top of main and two non-copying-collector-specific bug fixes for pre-existing bugs uncovered by the copying collector (#13228 and #13230).

Only new commit outside of that is 896b0b0, which just introduces a few additional tests.

This should be good to land (after another rebase) once those bug fixes land.

pull Bot pushed a commit to langyo/wasmtime that referenced this pull request Apr 29, 2026
)

* Fix passive element segment GC roots' endianness

Passive element segments were registered as GC roots by converting a `*mut
ValRaw` to a `*mut VMGcRef`. Because we always store GC refs as little endian
inside `ValRaw`, regardless of the target architecture's endianness, this cast
is only valid on little endian systems. This bug was not exposed until the
introduction of the copying collector in
bytecodealliance#13107

The fix that this commit makes is to add another kind of GC root for `ValRaw`,
where we can get and set the GC root using `ValRaw` internally, to ensure that
the endianness (and incidentally also the GC ref offsets within the `ValRaw`)
are matched up correctly between the GC root's definition and the collector's
use of it. This brings us to three kinds of GC roots: Wasm stack roots,
`VMGcRef` roots, and `ValRaw` roots.

FWIW, I initially tried to make `VMGcRef` also always store its data as little
endian, but this was a larger, more-invasive change and with feedback like
bytecodealliance#13193 (comment)
suggesting the use of `[u8; 4]` instead of `u32` to make the byte ordering
explicit, we break `rustc`'s niche type optimizations (since `VMGcRef` is
non-zero right now). I also investigated making `PassiveElementSegment` an
`enum` or either funcrefs or externrefs, similar to what we do for
`wasmtime::runtime::vm::Table`. This also led to an outsized amount of code
churn and didn't feel like it was paying for itself. Ultimately, I abandoned
these approaches, preferring the one taken in this commit instead.

* fix compilation of no-gc builds

* review feedback
fitzgen added a commit to fitzgen/wasmtime that referenced this pull request Apr 29, 2026
The `rewrite_use` method of the safepoint spiller was not checking for value
aliases, and therefore some uses of needs-stack-map values would not be reloaded
from their associated stack slot. Note that, because the liveness
analysis *does* correctly analyze alias values and will always correctly spill
them at safepoints, this could not result in any bug with non-moving
collectors (where reloading after safepoints is unnecessary), like those that
Wasmtime has today.

However, with the introduction of a moving collector in
bytecodealliance#13107, this lack-of-reload bug
in the safepoint spiller does trigger bugs in Wasmtime (and, reassuringly, our
testing and fuzzing infrastructure finds it ~immediately). Uses of a
non-reloaded GC reference are stale because the object they previously pointed
to moved but the non-reloaded GC reference was not updated to point to the
object's new location, resulting in use-after-move bugs.

The fix for the safepoint spiller is simple: resolve aliases before rewriting
uses. This commit additionally sprinkles some debug assertions throughout all
the safepoint spiller code to double check that we have already resolved aliases
and are no longer dealing with alias values in, e.g., our current set of live
values in the liveness analysis.
@fitzgen fitzgen force-pushed the initial-impl-copying-collector branch from 896b0b0 to 720b100 Compare April 29, 2026 14:55
pull Bot pushed a commit to Haofei/wasmtime that referenced this pull request Apr 29, 2026
…nce#13228)

The `rewrite_use` method of the safepoint spiller was not checking for value
aliases, and therefore some uses of needs-stack-map values would not be reloaded
from their associated stack slot. Note that, because the liveness
analysis *does* correctly analyze alias values and will always correctly spill
them at safepoints, this could not result in any bug with non-moving
collectors (where reloading after safepoints is unnecessary), like those that
Wasmtime has today.

However, with the introduction of a moving collector in
bytecodealliance#13107, this lack-of-reload bug
in the safepoint spiller does trigger bugs in Wasmtime (and, reassuringly, our
testing and fuzzing infrastructure finds it ~immediately). Uses of a
non-reloaded GC reference are stale because the object they previously pointed
to moved but the non-reloaded GC reference was not updated to point to the
object's new location, resulting in use-after-move bugs.

The fix for the safepoint spiller is simple: resolve aliases before rewriting
uses. This commit additionally sprinkles some debug assertions throughout all
the safepoint spiller code to double check that we have already resolved aliases
and are no longer dealing with alias values in, e.g., our current set of live
values in the liveness analysis.
fitzgen added 11 commits April 29, 2026 10:57
This is a classic semi-space copying collector that uses bump allocation and
Cheney-style worklists to avoid an explicit stack for grey (relocated but not
yet scanned) objects outside the GC heap. Forwarding "pointers" (really
`VMGcRef` indices) are stored inline in objects in the old semi-space, which
again avoids explicit data structures outside of the GC heap. Furthermore, an
intrusive linked-list of all `externref`s is maintained to allow for efficiently
sweeping their associated host data after collection (and, once again, avoiding
additional data structures outside the GC heap).

Allocation in compiled Wasm code always happens by calling out to the
`gc_alloc_raw` libcall currently. This is expected to become inline bump
allocation, very similar to the null collector, in a follow-up commit shortly
after this one. It is delayed to make review easier.

Collection is not incremental (in the Wasmtime sense, not the GC literature
sense) yet and the `CopyingCollection::collect_increment` implementation only
has a single increment. This is delayed to make review easier and is also
expected to come shortly in follow-up commits.

I've also added new disas tests for the copying collector and also enabled
running wast tests which need a GC heap with the copying collector.

Finally, fuzz config generation can now generate configurations that enable the
copying collector.
@github-actions github-actions Bot added the cranelift Issues related to the Cranelift code generator label Apr 29, 2026
@fitzgen fitzgen force-pushed the initial-impl-copying-collector branch from a85b205 to 2064150 Compare April 29, 2026 20:50
@fitzgen fitzgen enabled auto-merge April 29, 2026 20:50
@fitzgen fitzgen added this pull request to the merge queue Apr 29, 2026
Merged via the queue into bytecodealliance:main with commit 2dd03bc Apr 29, 2026
48 checks passed
@fitzgen fitzgen deleted the initial-impl-copying-collector branch April 29, 2026 21:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cranelift Issues related to the Cranelift code generator fuzzing Issues related to our fuzzing infrastructure wasmtime:api Related to the API of the `wasmtime` crate itself wasmtime:ref-types Issues related to reference types and GC in Wasmtime

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement the copying collector described in the GC RFC

2 participants