[perf] Pool Envelope instances for the internal receive pipeline #2726

@jeremydmiller

Description

Part of the 6.0 performance pass — Tier 1, but deferred until the CritterStackScalability benchmark harness is in place. The work doesn't land without a heavy-duty benchmark showing measurable allocation/throughput improvement.

Context

`Envelope` is a class allocated per-message. Per the 6.0 perf-pass allocation audit:

  • Each instance is ~240–280 bytes (class header + ~18 fields incl. headers dict ref, `byte[]` ref, `object` ref for Message, `Guid`, `DateTimeOffset?`, multiple nullable strings/Uris).
  • The receive pipeline allocates an Envelope on every InvokeAsync — `src/Wolverine/Runtime/Handlers/Executor.cs:145, 171, 292` are the three sites for the request/reply pattern.
  • Cascading messages: every handler-returned message becomes a `new Envelope` via `Envelope.CreateForResponse()` (`Envelope.Internals.cs:240-264`). A handler that publishes 5 messages allocates 5 Envelopes.
  • `MessageContext` is already pooled (`ObjectPool<MessageContext>`, see `Executor.cs:46` and `WolverineRuntime.ExecutionPool`). Envelope is the obvious next target.

Plan (decided during interview, ready to execute when prerequisites land)

Q1 — Tracking-reference capture hazard → option (A)

The hazard: `EnvelopeRecord.Envelope` (`src/Wolverine/Tracking/EnvelopeRecord.cs:40`) holds a strong reference. `_tracker.ExecutionStarted/Finished` flows through `ActiveSession.Record(...)` → `new EnvelopeRecord(...)`, which captures that reference into the tracking history. If we pool a tracked envelope, anyone reading `ITrackedSession.Events` records captured before the recycle sees the reset or reused state rather than what was recorded.

Decision: don't pool envelopes when a tracking session is active. Check `WolverineRuntime.ActiveSession` at envelope-acquire time; if non-null, allocate fresh. The pool only helps in production hot paths (tracking off), which is where the perf win matters anyway. Tests using tracking see zero behavior change.
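A minimal sketch of the Q1 (A) gate, under stated assumptions: `EnvelopeSource` and `SketchEnvelope` are hypothetical stand-ins (neither exists in Wolverine), the `isTracking` delegate stands in for the `WolverineRuntime.ActiveSession` null check, and a `ConcurrentBag` stands in for the real pool. Acquire hands back a flag so the release path does not have to re-evaluate tracking state, which may have changed in the meantime.

```csharp
using System;
using System.Collections.Concurrent;

// Stand-in for Wolverine's Envelope; only the members this sketch needs.
public sealed class SketchEnvelope
{
    public object? Message { get; set; }
    public Guid Id { get; set; }

    internal void Reset()
    {
        Message = null;
        Id = default;
    }
}

// Hypothetical helper: the name and shape are assumptions, not Wolverine API.
public sealed class EnvelopeSource
{
    private readonly ConcurrentBag<SketchEnvelope> _pool = new();
    private readonly Func<bool> _isTracking;

    public EnvelopeSource(Func<bool> isTracking) => _isTracking = isTracking;

    // Returns the envelope plus a flag saying whether it belongs to the pool.
    public (SketchEnvelope Envelope, bool Pooled) Acquire(object message)
    {
        // Tracking may capture the reference, so allocate fresh and never recycle it.
        if (_isTracking())
        {
            return (new SketchEnvelope { Message = message, Id = Guid.NewGuid() }, false);
        }

        var envelope = _pool.TryTake(out var pooled) ? pooled : new SketchEnvelope();
        envelope.Message = message;
        envelope.Id = Guid.NewGuid();
        return (envelope, true);
    }

    public void Release(SketchEnvelope envelope, bool pooled)
    {
        if (!pooled) return; // fresh allocations are left to the GC

        envelope.Reset();    // clear state before the instance can be reused
        _pool.Add(envelope);
    }
}
```

The sketch does not cover a tracking session that starts between `Acquire` and `Release`; whether that window matters depends on when `ActiveSession` can be assigned, which the real change would need to settle.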

Q2 — `Reset()` field discipline → option (b)

`Envelope` has ~30 public properties. Approach: hand-zero every field in `internal void Reset()`, plus a guard test that reflects over `typeof(Envelope).GetProperties()` and asserts each settable property is at its default value post-`Reset()`. The guard catches drift when new fields are added without remembering to zero them.

Reset shape:

```csharp
internal void Reset()
{
    Message = null;
    Data = null;
    Id = default;
    CorrelationId = default;
    ConversationId = default;
    // … explicitly zero every remaining field, including _headers. Drop the
    // dictionary reference rather than pooling the dict itself: let it be
    // GC'd, because preserving the dict across pool cycles is its own headache.
    _headers = null;
}
```

Guard test (sketch):

```csharp
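using System;
using System.Reflection;
using Wolverine;
using Xunit;

public class EnvelopeResetTests
{
    [Fact]
    public void Reset_zeroes_every_settable_property()
    {
        var envelope = new Envelope { /* set every property */ };
        envelope.Reset();
        foreach (var prop in typeof(Envelope).GetProperties(BindingFlags.Public | BindingFlags.Instance))
        {
            if (!prop.CanWrite) continue; // get-only properties are not covered here
            var value = prop.GetValue(envelope);
            Assert.Equal(GetDefault(prop.PropertyType), value);
        }
    }

    // default(T) for a runtime type: a zeroed instance for value types, null otherwise.
    private static object? GetDefault(Type type) =>
        type.IsValueType ? Activator.CreateInstance(type) : null;
}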
```

Q3 — Scope → option (α) with per-site benchmark

The three `Executor.cs` sites and the `MessageRouter.RouteForPublish` → `PersistOrSendAsync` outgoing path land in one PR, but each pool consumer must come with a benchmark that proves a measurable reduction in allocations or Gen0 frequency. We judge each independently at PR review time. A consumer without a benchmark win gets reverted before merge; the Reset()/pool plumbing stays in either way.

Where to pool, where not to

  • Pool: the three `new Envelope(message)` sites in `Executor.cs` (InvokeAsync request/reply scaffolding). These live entirely within the framework's invocation scope; the user only sees the resulting `MessageContext.Envelope` after the instance has been `Reset()` and repopulated, and never sees the raw pooled instance.
  • Pool: outgoing envelopes produced by `MessageRouter.RouteForPublish` if and only if the producer is `MessageBus.PublishAsync`/`SendAsync` and the envelope flows directly into `PersistOrSendAsync` → transport. The lifecycle is bounded by the send call.
  • Don't pool: `Envelope.CreateForResponse()` for cascading messages where the cascade target is user code or a tracked session. The lifecycle isn't bounded.
  • Don't pool: envelopes deserialised from transport. They flow through user handler code and beyond; lifetime is unbounded.
  • Don't pool when `WolverineRuntime.ActiveSession` is non-null (per Q1 above).

A simpler statement: only pool envelopes that the framework controls from creation to return to the pool, and only when tracking is off. Anywhere user code or tracking can capture the reference, allocate.
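The rule above can be written down as a small decision function. This is a sketch only: `EnvelopeOrigin` and `PoolingRules` are hypothetical names invented for illustration, and the sketch conservatively treats every cascading response as non-poolable even though the list only rules out cascades that reach user code or a tracked session.

```csharp
// Where an envelope comes from. These are illustrative labels for the cases
// listed above, not Wolverine types.
public enum EnvelopeOrigin
{
    InvokeScaffolding,         // new Envelope(message) inside Executor InvokeAsync
    DirectPublishOrSend,       // PublishAsync/SendAsync flowing straight into PersistOrSendAsync
    CascadingResponse,         // Envelope.CreateForResponse() for handler-returned messages
    DeserializedFromTransport  // incoming envelope that flows through user handlers
}

public static class PoolingRules
{
    // True only when the framework owns the envelope for its whole lifetime
    // and no tracking session can capture the reference.
    public static bool MayPool(EnvelopeOrigin origin, bool trackingActive)
    {
        if (trackingActive) return false;

        return origin switch
        {
            EnvelopeOrigin.InvokeScaffolding => true,
            EnvelopeOrigin.DirectPublishOrSend => true,
            EnvelopeOrigin.CascadingResponse => false,
            EnvelopeOrigin.DeserializedFromTransport => false,
            _ => false // unknown origins allocate, which is the safe default
        };
    }
}
```

Keeping the decision in one place would make the "no user-visible Envelope is pooled" property easy to unit test, independent of the pool plumbing itself.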

Pool sizing

The default `DefaultObjectPoolProvider` behind `ObjectPool<T>` retains up to `Environment.ProcessorCount * 2` instances. Reasonable starting point. Worth instrumenting once the benchmark is in place to confirm we're not under-pooled (constant alloc when pool is empty) or over-pooled (retaining stale instances). Mirror the `WolverineRuntime.ExecutionPool` registration pattern for `ObjectPool<Envelope>`.
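One possible way to instrument for under-pooling, sketched with the `Microsoft.Extensions.ObjectPool` package: a policy that counts how often the pool had to fall back to `Create()`. `SketchEnvelope`, `CountingEnvelopePolicy`, and `EnvelopePoolFactory` are hypothetical names for illustration; how Wolverine actually wires `ExecutionPool` is not shown here and may differ.

```csharp
using System.Threading;
using Microsoft.Extensions.ObjectPool;

// Stand-in for Wolverine's Envelope; only the members this sketch needs.
public sealed class SketchEnvelope
{
    public object? Message { get; set; }
    internal void Reset() => Message = null;
}

// Hypothetical policy that resets on return and counts pool misses.
public sealed class CountingEnvelopePolicy : PooledObjectPolicy<SketchEnvelope>
{
    private long _created;
    private long _returned;

    public long Created => Interlocked.Read(ref _created);
    public long Returned => Interlocked.Read(ref _returned);

    // Called by the pool when it has no retained instance to hand out.
    public override SketchEnvelope Create()
    {
        Interlocked.Increment(ref _created);
        return new SketchEnvelope();
    }

    // Called on ObjectPool.Return; true means the pool may keep the instance.
    public override bool Return(SketchEnvelope obj)
    {
        obj.Reset();
        Interlocked.Increment(ref _returned);
        return true;
    }
}

public static class EnvelopePoolFactory
{
    public static ObjectPool<SketchEnvelope> Create(CountingEnvelopePolicy policy, int maximumRetained)
    {
        var provider = new DefaultObjectPoolProvider { MaximumRetained = maximumRetained };
        return provider.Create(policy);
    }
}
```

A `Created` count that keeps climbing under steady load suggests the pool is too small. The policy cannot see instances the pool drops because it is full, nor can it detect over-retention, so those would need a separate measurement.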

Prerequisites (hard blockers before any code work)

  1. CritterStackScalability benchmark harness — cold-start child-binary harness for Wolverine in place. Currently blocked on the fixture-size decision (200 vs 30 vs 100 types).
  2. Runtime-throughput benchmark in the same harness — specifically capturing per-message Gen0 collection frequency, allocation bytes, and p99 latency.
  3. Tier 1 #2724 ([perf] Eliminate hot-path ImHashMap writes via bootstrap-time pre-population) fully done. Already done for Site 3: #2732 (perf(endpoint): pre-populate serializer cache at Compile() to eliminate hot-path mutation) is merged. Sites 1/2 are no-ops or deferred per the investigation comment on #2724.
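For orientation, two of the metrics named in prerequisite 2 (allocation bytes and Gen0 collection count) can be read straight from the runtime's `GC` class. The sketch below is not the CritterStackScalability harness: `AllocationProbe` is a hypothetical helper, it only handles a synchronous operation on a single thread, and it does not measure p99 latency.

```csharp
using System;

// Hypothetical helper for a rough per-operation allocation reading.
public static class AllocationProbe
{
    public static (long BytesPerOperation, int Gen0Collections) Measure(Action operation, int iterations)
    {
        if (operation is null) throw new ArgumentNullException(nameof(operation));
        if (iterations <= 0) throw new ArgumentOutOfRangeException(nameof(iterations));

        operation(); // warm-up call so first-call JIT work is not attributed to the loop

        // Start from a settled heap so earlier garbage does not skew the Gen0 count.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        var gen0Before = GC.CollectionCount(0);
        var bytesBefore = GC.GetAllocatedBytesForCurrentThread();

        for (var i = 0; i < iterations; i++)
        {
            operation();
        }

        var bytes = GC.GetAllocatedBytesForCurrentThread() - bytesBefore;
        var gen0 = GC.CollectionCount(0) - gen0Before;
        return (bytes / iterations, gen0);
    }
}
```

Usage would look like `AllocationProbe.Measure(() => GC.KeepAlive(new byte[256]), 100_000)`, run once against the pooled path and once against the allocating path. The Gen0 count is process-wide, so other threads can add noise; a dedicated benchmark harness remains the gate.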

Acceptance criteria (when work begins)

  • `ObjectPool<Envelope>` registered in `WolverineOptions.Services`, mirroring the `ObjectPool<MessageContext>` shape.
  • `Envelope.Reset()` (internal) zeroes every field. Guard test (Q2) verifies coverage and fails CI when drift is introduced.
  • Three `Executor.cs` sites pull from pool when `ActiveSession` is null; allocate fresh otherwise.
  • `MessageRouter` / `PersistOrSendAsync` outgoing path uses pool with strict lifetime ownership.
  • No user-visible Envelope is pooled.
  • Test suite passes including tracking, observer, saga, and `ITrackedSession.Events` capture-after-handler scenarios — the last specifically asserts that an envelope reference captured inside a handler still has the same field values after the handler returns and the request completes.
  • Benchmark gate: each consumer (Executor sites; MessageRouter outgoing) shows measurable improvement in the CritterStackScalability runtime-throughput harness. A consumer that doesn't show a measurable win gets the actual pooling reverted at review time; Reset()/pool plumbing stays in for the consumers that do win.

Risks

  • Mis-pooling a tracked envelope: rare in production (tracking off), but it fails as silent corruption if our `ActiveSession` check misses a tracking sink. Mitigation: the Q1 (A) gate + the `ITrackedSession.Events` capture-after-handler acceptance test.
  • Pool over-retention via Reset miss: if Reset() doesn't clear a reference (e.g., the `Headers` dict), the pool keeps the referenced object alive indefinitely. The Q2 (b) guard test catches this for every settable property; only `_headers` (private field) needs an explicit assertion.
  • No measurable win: it is entirely possible that ObjectPool + Reset overhead roughly equals the `new Envelope()` cost in the hot path. The benchmark answers this before we commit to the change.
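On the over-retention risk, one way to cover private state such as `_headers` without a hand-written assertion per field is to reflect over instance fields instead of properties. A sketch, assuming a hypothetical `ResetGuard` helper and a stand-in `SketchEnvelope` whose `Reset()` deliberately forgets `_headers` to show the failure being caught:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical helper: lists instance fields that are not at default(T).
public static class ResetGuard
{
    // Covers public and private fields declared on the type, including
    // compiler-generated auto-property backing fields. Private fields
    // declared on base classes are not included by this lookup.
    public static IReadOnlyList<string> NonDefaultFields(object instance)
    {
        const BindingFlags flags = BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic;

        return instance.GetType()
            .GetFields(flags)
            .Where(f => !object.Equals(f.GetValue(instance), DefaultOf(f.FieldType)))
            .Select(f => f.Name)
            .ToList();
    }

    private static object? DefaultOf(Type type) =>
        type.IsValueType ? Activator.CreateInstance(type) : null;
}

// Stand-in for Wolverine's Envelope with an intentionally incomplete Reset().
public sealed class SketchEnvelope
{
    private Dictionary<string, string?>? _headers;

    public object? Message { get; set; }
    public Guid Id { get; set; }
    public Dictionary<string, string?> Headers => _headers ??= new();

    internal void Reset()
    {
        Message = null;
        Id = default;
        // Bug on purpose: _headers is not cleared.
    }
}
```

With the stand-in above, populating `Headers`, calling `Reset()`, and then calling `ResetGuard.NonDefaultFields(envelope)` reports `_headers` as the one field left behind. A real guard would assert that the returned list is empty.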
