perf: finish H1 streaming JSONL output #275

Closed
omerbenamram wants to merge 12 commits into perf/streaming-output-noalloc from perf/h1-json-stream-scratch

omerbenamram commented Dec 27, 2025

Summary

  • Reuse a single JsonStreamOutput<Vec<u8>> across records for evtx_dump -o jsonl -t 1 to reduce per-record allocation churn.
  • Replace per-object HashSet duplicate-key tracking with a small fixed table (Zig-style) + suffix counters.
  • Switch key interning to lasso::Rodeo (with ahasher + inline-more) to reduce string interning overhead.
  • Update PERF.md with benchmark + samply deltas and add/copy perf artifacts.
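
To illustrate the second bullet, here is a minimal sketch of a fixed-table duplicate-key tracker with per-key counters. It is not the PR's `UniqueKeyTable`: the type name `KeyTable`, the slot count, the spill `Vec`, and the `name_N` suffix format are all assumptions made for the example. Keys are represented by small integer ids so that lookup is a short linear scan with no hashing.

```rust
/// Number of inline slots; an assumed value, not taken from the PR.
const INLINE_SLOTS: usize = 8;

/// Hypothetical per-object table of (key id, times seen).
struct KeyTable {
    slots: [(u32, u32); INLINE_SLOTS],
    len: usize,
    /// Overflow storage for objects with more distinct keys than inline slots.
    spill: Vec<(u32, u32)>,
}

impl KeyTable {
    fn new() -> Self {
        Self { slots: [(0, 0); INLINE_SLOTS], len: 0, spill: Vec::new() }
    }

    /// Reset for the next object without releasing the spill allocation.
    fn clear(&mut self) {
        self.len = 0;
        self.spill.clear();
    }

    /// Record one occurrence of `key_id` and return how many times it was
    /// seen before this call (0 for a first occurrence).
    fn record(&mut self, key_id: u32) -> u32 {
        // Linear scan over the used inline slots, then the spill list.
        for slot in self.slots[..self.len].iter_mut().chain(self.spill.iter_mut()) {
            if slot.0 == key_id {
                let seen_before = slot.1;
                slot.1 += 1;
                return seen_before;
            }
        }
        if self.len < INLINE_SLOTS {
            self.slots[self.len] = (key_id, 1);
            self.len += 1;
        } else {
            self.spill.push((key_id, 1));
        }
        0
    }
}

/// Build the emitted key name; the `_N` suffix format is an assumption.
fn unique_name(name: &str, seen_before: u32) -> String {
    if seen_before == 0 {
        name.to_string()
    } else {
        format!("{name}_{seen_before}")
    }
}
```

For objects with only a handful of keys, a scan over a few `(u32, u32)` pairs is plausibly cheaper than hashing each key into a `HashSet`, which is the trade-off the bullet describes.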

Perf

  • Workload: W1 (samples/security_big_sample.evtx), -t 1 -o jsonl, quiet-gated on omer-pc
  • Before: median 609.1 ms
  • After: median 526.3 ms
  • Speedup: 1.157× (≈ 13.6% lower median)
  • Hyperfine artifact: target/perf/h1-lasso-ahash-before-vs-after.hyperfine.json
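
For reference, the speedup and percentage above follow from the two medians as sketched below. The function names are made up for this example; only the 609.1 ms and 526.3 ms inputs come from the hyperfine run.

```rust
/// Ratio of the old median to the new one (values above 1.0 mean faster).
fn speedup(before_ms: f64, after_ms: f64) -> f64 {
    before_ms / after_ms
}

/// Percentage by which the median wall time dropped.
fn reduction_pct(before_ms: f64, after_ms: f64) -> f64 {
    (1.0 - after_ms / before_ms) * 100.0
}
```

Note that the two numbers answer different questions: `speedup(609.1, 526.3)` is the throughput ratio, while `reduction_pct(609.1, 526.3)` is the drop in elapsed time.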

Samply (macOS, 200 iterations)

  • Allocator/memmove (before → after4):
    • _platform_memmove: 7.38% → 4.80% leaf
    • _rjem_malloc: 3.15% → 1.23% leaf
    • alloc::raw_vec::RawVecInner<A>::finish_grow: 1.62% → 0.96% leaf
  • Key tracking + interning:
    • hashbrown::map::HashMap<K,V,S,A>::get_inner: 3.20% → 0.00% leaf (after1 → after2)
    • <core::hash::sip::Hasher<S> as core::hash::Hasher>::write: 7.32% → 2.01% leaf (after3 → after4, enabling ahasher)
  • Samply artifacts:
    • target/perf/samply/h1_before.profile.json.gz + target/perf/samply/h1_before.profile.json.syms.json
    • target/perf/samply/h1_after.profile.json.gz + target/perf/samply/h1_after.profile.json.syms.json
    • target/perf/samply/h1_after2.profile.json.gz + target/perf/samply/h1_after2.profile.json.syms.json
    • target/perf/samply/h1_after4.profile.json.gz + target/perf/samply/h1_after4.profile.json.syms.json

Test plan

  • cargo test --features fast-alloc --locked

Note

Completes H1 by removing hot-path allocations in streaming JSONL and tightening duplicate-key handling.

  • Reuse one JsonStreamOutput<Vec<u8>> for evtx_dump -t 1 -o jsonl via dump_json_streaming_single_thread (avoids per-record Vec/String churn)
  • Overhaul JsonStreamOutput internals:
    • Fixed-table duplicate-key tracking UniqueKeyTable (no per-key hashing on hot path)
    • Key interning with lasso::Rodeo (ahasher, inline-more)
    • Inline scalar buffer BufferedValues and recycled object frames; new buffer()/clear_buffer()/reserve_buffer()
  • New EvtxRecord::write_json_stream for writing into an existing builder
  • PERF.md updated with benchmarks, profiles, and artifacts (W1: 609.1 ms → 526.3 ms, ~13.6% lower median)
  • Add lasso dependency; lockfile updates; regression tests (including reuse-after-error state reset)
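
The buffer-reuse idea in the first bullet, together with the reuse-after-error reset mentioned in the last one, can be sketched with the standard library alone. `Scratch`, `render`, and `dump_all` are hypothetical names for this example and do not reflect the real `JsonStreamOutput` API or its `buffer()`/`clear_buffer()`/`reserve_buffer()` signatures; the record shape and the failure condition are also invented.

```rust
use std::io::{self, Write};

/// Hypothetical reusable builder that owns one output buffer.
struct Scratch {
    buf: Vec<u8>,
}

impl Scratch {
    fn with_capacity(cap: usize) -> Self {
        Self { buf: Vec::with_capacity(cap) }
    }

    /// Render one record as a JSON line into the reused buffer.
    fn render(&mut self, id: u64, msg: &str) -> io::Result<()> {
        // Clearing keeps the capacity and discards any partial bytes left
        // behind by a previous record that failed midway.
        self.buf.clear();
        write!(self.buf, "{{\"id\":{id},")?;
        if msg.contains('"') {
            // Simulated mid-record failure; the buffer now holds a partial line.
            return Err(io::Error::new(io::ErrorKind::InvalidData, "unescaped quote"));
        }
        write!(self.buf, "\"msg\":\"{msg}\"}}\n")
    }
}

/// Write all records through a single `Scratch`; returns the number skipped.
fn dump_all<W: Write>(records: &[(u64, &str)], out: &mut W) -> io::Result<usize> {
    let mut scratch = Scratch::with_capacity(4096);
    let mut skipped = 0;
    for &(id, msg) in records {
        match scratch.render(id, msg) {
            Ok(()) => out.write_all(&scratch.buf)?,
            Err(_) => skipped += 1,
        }
    }
    Ok(skipped)
}
```

Because `Vec::clear` keeps the allocation, the buffer stops growing once it has reached the size of the largest record, so later records need no further allocations for output.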

Written by Cursor Bugbot for commit 9265a68. This will update automatically on new commits.

Reuse a single JsonStreamOutput buffer in evtx_dump (-t 1) and reduce key/value churn in JsonStreamOutput.
Record omer-pc hyperfine numbers and macOS samply leaf deltas for the H1 JSON streaming scratch reuse experiment.
Replace per-object HashSet duplicate key tracking with a small, Zig-style name/count table.
This reduces hash/memcmp overhead on the JSON streaming hot path.
Avoid extra Arc clones in the Zig-style key table and compare wide pointers by address.
Record omer-pc hyperfine results and macOS samply leaf deltas for the fixed-table duplicate-key tracker.
Use pointer-address comparisons directly over the Arc list to keep the Zig-style key table simpler and more idiomatic.
Update omer-pc hyperfine medians to match the latest artifact.
Replace Arc<str> key interning with a safe KeyId arena (Vec<Box<str>>), and resolve keys by id.
This keeps duplicate-key tracking hash-free while avoiding refcount overhead.
Swap the custom KeyId arena interner for lasso::Rodeo to speed up get-or-intern and key resolution.
Enable lasso's ahasher + inline-more features to reduce SipHash overhead in get_or_intern on the streaming JSON path.
Record new omer-pc hyperfine medians and macOS samply leaf deltas for lasso key interning and the ahasher improvement.
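
As a rough picture of the intermediate `KeyId` arena step described in the commit messages above, the following sketch interns key names into a `Vec<Box<str>>` and resolves them by id. It is an assumption-laden illustration rather than the code from the PR: the `KeyArena` type, the `HashMap` side index, and the `u32` id width are all invented here. The method names only mirror the get-or-intern and resolve operations the commits mention.

```rust
use std::collections::HashMap;

/// Small copyable handle standing in for an interned key name.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct KeyId(u32);

/// Hypothetical arena: ids index into `names`; `lookup` finds existing ids.
#[derive(Default)]
struct KeyArena {
    names: Vec<Box<str>>,
    lookup: HashMap<Box<str>, KeyId>,
}

impl KeyArena {
    /// Return the id for `name`, interning it on first sight.
    fn get_or_intern(&mut self, name: &str) -> KeyId {
        if let Some(&id) = self.lookup.get(name) {
            return id;
        }
        let id = KeyId(self.names.len() as u32);
        // The name is stored twice here for simplicity; a real interner
        // would avoid the duplicate allocation.
        self.names.push(name.into());
        self.lookup.insert(name.into(), id);
        id
    }

    /// Resolve an id back to its name; panics on an id from another arena.
    fn resolve(&self, id: KeyId) -> &str {
        &self.names[id.0 as usize]
    }
}
```

Once keys are ids, duplicate tracking can compare integers instead of hashing strings. The one remaining hash is the lookup inside `get_or_intern`, and with a default `std` `HashMap` that lookup currently goes through SipHash, which is the kind of cost the `ahasher` change targets.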