perf: two-lifetime arenas + streaming XML/JSON output#268
omerbenamram wants to merge 21 commits into master
Performance optimizations inspired by Zig EVTX parser:

1. ASCII fast path for UTF-16 to UTF-8 conversion (binxml_utils.rs)
   - Bypass decode_utf16 iterator for pure ASCII strings (~95% of EVTX strings)
   - Direct conversion when all code units are <= 0x7F
2. Use hashbrown HashMap for caches (string_cache.rs, template_cache.rs)
   - Faster lookups with inline optimization
3. Direct JSON string writing (json_stream_output.rs)
   - Add write_json_string_ncname() for XML NCName strings (no escaping needed)
   - Replace serde_json::to_writer() with direct byte writes for keys
   - XML element/attribute names follow NCName rules, safe to write directly

Benchmark results:
- Master: 194.9 ms
- Optimized: 132.4 ms
- Improvement: 1.47x faster (32% reduction in execution time)
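A minimal sketch of the ASCII fast path described in item 1, using only the standard library. The function name `utf16_to_string` is hypothetical; the actual code in binxml_utils.rs may be structured differently.

```rust
use std::char::DecodeUtf16Error;

/// Convert UTF-16 code units to a UTF-8 `String`.
/// Hypothetical helper for illustration, not the crate's actual API.
pub fn utf16_to_string(units: &[u16]) -> Result<String, DecodeUtf16Error> {
    // Fast path: every code unit is ASCII, so each maps to exactly one UTF-8 byte.
    if units.iter().all(|&u| u <= 0x7F) {
        let bytes: Vec<u8> = units.iter().map(|&u| u as u8).collect();
        // ASCII bytes are always valid UTF-8, so this cannot fail.
        return Ok(String::from_utf8(bytes).expect("ASCII is valid UTF-8"));
    }
    // Slow path: surrogate-aware decoding; unpaired surrogates produce an error.
    char::decode_utf16(units.iter().copied()).collect()
}
```

The fast path scans the input once before converting, so it pays off only when most strings are pure ASCII; whether that holds should be confirmed by measurement on the target machine.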
Detailed markdown document covering:
- ASCII fast path for UTF-16 to UTF-8 (~5% improvement)
- Hashbrown HashMap for caches (~1% improvement)
- Direct JSON string writing (~4% improvement)
- Total 1.47x speedup vs master
- Remaining opportunities to close gap with Zig
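To illustrate the direct JSON string writing idea: a sketch contrasting an unescaped write (safe only when the input is known to contain no characters that need escaping, as with XML NCNames) with a general escaping path. Both function names are hypothetical and are not the PR's write_json_string_ncname().

```rust
use std::io::{self, Write};

/// Write `s` as a JSON string without escaping. The caller must guarantee
/// that `s` has no quotes, backslashes, or control characters
/// (true for XML NCNames). Hypothetical helper.
fn write_json_string_unescaped<W: Write>(out: &mut W, s: &str) -> io::Result<()> {
    out.write_all(b"\"")?;
    out.write_all(s.as_bytes())?;
    out.write_all(b"\"")
}

/// General path: escape quotes, backslashes, and control characters.
fn write_json_string_escaped<W: Write>(out: &mut W, s: &str) -> io::Result<()> {
    out.write_all(b"\"")?;
    for c in s.chars() {
        match c {
            '"' => out.write_all(b"\\\"")?,
            '\\' => out.write_all(b"\\\\")?,
            '\n' => out.write_all(b"\\n")?,
            '\r' => out.write_all(b"\\r")?,
            '\t' => out.write_all(b"\\t")?,
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => write!(out, "\\u{:04x}", c as u32)?,
            c => {
                let mut buf = [0u8; 4];
                out.write_all(c.encode_utf8(&mut buf).as_bytes())?;
            }
        }
    }
    out.write_all(b"\"")
}
```

The unescaped variant is a plain byte copy, which is why it is attractive for keys; values that come from event data still need the escaping path.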
- Reverted hashbrown to std::collections::HashMap (std uses hashbrown internally)
- Added detailed profiling analysis showing bottlenecks:
  - Memory allocation: ~29% of CPU time (170+ samples)
  - Memory copying: ~6% (cloning during template expansion)
  - HashMap hashing: ~6%
  - Template expansion: ~9%
- Updated remaining opportunities with architectural solutions needed
- Current gap: Zig is 3.46x faster (574ms vs 166ms single-threaded)
Avoid cloning cached template tokens during streaming expansion, and reduce JSON duplicate-key bookkeeping / timestamp formatting overhead to cut CPU time.
e45ba0c to 097dc0e
- Add PERF.md (mft-style hypothesis workflow + artifacts)
- Link from README + profile_comparison.sh
- Ignore local tmp/ scratch dir
- Include opt-in perf ablation feature flags (perf_ablate_*)

- Add scripts/ensure_quiet.sh (macOS CPU idle + load1 gate)
- Integrate with profile_comparison.sh via QUIET_CHECK=1 (hyperfine --prepare + pre-profile wait)
- Document usage and baseline environment in PERF.md
Support Linux by sampling idle% from /proc/stat deltas and load1 from /proc/loadavg.
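As a sketch of the sampling arithmetic, here is how idle% can be derived from two /proc/stat snapshots and load1 from /proc/loadavg. The script itself is shell; this Rust version only illustrates the calculation. All function names are hypothetical, and counting iowait as idle time is an assumption that may not match the script.

```rust
/// Parse the aggregate "cpu" line of /proc/stat into (idle_ticks, total_ticks).
/// Assumption: idle time is idle + iowait.
fn parse_cpu_line(line: &str) -> Option<(u64, u64)> {
    let mut fields = line.split_whitespace();
    if fields.next()? != "cpu" {
        return None; // per-core lines such as "cpu0" are ignored
    }
    let ticks = fields
        .map(|f| f.parse::<u64>().ok())
        .collect::<Option<Vec<u64>>>()?;
    if ticks.len() < 4 {
        return None;
    }
    let idle = ticks[3] + ticks.get(4).copied().unwrap_or(0);
    // Sum user..steal only; guest time is already accounted for in user/nice.
    let total: u64 = ticks.iter().take(8).sum();
    Some((idle, total))
}

/// Idle percentage over the interval between two (idle, total) samples.
fn idle_percent(before: (u64, u64), after: (u64, u64)) -> Option<f64> {
    let d_idle = after.0.checked_sub(before.0)?;
    let d_total = after.1.checked_sub(before.1)?;
    if d_total == 0 {
        return None; // no ticks elapsed between samples
    }
    Some(100.0 * d_idle as f64 / d_total as f64)
}

/// The 1-minute load average is the first field of /proc/loadavg.
fn parse_load1(loadavg: &str) -> Option<f64> {
    loadavg.split_whitespace().next()?.parse().ok()
}
```

The counters in /proc/stat are cumulative since boot, which is why a single read is not enough and the gate has to work on the difference between two samples.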
- Add omer-pc environment + master-vs-branch baseline numbers (t=1 and t=8)
- Document quiet-check thresholds used on that machine

- Add BENCH_MT (default on) so multi-thread benchmark can be skipped
- Keep QUIET_CHECK support via hyperfine --prepare
                s.push_str(escaped.as_ref());
            }
            _ => {}
        }
Entity references dropped in JSON output for attributed elements
The refactored visit_entity_reference in JsonOutput no longer handles the case where separate_json_attributes is false (the default) and the current value is a non-empty object. Previously, calling visit_characters added entity references to the #text field of objects with attributes. The new inline logic handles only Value::Null, empty objects with separate_json_attributes=true, and Value::String; every other case falls through to a silent no-op. As a result, entity references such as &amp; are silently dropped when they appear in elements with attributes, which loses data in the JSON output. The streaming output in handle_entity_string replicates this bug, with a comment incorrectly stating that it matches legacy behavior.
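One way to picture the missing branch is the sketch below, which uses a small stand-in enum instead of serde_json::Value so that it stays self-contained. The names `Node` and `append_entity_text` and the `#attributes` key in the usage are illustrative assumptions, not the crate's code.

```rust
/// Minimal stand-in for a JSON value (the real code uses serde_json::Value).
#[derive(Debug, PartialEq)]
enum Node {
    Null,
    Text(String),
    Object(Vec<(String, Node)>),
}

/// Append resolved entity text to the current element's value.
/// Hypothetical helper that covers the non-empty object case.
fn append_entity_text(current: &mut Node, text: &str) {
    match current {
        Node::Null => *current = Node::Text(text.to_string()),
        Node::Text(s) => s.push_str(text),
        Node::Object(fields) => {
            // Element with attributes: keep the text under "#text"
            // instead of silently dropping it.
            match fields.iter().position(|(k, _)| k == "#text") {
                Some(i) => {
                    if let Node::Text(s) = &mut fields[i].1 {
                        s.push_str(text);
                    }
                }
                None => fields.push(("#text".to_string(), Node::Text(text.to_string()))),
            }
        }
    }
}
```

A regression test for the real fix would compare legacy and streaming output on a record where an entity reference appears inside an element that also carries attributes.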
    BinXmlValue::Real64Type(n) => {
        let mut buf = ryu::Buffer::new();
        self.write_bytes(buf.format(*n).as_bytes())
    }
NaN and Infinity floats produce invalid JSON output
The streaming JSON output uses ryu::Buffer::format() for Real32Type and Real64Type values, which outputs "NaN", "inf", or "-inf" for special IEEE 754 float values. These are not valid JSON tokens. The legacy path, which uses the serde_json::json!() macro, converts NaN and Infinity to null and so produces valid JSON. If an EVTX file contains float fields with special values, the streaming output is syntactically invalid JSON that cannot be parsed, whereas the legacy output contains null.
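A possible guard, sketched with the standard library only: check is_finite() before formatting and emit null otherwise. The function name `write_json_f64` is hypothetical, and `write!` with `{}` stands in for the ryu formatting used in the PR, so the exact digits for finite values may differ.

```rust
use std::io::{self, Write};

/// Write an f64 as a JSON token, mapping NaN and infinities to `null`.
/// Hypothetical helper; std formatting is used here in place of ryu.
fn write_json_f64<W: Write>(out: &mut W, n: f64) -> io::Result<()> {
    if n.is_finite() {
        write!(out, "{}", n)
    } else {
        // JSON has no NaN/Infinity tokens; null matches the legacy behavior
        // described above.
        out.write_all(b"null")
    }
}
```

The same check applies to the f32 path, since the finiteness test has to happen before any formatter is called.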
- Add per-optimization writeups + attribution table (omer-pc, -t 1)
- Remove perf_ablate_* feature flags + code branches
- Drop UTF-16 ASCII fast-path (slightly regressed on omer-pc)
- Keep a curated hyperfine JSON artifact

- Add Samply-backed hotspot evidence and a Zig-informed plan to remove remaining JSON output allocations
- Define success metrics + guardrails for the next big experiment

- Remove smallvec suggestion
- Incorporate Zig-style memory reuse and a per-record scratch bump (separate from chunk arena)
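The two-lifetime idea can be sketched without an arena crate: data that lives as long as a chunk is kept separately from a scratch buffer that is reset for every record. The `ChunkRenderer` type below is a hypothetical stand-in; a plain `Vec<u8>` plays the role of the per-record bump allocator and a string table plays the role of the chunk arena.

```rust
/// Hypothetical two-lifetime layout: `chunk_strings` stands in for
/// chunk-lifetime data, `scratch` for per-record temporary output.
struct ChunkRenderer {
    chunk_strings: Vec<String>,
    scratch: Vec<u8>,
}

impl ChunkRenderer {
    fn new(chunk_strings: Vec<String>) -> Self {
        Self { chunk_strings, scratch: Vec::with_capacity(256) }
    }

    /// Render one record given indices into the chunk string table.
    /// Panics if an index is out of range. The returned slice borrows the
    /// scratch buffer and is valid only until the next call, which is what
    /// allows the allocation to be reused.
    fn render_record(&mut self, string_ids: &[usize]) -> &[u8] {
        self.scratch.clear(); // drops old contents, keeps the capacity
        for (i, &id) in string_ids.iter().enumerate() {
            if i > 0 {
                self.scratch.push(b' ');
            }
            self.scratch.extend_from_slice(self.chunk_strings[id].as_bytes());
        }
        &self.scratch
    }
}
```

Because the scratch buffer keeps its capacity across records, steady-state rendering performs no allocation once the buffer has grown to fit the largest record seen so far.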
Summary
- `evtx_dump` to use the byte-lending streaming APIs for XML/JSON output.

Performance

Plan reference: `two-lifetime_allocator_refactor_(rust_vs_zig)_9016de49.plan.md` (local path: `/Users/omerba/.cursor/plans/two-lifetime_allocator_refactor_(rust_vs_zig)_9016de49.plan.md`).

Benchmark scenario (following the plan’s `measure-iterate` guidance):
- `/Users/omerba/Workspace/evtx/samples/security_big_sample.evtx`
- `-t 1`
- `-o jsonl`
- `/dev/null`
- `cargo build --release --features fast-alloc --bin evtx_dump`
- `zig build -Doptimize=ReleaseFast`
- `hyperfine --warmup 2 --runs 10`

Exact results (mean ± σ, 10 runs):

Derived speedups (from the same run):

(Artifacts: `benchmarks/perf_pr_20251214_134402.{json,md}` on my machine.)

Test plan
- `cargo test -q`

Note
Significantly reduces allocator churn and cloning by introducing arena-backed data and streaming expansion/serialization.
- `bumpalo` arena through deserializer/template cache/value types; `BinXmlValue` and arrays now arena-owned; APIs updated to accept `arena`
- `serde_json::Value` on hot paths; manual string escaping; numbers via `itoa`/`ryu`; faster datetime formatting; duplicate-key handling tuned
- `PartialEq` (remove `PartialOrd`); minor UTF-16 and SID read cleanups
- `PERF.md`, `profile_comparison.sh`, `scripts/ensure_quiet.sh`, and saved benchmark JSON; update README and .gitignore
- `bumpalo`, `itoa`, `ryu`

Written by Cursor Bugbot for commit 687c0dc.