perf: 4-digit SWAR follow-up in ffc_loop_parse_if_eight_digits #23
Merged
kolemannix merged 1 commit on May 27, 2026
Also fix double-read of ffc_read8_to_u64(*p) in the 8-digit loop.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
fcostaoliveira added a commit to redis-performance/ffc.h that referenced this pull request on Jun 3, 2026
…/force-inline-ffc-impl

Resolves the ffc_loop_parse_if_eight_digits conflict by keeping both changes:
- our Clang/AArch64 manual 2x (16-digit) unroll of the SWAR loop, and
- upstream's new 4-digit follow-up block for sub-8-digit remainders.

The follow-up sits after the #if/#else digit loop, so it benefits both the Clang/AArch64 unrolled path and the GCC/portable while-loop path. ffc.h regenerated; unit + supplemental tests pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
fcostaoliveira added a commit to redis-performance/ffc-agent-workspace that referenced this pull request on Jun 9, 2026
kolemannix/ffc.h#23 (4-digit SWAR follow-up) merged 2026-05-27 is OUR PR but was omitted (README listed ffc as "lands directly, no PRs"). Add a kolemannix/ffc.h row (1 merged, 3 open), add fast_float #387, fix total 4 -> 6. IMPACT.md and the upstream-prs memory brought in sync with a cross-repo merged ledger.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
nickva added a commit to davisp/jiffy that referenced this pull request on Jun 13, 2026
A quick microbench showed 10% speedup on numbers.json. Upstream PR: kolemannix/ffc.h#23
Problem
ffc_loop_parse_if_eight_digits only fires when pend - p >= 8 and the next 8 bytes are all digits. Numbers with 5–7 fractional digits, which are typical of geographic coordinates, mesh data, and many real-world floats (7-digit fractions being especially common), never have 8 digits left to scan, so the 8-digit SWAR loop never triggers for them. All of their digit scanning fell back to byte-by-byte iteration.

ffc_parse_four_digits_unrolled and ffc_is_made_of_four_digits_fast were dead code on this common input shape.

A secondary issue: the original loop called ffc_read8_to_u64(*p) twice per iteration, once to check and once to parse, doing the same 8-byte load twice.
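For readers unfamiliar with the 8-digit SWAR path, here is a minimal sketch of its shape: a digit check, a parse, and a loop that loads each 8-byte block only once. The names read8_le, is_eight_digits, parse_eight_digits and loop_parse_eight_digits are hypothetical stand-ins for ffc_read8_to_u64, ffc_is_made_of_eight_digits_fast, ffc_parse_eight_digits_unrolled_swar and ffc_loop_parse_if_eight_digits, not ffc.h's actual code. The bit tricks follow the widely used fast_float formulas, a little-endian host is assumed, and accumulator overflow past 19 digits is not handled.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for ffc_read8_to_u64; assumes little-endian. */
static uint64_t read8_le(const char *p) {
    uint64_t v;
    memcpy(&v, p, sizeof v); /* unaligned-safe 8-byte load */
    return v;
}

/* All 8 bytes are '0'..'9' iff neither adding 0x46 nor subtracting
   0x30 sets any byte's high bit. */
static int is_eight_digits(uint64_t v) {
    return (((v + 0x4646464646464646ULL) | (v - 0x3030303030303030ULL)) &
            0x8080808080808080ULL) == 0;
}

static uint32_t parse_eight_digits(uint64_t v) {
    const uint64_t mask = 0x000000FF000000FFULL;
    const uint64_t mul1 = 0x000F424000000064ULL; /* 100 + (1000000 << 32) */
    const uint64_t mul2 = 0x0000271000000001ULL; /* 1 + (10000 << 32) */
    v -= 0x3030303030303030ULL; /* ASCII -> digit values, first char in low byte */
    v = (v * 10) + (v >> 8);    /* byte k = 10*d[k] + d[k+1] */
    /* Combine byte pairs 0,2,4,6 into one 8-digit value in the high half. */
    v = (((v & mask) * mul1) + (((v >> 16) & mask) * mul2)) >> 32;
    return (uint32_t)v;
}

/* 8-digit loop with the load hoisted into a local: one read per block. */
static void loop_parse_eight_digits(const char **p, const char *pend,
                                    uint64_t *i) {
    while (pend - *p >= 8) {
        uint64_t val = read8_le(*p);   /* single load ... */
        if (!is_eight_digits(val)) break;
        *i = *i * 100000000ULL + parse_eight_digits(val); /* ... reused here */
        *p += 8;
    }
}
```

On "12345678901234567" this loop consumes 16 digits as two blocks and leaves one digit for the byte-by-byte tail; on a 7-digit string it does nothing at all.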
Fix
Two changes in ffc_loop_parse_if_eight_digits:
- Read once: load into a local val and pass it to both ffc_is_made_of_eight_digits_fast and ffc_parse_eight_digits_unrolled_swar.
- 4-digit SWAR follow-up: after the 8-digit loop exits, if pend - p >= 4 and the next 4 bytes are all ASCII digits, consume them with ffc_parse_four_digits_unrolled. For a 7-digit fraction this converts 7 byte-by-byte iterations into 1×SWAR-4 + 3 byte-by-byte, i.e. 4 steps instead of 7: roughly 43% (3/7) fewer digit-scanning steps on the most common input length.
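The 4-digit follow-up can be sketched the same way on a 32-bit load. This is an illustration under assumptions, not ffc.h's code: read4_le, is_four_digits, parse_four_digits and four_digit_follow_up are hypothetical names standing in for ffc_is_made_of_four_digits_fast, ffc_parse_four_digits_unrolled and the new block, the digit check mirrors the 8-digit one, and a little-endian host is assumed.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 4-byte load; assumes a little-endian host. */
static uint32_t read4_le(const char *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Same byte-wise range check as the 8-digit version, on 32 bits. */
static int is_four_digits(uint32_t v) {
    return (((v + 0x46464646u) | (v - 0x30303030u)) & 0x80808080u) == 0;
}

static uint32_t parse_four_digits(uint32_t v) {
    v -= 0x30303030u;        /* bytes d0 d1 d2 d3, d0 = first char */
    v = (v * 10) + (v >> 8); /* byte0 = 10*d0 + d1, byte2 = 10*d2 + d3 */
    /* byte0 * 100 + byte2 lands in bits 16..31; the wrapped term drops out. */
    return ((v & 0x00FF00FFu) * 0x00640001u) >> 16;
}

/* Runs once, after the 8-digit loop has exited. */
static void four_digit_follow_up(const char **p, const char *pend,
                                 uint64_t *i) {
    if (pend - *p >= 4) {
        uint32_t val = read4_le(*p);
        if (is_four_digits(val)) {
            *i = *i * 10000u + parse_four_digits(val);
            *p += 4;
        }
    }
}
```

Called on "1234567" after an 8-digit loop that consumed nothing, it folds "1234" into the accumulator and leaves "567" for the byte-by-byte tail, matching the 1×SWAR-4 + 3 split above. Since it is a single `if` rather than a loop, at most one 4-digit block is taken, which is all a sub-8-digit remainder can hold.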
Benchmark
Results measured on dedicated bare-metal AWS instances using simple_fastfloat_benchmark, 3-run averages. Baseline = same machine, same benchmark, pre-patch build.
- x86: Intel Xeon Platinum 8488C (AWS m7i.metal-24xl)
- ARM: Graviton4 (AWS m8g.metal-24xl)
random [0,1] numbers have 14–17 fractional digits, so the 8-digit loop already consumes most of their digits; the 4-digit follow-up can only fire on the 6- or 7-digit remainder of a 14- or 15-digit fraction. The slight dip is within run-to-run noise (the extra check costs one branch).
canada.txt and mesh.txt both reflect real-world inputs where 7-digit fractions dominate. The improvement is consistent across architectures and stable across runs (< 0.5% spread).
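A toy step-count model makes the digit-length arithmetic concrete. It is a deliberate simplification and an assumption, not a cost model of ffc.h: every SWAR block and every byte-by-byte digit counts as one step, and only the digits being scanned are counted. The function names are hypothetical.

```c
/* Toy model (assumption): one step = one SWAR block or one scalar digit. */

/* Before the patch: 8-digit blocks, then byte-by-byte for the rest. */
static int steps_before(int n) {
    return n / 8 + n % 8;
}

/* After the patch: 8-digit blocks, at most one 4-digit block, then bytes. */
static int steps_after(int n) {
    int rem = n % 8;
    int four = rem >= 4;            /* is the 4-digit follow-up taken? */
    return n / 8 + four + (rem - 4 * four);
}
```

Under this model a 7-digit fraction drops from 7 steps to 4, while 16- and 17-digit fractions are unchanged because their remainder after the 8-digit loop is 0 or 1 digit.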
Methodology
This change was identified and validated using ffc-agent-workspace, a structured optimization workspace for ffc.h inspired by AutoKernel (Jaber & Jaber, 2026).
The workflow:
- perf record on the benchmark binary identified ffc_loop_parse_if_eight_digits as a hot symbol on canada/mesh inputs.
- … as a falsifiable claim before any editing.
- … exhaustive tests all pass before benchmarking.
- … profile must also show the target symbol's CPU % decreasing or IPC increasing.
- … on the same bare-metal instance in the same session to eliminate environment noise.
- … post numbers, and the decision rationale, in experiments/EXPERIMENTS.md.

Full experiment log: EXP-001 in ffc-agent-workspace