Use ffc (pure-C99) as the RESP3 double parser instead of strtod by fcostaoliveira · Pull Request #1328 · redis/hiredis

fcostaoliveira · 2026-06-02T13:18:23Z

Summary

RESP3 double replies (the , type) are parsed in read.c with strtod(). This PR replaces that with ffc, a pure-C99 single-header correctly-rounded float parser (a C port of Daniel Lemire's fast_float), vendored as ffc.h. strtod remains available as a fallback via -DHIREDIS_FLOAT_STRTOD.

Two reasons, in order of importance:

1. Locale correctness (a latent bug)

strtod() honours the process LC_NUMERIC locale, but RESP3 doubles are always .-separated. A hiredis client embedded in a process that has called e.g. setlocale(LC_NUMERIC, "de_DE.UTF-8") (decimal comma) misparses the valid reply ,3.14\r\n: strtod("3.14") returns 3.0, and the existing eptr != &buf[len] check then rejects it as a "Bad double value" PROTOCOL error. So under a comma-locale, hiredis errors on essentially every double reply today.

ffc takes the decimal point as an explicit option (default '.'), so it is locale-independent by construction.

2. Speed

ffc is several times faster than glibc strtod. Parse-only microbenchmark (best of 200 iterations, pinned core). Both paths copy + NUL-terminate identically (see note below), so this is a pure parser comparison:

Platform	dataset	strtod (hiredis)	ffc	speedup
x86 (Intel)	random `[0,1]`	108 MB/s	468 MB/s	+335%
x86 (Intel)	canada (geo coords)	100 MB/s	407 MB/s	+306%
x86 (Intel)	mesh (short floats)	85 MB/s	378 MB/s	+344%
ARM (Graviton4)	random `[0,1]`	194 MB/s	992 MB/s	+411%
ARM (Graviton4)	canada	170 MB/s	849 MB/s	+400%
ARM (Graviton4)	mesh	166 MB/s	584 MB/s	+251%

Double-heavy reply streams (TS.RANGE/TS.MRANGE, FT.SEARCH ... WITHSCORES, vector-search distances, ZRANGE ... WITHSCORES, GEODIST) are parse-bound on strtod.

Correctness

RESP3 semantics are preserved exactly:

The strict inf/-inf/nan/-nan tokens are matched as before (strcasecmp on the copied buffer).
ffc runs with NO_INFNAN, so the numeric path only ever yields finite values.
The full-consume (res.ptr == buf+len) + isfinite checks mirror the strtod path's eptr/isfinite checks.
ffc is in fact stricter than strtod: it rejects the leading-whitespace and C99 hex-float inputs (" 3.14", "0x1p4") that strtod silently accepts -- neither is valid RESP3.

Testing:

Existing double reader tests pass (incl. the embedded-NUL invalid case and the array case).
Added in-tree tests: bit-exact edge magnitudes (0, -0, 5e-324 subnormal, DBL_MIN, DBL_MAX, ...) and malformed rejections -- all green under both the ffc default and the -DHIREDIS_FLOAT_STRTOD fallback.
An out-of-tree sweep of 3,000,000 values (random bit patterns / [0,1) / mixed magnitudes / short floats, multiple printf formats) is bit-identical to strtod with zero accept/reject disagreements.

ffc.h compiles clean under hiredis's existing -std=c99 -pedantic -Werror -Wall -Wextra -Wstrict-prototypes -Wwrite-strings (a clang -Wstatic-in-inline push/ignore/pop is scoped around the vendored header; see the commit).

Notes

The buffer copy is retained. read.c still copies the value into a local buffer and NUL-terminates it before handing it to the createDouble callback, exactly as today -- some custom createDouble implementations may rely on str being NUL-terminated. ffc parses that buffer's [start, end) range (it needs no terminator). A copy-free variant (parsing the reader buffer directly) is ~40% faster again on the ffc path, but changes the createDouble string to a counted, non-terminated buffer; happy to add that separately if you're open to that contract.
ffc.h is a single vendored header (no build-system change; only read.c includes it). It is tri-licensed Apache-2.0 / MIT / Boost-1.0 and is included here under MIT (compatible with hiredis's BSD-3), marked with an SPDX header.
Default build now uses ffc; define HIREDIS_FLOAT_STRTOD to revert to strtod. Happy to keep it opt-in instead if you'd prefer to land the capability before flipping the default.

Note

Medium Risk
Changes core protocol parsing for all RESP3 double replies; behavior is intended to match strtod but relies on a large new vendored parser, with only test coverage and an opt-out macro for rollback.

Overview
RESP3 double parsing in read.c no longer uses strtod() by default. The reader now vendors ffc.h (MIT, C port of fast_float) and parses finite numeric payloads with ffc_from_chars_double_options using a fixed '.' decimal separator and NO_INFNAN, while still handling inf / -inf / nan / -nan via the existing strcasecmp branches. Validation stays aligned with the old path: the parser must consume the whole token and the result must be finite.

HIREDIS_FLOAT_STRTOD restores the previous strtod() implementation for builds that need it. Clang gets a scoped -Wstatic-in-inline suppression around the single-TU FFC_IMPL include.

Tests in test.c add bit-exact checks for edge magnitudes (including -0) and protocol errors for malformed doubles.

^{Reviewed by Cursor Bugbot for commit f7b670d. Bugbot is set up for automated code reviews on this repo. Configure here.}

RESP3 double replies (the ',' type) are parsed in read.c with strtod(). This replaces that with ffc, a pure-C99 single-header correctly-rounded float parser (a C port of Daniel Lemire's fast_float), vendored as ffc.h. strtod remains available as a fallback via -DHIREDIS_FLOAT_STRTOD. Why: 1. Speed. ffc is several times faster than glibc strtod. Parse-only microbench (best of 200, pinned core): x86 Intel : strtod 84-108 MB/s -> ffc 735-799 MB/s (~7-9x) ARM (Graviton4): strtod 164-195 -> ffc 1665-1832 (~9-10x) glibc strtod is the bottleneck; eliminating the per-reply NUL-terminated copy is only ~4-10% of it. Double-heavy replies (TS.RANGE, FT.SEARCH WITHSCORES, vector scores, ZRANGE WITHSCORES, GEODIST) are parse-bound on strtod today. 2. Locale correctness. strtod honours the process LC_NUMERIC locale, but RESP3 doubles are always '.'-separated. A hiredis client embedded in a process under e.g. setlocale(LC_NUMERIC, "de_DE.UTF-8") (decimal comma) misparses the valid reply ',3.14' as 3.0 -- which the existing eptr==end check then rejects as a "Bad double value" PROTOCOL error. ffc takes the decimal point as an explicit option ('.'), so it is locale-independent. This is a latent bug fix. 3. No per-reply copy. ffc parses the reader buffer range [p, p+len) directly; the dedicated 326-byte buffer + memcpy + NUL-terminate are gone. createDouble copies the textual form from p as before. RESP3 semantics are preserved exactly: the strict inf/-inf/nan/-nan tokens are matched in place (case-insensitive, no copy), ffc runs with NO_INFNAN so the numeric path only yields finite values, and the full-consume + isfinite checks mirror the strtod path's eptr/isfinite checks. ffc is in fact stricter than strtod (it rejects the leading-whitespace and C99 hex-float inputs strtod silently accepts, neither of which is valid RESP3). Correctness: existing double reader tests pass; added bit-exact edge-magnitude tests (0, -0, 5e-324 subnormal, DBL_MIN, DBL_MAX, ...) and malformed-rejection tests, all passing under both the ffc default and the strtod fallback. A 3,000,000-value sweep (random bit patterns / [0,1) / mixed magnitudes / short floats, multiple formats) is bit-identical to strtod with zero accept/reject disagreements. ffc.h is tri-licensed Apache-2.0 / MIT / Boost-1.0; it is included here under MIT (compatible with hiredis's BSD-3), marked with an SPDX header. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

- macOS/Apple-Clang: ffc.h's extern-inline entry points call file-local static helpers, which trips -Wstatic-in-inline (default-on in Clang) and fails under hiredis's -Werror. Scope a clang diagnostic push/ignore/pop around the vendored ffc.h include (GCC has no such warning). Fixes the failing macOS CI job. - createDouble contract: keep the buf copy + NUL-terminate and hand createDouble the terminated buffer (as before), rather than the raw reader pointer. Some custom createDouble callbacks may rely on str being NUL-terminated; ffc itself parses a [start,end) range and needs no terminator. (Addresses review note.) - ffc.h: fix a vendored-amalgam bug in ffc_negative_digit_comp where ffc_am_to_float() was called with a hardcoded DOUBLE value-kind instead of the caller's vk, corrupting the float digit-comparison fallback (8-byte write / 4-byte read of the value union). Not reachable from hiredis (doubles only); fixed upstream in ffc and re-vendored. (Addresses review note.) Faithful parse microbench, both paths now copy+NUL-terminate identically (best of 200, pinned core): ffc vs strtod about +300-410% (x86 and ARM). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{Reviewed by Cursor Bugbot for commit a8ca884. Configure here.}

The -0 case used 'dval == -0.0', but -0.0 == 0.0 in IEEE, so it would pass even if the parser returned +0.0. Also compare signbit() so '-0' must parse to a true -0.0. (Addresses review note on PR redis#1328.) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

michael-grunder

LGTM

I didn't attempt to review ffc.h but I know it's been extensively tested.

Great performance win for ZSET heavy workloads!

fcostaoliveira · 2026-06-02T16:03:38Z

LGTM

I didn't attempt to review ffc.h but I know it's been extensively tested.

Great performance win for ZSET heavy workloads!

nice. Notice we're also working on helping improve ffc and fast_float :D so expect a a further boost.
Example:

perf: 4-digit SWAR follow-up in ffc_loop_parse_if_eight_digits kolemannix/ffc.h#23
Nine profiled micro-optimizations: +63% ARM mesh, +26% ARM canada, +17% ARM random vs baseline kolemannix/ffc.h#24
perf: Clang/AArch64 shift-add digit accumulator + 2x SWAR loop unroll kolemannix/ffc.h#25

and

kolemannix · 2026-06-02T16:21:15Z

A copy-free variant (parsing the reader buffer directly) is ~40% faster again on the ffc path, but changes the createDouble string to a counted, non-terminated buffer; happy to add that separately if you're open to that contract.

We should certainly try! A huge benefit of this switch is that the parser doesn't require null-termination, we've been able to remove copies from a number of places we've integrated ffc.h, usually the copy is entirely scoped to and only passed to strtod, so there's no blast radius at all.

In curious what's going on in hiredis that the NT string form of the float is being handed to other code, does it support custom parsing routines?

michael-grunder · 2026-06-02T16:49:32Z

I'm curious what's going on in hiredis that the NT string form of the float is being handed to other code, does it support custom parsing routines?

Yep, a lot of Redis clients are just thin wrappers around hiredis. They set their own callbacks to create domain specific values (e.g. hiredis-rb, etc).

kolemannix · 2026-06-02T19:15:50Z

Thanks, interesting; could you avoid the extra buffer copy in the case that there are no callbacks registered?

michael-grunder · 2026-06-02T19:52:31Z

Hiredis uses its own callbacks as kind of the "default callbacks" (createDoubleObject in this case), but in principle i'm sure we could engineer around it to avoid the copy.

ffc_negative_digit_comp called ffc_am_to_float(false, am_b, &b, FFC_VALUE_KIND_DOUBLE) but read the result back with ffc_to_extended_halfway(b, vk) on the next line. For vk == FLOAT this writes a double (8 bytes) into the ffc_value union and reads back the float member (4 bytes), producing a garbage 'theor' in the float negative-exponent digit-comparison (slow) path. Pass the caller's vk consistently, matching the adjacent ffc_to_extended_halfway call and upstream fast_float (which templates the float type throughout). Safe by construction: when vk == DOUBLE this is byte-identical to before (vk IS FFC_VALUE_KIND_DOUBLE), so the double path and the double-parsing fast/slow paths are unchanged; only vk == FLOAT changes, and only toward correctness. Validated: unit + 4M-value supplemental tests pass; a 20M-value ffc-vs-strtof parity sweep over high-digit-count (digit_comp-forcing) inputs is bit-identical. Found via review of redis/hiredis#1328 (Cursor Bugbot). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Default-on ffc + MIT vendoring; submodule -> ffc-double-parser @ f19c4b5 (squashed, pushed to fcostaoliveira/hiredis). PR redis/hiredis#1328 opened with bench + correctness evidence. Archive PR body; update tracker + memory. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…7% ARM random vs baseline (#24) Generated by an agent, run by of filipe@redis.com using Claude https://github.com/redis-performance/ffc-agent-workspace Reviewed by filipe and kolemannix ----------- * EXP-001: 4-digit SWAR follow-up in ffc_loop_parse_if_eight_digits Numbers with 5–7 digit fractions (canada.txt, mesh.txt) never triggered the 8-digit SWAR loop, falling back to 7 byte-by-byte iterations. A 4-digit SWAR follow-up converts those to 1×SWAR-4 + ≤3 byte-by-byte. canada: +29% (gap vs fastfloat: -29% → -4%) mesh: +18% (gap vs fastfloat: -10% → ±0%) random: no regression (high variance, within noise) * EXP-006: use local vars in too_many_digits path, allow GCC DSE of struct stores In ffc_parse_number_string, int_part_start/len and fraction_part_start/len were stored to the ffc_parsed answer struct unconditionally. These fields were then read back in the rare too_many_digits path — preventing GCC's DSE from eliminating the stores in non-JSON callers where ISRA removes those fields from the return ABI. Restructure: hoist 'before' and 'frac_end_local' out of the has_decimal_point block. In the too_many_digits path, use start_digits, end_of_integer_part, before, and frac_end_local directly instead of round-tripping through the answer struct. The stores to answer.int_part_*/fraction_part_* remain (needed by ffc_parse_json_number callers), but are no longer read within the function, so GCC ISRA + DSE eliminates them on non-JSON call sites. Result: x86 canada +2.2% (1414 → 1445 MB/s, 5-run stable). ARM unchanged (GCC's ARM backend already handled this via ISRA). * perf: FFC_IMPL_INLINE — force inline ffc_from_chars_double at call sites Add conditional `__attribute__((always_inline)) inline` to the forward declarations of `ffc_from_chars_double` and `ffc_from_chars_double_options` via a new `FFC_IMPL_INLINE` macro. When a translation unit defines `FFC_IMPL` before including `ffc.h` (the documented pattern for the implementation TU), GCC's `ipa_early_inline` pass sees the `always_inline` attribute on those declarations and inlines both functions at every call site *before* IPA-CP runs. This prevents GCC from creating a separate out-of-line constprop clone and eliminates the per-call function-call overhead on every `ffc_from_chars_double` call. The macro is defined as empty in non-FFC_IMPL TUs, so external linkage and ABI compatibility for other users of the header are preserved (the always_inline path does not emit an out-of-line symbol, but the existing `ffc_parse_double` / `ffc_parse_float` wrappers remain as exported symbols). * docs: update benchmark results with 2026-05-26 metal VM numbers Replace Apple Silicon numbers (2026-03-03) with 5-run averages from dedicated bare-metal servers: Intel Xeon Platinum 8488C (m7i.metal-24xl) and Graviton4 (m8g.metal-24xl), both GCC -O3. Post-EXP-009 results: ffc leads or matches fastfloat on all datasets on both architectures. * perf: combined exponent range check in ffc_clinger_fast_path_impl (EXP-012) Replace two signed comparisons (MIN <= e && e <= MAX) with a single unsigned range check: (uint64_t)(e - MIN) <= (MAX - MIN). This collapses the two-branch scattered layout into one branch with compact sequential code, matching fast_float's approach. ARM Graviton4: +4.6% mesh, +1.5% random, +1.0% canada. * Unroll fraction byte-by-byte tail: 3 nested ifs replace while loop After ffc_loop_parse_if_eight_digits returns, at most 3 consecutive digit bytes remain. Replacing the while loop with straight-line nested ifs eliminates the back-branch, yielding better IPC on out-of-order cores (+3.9% mesh, +2.1% canada on ARM Graviton4). * perf: straight-line integer scan — nested-ifs replace while loop for 1–4 digits Eliminates back-branches for the common 1–4 digit integer case. Falls back to while loop for 5+ digits. ARM Graviton4: random +4.2% (1823→1900 MB/s), canada +7.4% (1562→1677 MB/s), mesh +2.0% (1366→1394 MB/s). EXP-026. * perf: extend integer nested-ifs to 5 levels (EXP-028) Extends the straight-line integer scan from 4 levels to 5 levels, eliminating the while-loop back-branch for 5-digit integer parts common in mesh.txt 3D vertex coordinates (e.g. "12345.678"). Results on ARM Graviton4 (m8g.metal-24xl, 2.80 GHz): - mesh: +7.9% (1394→1505 MB/s, 14.75→13.66 c/f) - canada: +0.7% (1677→1688 MB/s, 29.02→28.84 c/f) - random: +0.4% (1900→1908 MB/s, 30.87→30.75 c/f) * perf: FFC_ROUNDS_TO_NEAREST macro eliminates FCMP chain (EXP-030) Add compile-time macro FFC_ROUNDS_TO_NEAREST that makes ffc_rounds_to_nearest() return a constant true, eliminating the 7-instruction volatile-float FCMP chain entirely. Guard the rounding-mode test that asserts ffc_rounds_to_nearest() == false under non-nearest modes. Benchmark results on ARM Graviton4 (Neoverse V2): mesh.txt +2.4% (1505 → 1541 MB/s, 100.86 → 93.86 i/f) canada +1.8% (1688 → 1718 MB/s, 196.02 → 189.93 i/f) random +0.8% (1908 → 1924 MB/s, 227.04 → 221.04 i/f) * EXP-033: early exit for exponent==0 in ffc_from_chars_advanced Adds a fast return path before the Clinger call for pure integers (exponent==0, mantissa<=2^53). Skips the range check, mantissa check, pow10 table load, and fmul with 1.0. Safe after EXP-030 eliminated the volatile FCMP chain. +12.9% mesh (83.92 i/f from 93.86), +1.1% canada, +0.4% random. All unit and supplemental tests pass. * fix: guard zero mantissa under FE_DOWNWARD in exponent==0 fast path Clang may convert (double)(uint64_t)0 to -0.0 when fegetround() == FE_DOWNWARD. The fast path for exponent==0 did not have the same guard that already existed in the non-nearest Clinger branch. Mirror the existing #if __clang__ || FFC_32BIT guard. Also revert readme.md: benchmark table update was not appropriate for the upstream repository. * EXP-039/042/044: checkpoint accepted shift-add acc10 + 2x SWAR unroll These accepted experiments (FFC_DIGIT_ACC10 shift-add asm for Clang/AArch64, acc10 on the exponent accumulator, and the 2x unroll of ffc_loop_parse_if_eight_digits) were applied to the working tree but never committed to the submodule. Checkpoint them so the submodule tip reflects the best accepted state and revert-on-reject is safe during the race. * fix: use caller's vk (not hardcoded DOUBLE) in ffc_negative_digit_comp ffc_negative_digit_comp called ffc_am_to_float(false, am_b, &b, FFC_VALUE_KIND_DOUBLE) but read the result back with ffc_to_extended_halfway(b, vk) on the next line. For vk == FLOAT this writes a double (8 bytes) into the ffc_value union and reads back the float member (4 bytes), producing a garbage 'theor' in the float negative-exponent digit-comparison (slow) path. Pass the caller's vk consistently, matching the adjacent ffc_to_extended_halfway call and upstream fast_float (which templates the float type throughout). Safe by construction: when vk == DOUBLE this is byte-identical to before (vk IS FFC_VALUE_KIND_DOUBLE), so the double path and the double-parsing fast/slow paths are unchanged; only vk == FLOAT changes, and only toward correctness. Validated: unit + 4M-value supplemental tests pass; a 20M-value ffc-vs-strtof parity sweep over high-digit-count (digit_comp-forcing) inputs is bit-identical. Found via review of redis/hiredis#1328 (Cursor Bugbot). * review: address PR feedback — dedup always_inline macro, docs, comment cleanups - Hoist the ffc_inline three-way ladder into api.h above FFC_IMPL_INLINE and collapse FFC_IMPL_INLINE to `#define FFC_IMPL_INLINE ffc_inline`; guard the duplicate definition in common.h with `#ifndef ffc_inline`. The two macros were byte-identical. - readme: add a Configuration Macros section documenting FFC_IMPL and FFC_ROUNDS_TO_NEAREST (with the don't-define-if-you-change-rounding caveat). - common.h: apply the suggested wording for the ffc_rounds_to_nearest() comment. - parse.h: rewrite the integer-scan, fraction-unroll, hoisted-locals and too_many_digits comments to describe the current code rather than the diff that introduced them; tighten the AArch64/Clang acc10 inline-asm comment so the single-axis gating is self-explanatory.

cursor Bot reviewed Jun 2, 2026

View reviewed changes

Comment thread read.c Outdated

Comment thread ffc.h Outdated

cursor Bot reviewed Jun 2, 2026

View reviewed changes

Comment thread test.c Outdated

michael-grunder approved these changes Jun 2, 2026

View reviewed changes

michael-grunder merged commit 86f5cc0 into redis:master Jun 2, 2026
17 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Use ffc (pure-C99) as the RESP3 double parser instead of strtod#1328

Use ffc (pure-C99) as the RESP3 double parser instead of strtod#1328
michael-grunder merged 3 commits into
redis:masterfrom
fcostaoliveira:ffc-double-parser

fcostaoliveira commented Jun 2, 2026 •

edited by cursor Bot

Loading

Uh oh!

Uh oh!

Uh oh!

cursor Bot left a comment

Uh oh!

Uh oh!

michael-grunder left a comment

Uh oh!

fcostaoliveira commented Jun 2, 2026

Uh oh!

kolemannix commented Jun 2, 2026

Uh oh!

michael-grunder commented Jun 2, 2026

Uh oh!

Uh oh!

kolemannix commented Jun 2, 2026

Uh oh!

michael-grunder commented Jun 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

fcostaoliveira commented Jun 2, 2026 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

1. Locale correctness (a latent bug)

2. Speed

Correctness

Notes

Uh oh!

Uh oh!

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

michael-grunder left a comment

Choose a reason for hiding this comment

Uh oh!

fcostaoliveira commented Jun 2, 2026

Uh oh!

kolemannix commented Jun 2, 2026

Uh oh!

michael-grunder commented Jun 2, 2026

Uh oh!

Uh oh!

kolemannix commented Jun 2, 2026

Uh oh!

michael-grunder commented Jun 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

fcostaoliveira commented Jun 2, 2026 •

edited by cursor Bot

Loading