[Base] Replace vendored printf with a better implementation. #23761

Draft
benvanik wants to merge 2 commits into main from users/benvanik/printf
@benvanik benvanik commented Mar 12, 2026

I took a deeper look at the printf library I chose the other day and was not happy.
Besides not having buffer overflows, this implementation now matches libc's float behavior (and it'll be consistent across all platforms/CRTs).


Replaces the vendored eyalroz/printf library with a focused, from-scratch implementation covering exactly the specifiers IREE uses. Every line is ours, every code path is tested, and every bounds check precedes its dereference.

The vendored library was a maintenance burden: it supported features IREE doesn't need (wide chars, %n, locale-dependent formatting), its float formatting had precision issues at rounding boundaries that we couldn't easily fix upstream, and it had several buffer overflow vulnerabilities. The new implementation is ~1700 lines of C with no external dependencies beyond <math.h> and the C standard library.

What changed

New implementation (runtime/src/iree/base/printf.c, printf.h):

  • Full integer formatting: %d/%i/%u/%o/%x/%X with all length modifiers (hh/h/l/ll/z/t/j), all flags (-/+/space/0/#), and dynamic width/precision via *.
  • Float formatting: %f/%e/%g/%E/%G/%F with FMA-based error correction for precise rounding at decimal boundaries. Two-step multiplication with exact 10^22 factors handles the full double range without losing precision in the rounding decision. fmod-based remainder computation provides IEEE 754-exact rounding for the division path.
  • %g routing uses the post-rounding exponent per the C standard, correctly handling boundary cases like %.1g of 9.51e+01.
  • String/char/pointer: %s with precision-bounded reads, %c, %p.
  • %n deliberately excluded (write-what-where primitive).
  • Dynamic width/precision clamped to 10000 at parse time to prevent integer overflow in downstream arithmetic.
  • size_t→int return value guarded against overflow (returns -1 if formatted output exceeds INT_MAX bytes).

Removed vendored library:

  • Deleted third_party/printf submodule and all build system references.

Testing (printf_test.cc, printf_fuzz.cc):

  • gtest suite covering every specifier, flag combination, length modifier, edge case (INT_MIN, UINT64_MAX, DBL_MAX, subnormals, NaN, Inf, zero), and regression cases found during fuzzing.
  • Structured fuzz target with 6 strategies:
    1. Bounded string reads with ASAN (%.*s safety)
    2. Structured integer formatting vs libc
    3. Structured float formatting vs libc (range-gated)
    4. Width/precision stress testing vs libc
    5. Full format string generation from fuzzer bytes (parser state machine coverage)
    6. Malformed/truncated format strings (crash safety)
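
As a rough illustration of strategy 1, a bounded-string fuzz entry point might look like the sketch below. LLVMFuzzerTestOneInput is libFuzzer's standard entry point. Everything else is an assumption: libc's snprintf stands in for the formatter under test because this description does not show the new function's signature, and the 4096-byte cap is arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// libFuzzer entry point. snprintf is a stand-in for the formatter under test.
int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  if (size == 0 || size > 4096) return 0;  // arbitrary cap for the sketch

  // Copy into an exactly sized heap block with no NUL terminator so that
  // ASAN reports any read past `size` bytes.
  char* unterminated = (char*)malloc(size);
  if (!unterminated) return 0;
  memcpy(unterminated, data, size);

  // Room for every input byte plus the terminator written by snprintf.
  char* output = (char*)malloc(size + 1);
  if (!output) {
    free(unterminated);
    return 0;
  }

  // The precision bounds the read: the formatter must stop after `size`
  // bytes even though the source has no terminator.
  int written = snprintf(output, size + 1, "%.*s", (int)size, unterminated);

  // %.*s stops at the precision or at the first NUL, whichever comes first.
  const char* nul = (const char*)memchr(unterminated, 0, size);
  size_t expected = nul ? (size_t)(nul - unterminated) : size;
  assert(written >= 0 && (size_t)written == expected);
  assert(memcmp(output, unterminated, expected) == 0);

  free(output);
  free(unterminated);
  return 0;
}
```

Allocating the input at its exact size is the important design choice: a stack or oversized buffer would let an over-read land in valid memory and go unnoticed by ASAN.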

The implementation was fuzz-tested with ASAN under --config=fuzzer using iree-bazel-fuzz with 32 parallel jobs. Multiple 30-minute rounds of fuzzing with corpus minimization surfaced several issues, all of which were fixed, and checked conformance with libc.
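
The libc conformance checks can be pictured as a differential comparison: format the same arguments with both implementations and require identical return values and bytes. The sketch below is a minimal version under stated assumptions. vformat_fn_t, formats_match, and corrupting_vformat are hypothetical names, the candidate is assumed to follow vsnprintf semantics, and libc's vsnprintf is used on both sides in the usage note only so the example runs without the new implementation.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

// Hypothetical signature for a formatter with vsnprintf semantics.
typedef int (*vformat_fn_t)(char* buffer, size_t capacity, const char* format,
                            va_list args);

// Returns 1 when both formatters agree on the return value and output bytes.
static int formats_match(vformat_fn_t candidate, vformat_fn_t reference,
                         const char* format, ...) {
  assert(candidate && reference && format);
  char candidate_out[128];
  char reference_out[128];
  va_list args;
  va_list args_copy;
  va_start(args, format);
  va_copy(args_copy, args);  // each formatter consumes its own va_list
  int candidate_len =
      candidate(candidate_out, sizeof(candidate_out), format, args);
  int reference_len =
      reference(reference_out, sizeof(reference_out), format, args_copy);
  va_end(args_copy);
  va_end(args);
  if (candidate_len != reference_len) return 0;
  if (candidate_len < 0) return 1;  // both reported an error
  return strcmp(candidate_out, reference_out) == 0;
}

// Deliberately wrong stand-in, used only to show a mismatch being caught.
static int corrupting_vformat(char* buffer, size_t capacity,
                              const char* format, va_list args) {
  int length = vsnprintf(buffer, capacity, format, args);
  if (length > 0 && capacity > 0) buffer[0] = '?';
  return length;
}
```

A call such as formats_match(vsnprintf, vsnprintf, "%.1g", 9.51e+01) trivially succeeds, while formats_match(corrupting_vformat, vsnprintf, "%d", 42) fails. In a real harness the first argument would be the new formatter, and float inputs would be range-gated as described for strategy 3.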

Final fuzzing pass: 30 minutes, 32 jobs, 3455 corpus files, zero crashes.
Coverage: 99.2% line, 97.1% branch, 100% function on printf.c.

Test plan

  • iree-bazel-test --config=asan //runtime/src/iree/base:printf_test
  • iree-bazel-fuzz //runtime/src/iree/base:printf_fuzz -- -jobs=32 -max_total_time=1800 (zero crashes)
  • macOS CMake build (macos-cmake-build --target iree_base_base)
  • Windows CMake build (windows-cmake-build --target iree_base_base)
  • 4 cross-validated security reviews (Codex, Gemini, Grok, DeepSeek): 1 finding fixed (INT_MAX return guard), remainder assessed as low-risk design tradeoffs
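
The INT_MAX return guard from the last item, together with the 10000 clamp listed under What changed, amounts to two small integer-overflow checks. The sketch below shows one way they could be written. It is not the code in printf.c: FORMAT_ARG_LIMIT, clamp_dynamic_width, and length_to_return_value are hypothetical names, and preserving the sign of a negative width is an assumption about how the parser treats it.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

// Assumed constant name; the value 10000 is the clamp stated in this PR.
#define FORMAT_ARG_LIMIT 10000

// With both width and precision clamped, their sum cannot overflow an int.
static_assert(FORMAT_ARG_LIMIT <= INT_MAX / 2, "width + precision must fit");

// Clamps a width or precision read from a `*` argument. The sign is kept so
// the caller can still apply its own rule for negative values; nothing is
// negated here, so INT_MIN is safe.
static int clamp_dynamic_width(int value) {
  if (value > FORMAT_ARG_LIMIT) return FORMAT_ARG_LIMIT;
  if (value < -FORMAT_ARG_LIMIT) return -FORMAT_ARG_LIMIT;
  return value;
}

// Converts an internal size_t byte count to the int that printf-style
// functions return, reporting -1 instead of wrapping past INT_MAX.
static int length_to_return_value(size_t length) {
  if (length > (size_t)INT_MAX) return -1;
  return (int)length;
}
```

Doing the clamp at parse time keeps every later padding computation within a small, known range, so downstream code needs no overflow checks of its own.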

ci-extra: all

@benvanik benvanik added the runtime (Relating to the IREE runtime library) and post-merge-review (Ben's special place. People can pick these up and review them for forward fixes if interested.) labels Mar 12, 2026
@benvanik benvanik force-pushed the users/benvanik/printf branch 2 times, most recently from f9cb70c to 537e000 Compare March 13, 2026 00:24
benvanik and others added 2 commits March 12, 2026 22:41
Replace the eyalroz/printf third-party library with a from-scratch
implementation covering exactly the specifiers IREE uses. The vendored
library supported features we don't need (wide chars, %n, locale) while
having precision issues at float rounding boundaries we couldn't fix
upstream. It also had several buffer overflow vulnerabilities.

The new implementation is ~1700 lines of C with FMA-based error
correction for precise rounding at decimal boundaries. Two-step
multiplication with exact 10^22 factors handles the full double range,
and fmod-based remainder computation provides IEEE 754-exact rounding
for the division path.

Includes a 693-line gtest suite and a 785-line structured fuzz target
with 6 strategies (bounded strings, integer/float correctness vs libc,
width/precision stress, full format generation, malformed formats).
Fuzz-tested with ASAN: 32 parallel jobs, 3455 corpus files, 6 crash
classes found and fixed, zero remaining crashes.

Coverage: 99.2% line, 97.1% branch, 100% function on printf.c.

Cross-validated security review (Codex + Gemini) found one additional
issue (size_t→int return overflow guard), which is included in this
change.

Co-Authored-By: Claude <[email protected]>

Add source-based code coverage reporting to iree-bazel-test using
clang's -fprofile-instr-generate/-fcoverage-mapping instrumentation
and llvm-cov for reporting.

Three composable flags:
- --coverage: build with instrumentation, run the test binary, and
  report line/branch/function coverage via llvm-cov
- --coverage-fuzz: additionally find a sibling _fuzz target (e.g.,
  printf_test -> printf_fuzz), run it over its fuzzer corpus with
  -runs=0, and merge profiles for combined coverage
- --coverage-html=DIR: generate a browseable HTML report

Source file discovery uses naming convention (foo_test -> foo.c) with
the /proc/self/cwd/ prefix that Bazel embeds in coverage mapping debug
info. Sanitizer configs are automatically stripped in coverage mode
since they distort code generation and attribution.

Co-Authored-By: Claude <[email protected]>
@benvanik benvanik force-pushed the users/benvanik/printf branch from 537e000 to 7d21da2 Compare March 13, 2026 05:41