[Base] Replace vendored printf with a better implementation.#23761
Draft
Force-pushed from f9cb70c to 537e000.
Replace the eyalroz/printf third-party library with a from-scratch implementation covering exactly the specifiers IREE uses. The vendored library supported features we don't need (wide chars, %n, locale) while having precision issues at float rounding boundaries that we couldn't fix upstream. It also had several buffer overflow vulnerabilities.

The new implementation is ~1700 lines of C with FMA-based error correction for precise rounding at decimal boundaries. Two-step multiplication with exact 10^22 factors handles the full double range, and fmod-based remainder computation provides IEEE 754-exact rounding for the division path.

Includes a 693-line gtest suite and a 785-line structured fuzz target with 6 strategies (bounded strings, integer/float correctness vs libc, width/precision stress, full format generation, malformed formats). Fuzz-tested with ASAN: 32 parallel jobs, 3455 corpus files, 6 crash classes found and fixed, zero remaining crashes. Coverage: 99.2% line, 97.1% branch, 100% function on printf.c.

Cross-validated security review (Codex + Gemini) found one additional issue (size_t→int return overflow guard), which is included in this change.

Co-Authored-By: Claude <[email protected]>
Add source-based code coverage reporting to iree-bazel-test using clang's -fprofile-instr-generate/-fcoverage-mapping instrumentation and llvm-cov for reporting. Three composable flags:

- --coverage: build with instrumentation, run the test binary, and report line/branch/function coverage via llvm-cov
- --coverage-fuzz: additionally find a sibling _fuzz target (e.g., printf_test -> printf_fuzz), run it over its fuzzer corpus with -runs=0, and merge profiles for combined coverage
- --coverage-html=DIR: generate a browsable HTML report

Source file discovery uses a naming convention (foo_test -> foo.c) with the /proc/self/cwd/ prefix that Bazel embeds in coverage mapping debug info. Sanitizer configs are automatically stripped in coverage mode since they distort code generation and attribution.

Co-Authored-By: Claude <[email protected]>
Force-pushed from 537e000 to 7d21da2.
I took a deeper look at the printf library I chose the other day and was not happy.
Besides not having buffer overflows, this implementation now matches libc's float behavior (and it'll be consistent across all platforms/CRTs).
Replaces the vendored eyalroz/printf library with a focused, from-scratch implementation covering exactly the specifiers IREE uses. Every line is ours, every code path is tested, and every bounds check precedes its dereference.
The vendored library was a maintenance burden: it supported features IREE doesn't need (wide chars, %n, locale-dependent formatting), while its float formatting had precision issues at rounding boundaries that we couldn't easily fix upstream. It also had several buffer overflow vulnerabilities. The new implementation is ~1700 lines of C with no external dependencies beyond <math.h> and the C standard library.

What changed
New implementation (runtime/src/iree/base/printf.c, printf.h):
- %d/%i/%u/%o/%x/%X with all length modifiers (hh/h/l/ll/z/t/j), all flags (-/+/space/0/#), and dynamic width/precision via *.
- %f/%e/%g/%E/%G/%F with FMA-based error correction for precise rounding at decimal boundaries. Two-step multiplication with exact 10^22 factors handles the full double range without losing precision in the rounding decision. fmod-based remainder computation provides IEEE 754-exact rounding for the division path.
- %g routing uses the post-rounding exponent per the C standard, correctly handling boundary cases like %.1g of 9.5 → 1e+01.
- String/char/pointer: %s with precision-bounded reads, %c, %p.
- %n deliberately excluded (write-what-where primitive).
- size_t→int return value guarded against overflow (returns -1 if formatted output exceeds INT_MAX bytes).

Removed vendored library: third_party/printf submodule and all build system references.

Testing (printf_test.cc, printf_fuzz.cc):
%.*s safety)

The implementation was fuzz-tested with ASAN under --config=fuzzer using iree-bazel-fuzz with 32 parallel jobs. Multiple rounds of fuzzing (30 minutes each) with corpus minimization found and fixed several issues and ensured conformance with libc. Final fuzzing pass: 30 minutes, 32 jobs, 3455 corpus files, zero crashes.
Coverage: 99.2% line, 97.1% branch, 100% function on printf.c.
Test plan
- iree-bazel-test --config=asan //runtime/src/iree/base:printf_test
- iree-bazel-fuzz //runtime/src/iree/base:printf_fuzz -- -jobs=32 -max_total_time=1800 (zero crashes)
- macos-cmake-build --target iree_base_base
- windows-cmake-build --target iree_base_base

ci-extra: all