[Variant] Fuzz testing and benchmarks for validation #7849

carpecodeum · 2025-07-02T20:20:27Z

Which issue does this PR close?

Closes #7842 (Add testing for invalid variants)

Rationale for this change

After adding support for both fallible and infallible access to variants, @alamb pointed out that there aren't many tests for the validation system itself.
CC - @scovich @friendlymatthew

What changes are included in this PR?

This change adds the fuzzing @alamb requested: it generates valid variants using the builder, randomly corrupts them by flipping bits, then tests both validation paths (if validation passes, make sure access doesn't crash; if it fails, make sure error handling works properly) across many corruption scenarios plus specific malformed test cases.
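The loop described above can be sketched without the real crate. This is a minimal illustration only: the length-prefixed buffer, `ToyError`, `try_payload`, `build_valid`, and the small `XorShift` generator are all hypothetical stand-ins, not the `parquet-variant` API or the test code added in this PR.

```rust
// Hypothetical stand-in for a variant buffer: byte 0 declares the payload
// length and is followed by exactly that many bytes. NOT the variant format.
#[derive(Debug, PartialEq)]
enum ToyError {
    Empty,
    LengthMismatch { declared: usize, actual: usize },
}

/// Fallible path: validate the buffer before handing out the payload.
fn try_payload(buf: &[u8]) -> Result<&[u8], ToyError> {
    let (&declared, rest) = buf.split_first().ok_or(ToyError::Empty)?;
    if declared as usize != rest.len() {
        return Err(ToyError::LengthMismatch {
            declared: declared as usize,
            actual: rest.len(),
        });
    }
    Ok(rest)
}

/// Builds a valid buffer (plays the role of the builder).
fn build_valid(payload: &[u8]) -> Vec<u8> {
    assert!(payload.len() <= u8::MAX as usize);
    let mut buf = vec![payload.len() as u8];
    buf.extend_from_slice(payload);
    buf
}

/// Tiny xorshift64 generator so the sketch needs no external crates.
/// The seed must be non-zero.
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
}

/// Flips one pseudo-randomly chosen bit in a non-empty buffer.
fn flip_one_bit(buf: &mut [u8], rng: &mut XorShift) {
    let byte = (rng.next_u64() % buf.len() as u64) as usize;
    let bit = (rng.next_u64() % 8) as u32;
    buf[byte] ^= 1u8 << bit;
}

/// Runs `iterations` corruption rounds; returns (accepted, rejected) counts.
fn fuzz(iterations: usize, seed: u64) -> (usize, usize) {
    let mut rng = XorShift(seed);
    let (mut accepted, mut rejected) = (0, 0);
    for _ in 0..iterations {
        let mut buf = build_valid(b"hello");
        flip_one_bit(&mut buf, &mut rng);
        match try_payload(&buf) {
            // Validation passed: reading the payload must not panic.
            Ok(payload) => {
                let _sum: u32 = payload.iter().map(|&b| b as u32).sum();
                accepted += 1;
            }
            // Validation failed: the error must be an ordinary value.
            Err(e) => {
                let _msg = format!("{:?}", e);
                rejected += 1;
            }
        }
    }
    (accepted, rejected)
}
```

Calling `fuzz(1_000, 42)` exercises both paths: in this toy format a flip in the length byte is rejected, while a flip in the payload still validates and must remain safe to read.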

A huge thank you to @PinkCrow007, @mprammer, @alamb, and the rest of the CMU variant team for their continued support towards this project.

Are these changes tested?

Yes, all tests currently pass.

Are there any user-facing changes?

No. The tests only verify that the validation system works correctly.

friendlymatthew

Hi, really like the new tests for the malformed variants.

For the fuzzing though, I'm wondering if we considered using something more robust like AFL. AFL does coverage-guided fuzzing, which should be more effective at finding weird edge cases than just flipping bits at random, because it can use code coverage to determine which paths to take when mutating test cases. It also gets us test minimization for free.
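To illustrate the difference from purely random bit flips, here is a toy sketch of the coverage-guided idea. It is not AFL and does not use any AFL API: `run_target` is a hypothetical target that reports its own branch coverage as a bitmask (real tools get this from compiler instrumentation), and `XorShift` is a throwaway generator that keeps the example dependency-free.

```rust
/// Hypothetical target: returns a bitmask of the branches the input reached.
/// Stands in for the instrumentation a real coverage-guided fuzzer inserts.
fn run_target(input: &[u8]) -> u8 {
    let mut coverage = 0u8;
    if input.first() == Some(&b'V') {
        coverage |= 0b001;
        if input.get(1) == Some(&b'A') {
            coverage |= 0b010;
            if input.get(2) == Some(&b'R') {
                coverage |= 0b100;
            }
        }
    }
    coverage
}

/// Tiny xorshift64 generator; the seed must be non-zero.
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
}

/// Mutates corpus entries and keeps only mutants that reach a new branch.
/// Returns the final corpus and the union of all coverage observed.
fn coverage_guided(iterations: usize, seed: u64) -> (Vec<Vec<u8>>, u8) {
    let mut rng = XorShift(seed);
    let mut corpus = vec![vec![0u8; 3]];
    let mut seen = run_target(&corpus[0]);
    for _ in 0..iterations {
        // Pick a parent from the corpus and overwrite one byte at random.
        let parent = (rng.next_u64() % corpus.len() as u64) as usize;
        let mut child = corpus[parent].clone();
        let pos = (rng.next_u64() % child.len() as u64) as usize;
        child[pos] = (rng.next_u64() >> 32) as u8;
        // Keep the mutant only if it covered a branch not seen before.
        let cov = run_target(&child);
        if cov & !seen != 0 {
            seen |= cov;
            corpus.push(child);
        }
    }
    (corpus, seen)
}
```

Because each mutant that reaches a new branch is saved and mutated further, progress toward the nested `V`, `A`, `R` checks accumulates one byte at a time. Mutating only the original all-zero input would need all three bytes to line up at once.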

alamb · 2025-07-03T19:21:04Z

> For the fuzzing though, I'm wondering if we considered using something more robust like AFL. AFL does coverage-guided fuzzing, which should be more effective at finding weird edge cases than just flipping bits at random, because it can use code coverage to determine which paths to take when mutating test cases. It also gets us test minimization for free.

I recommend we file another ticket to look into a more robust / automated framework for fuzzing. It applies to more than just variant and would make a nice addition to this crate, I think (though I also think it will be nontrivial to set up, but I haven't actually tried it).

alamb

Thank you @carpecodeum -- I think this is a very nice test addition. I left some comments but I also think we could merge it in as is

I also verified this covers many of the edge cases using llvm-cov

 cargo llvm-cov --html -p parquet-variant --tests -- test_validation_fuzz_integration

And then I reviewed the report (attached: report.zip)

And found that many more error paths are covered 👍

alamb · 2025-07-03T19:21:50Z

parquet-variant/benches/variant_builder.rs

 }

+// Benchmark validation performance
+fn bench_validation_validated_vs_unvalidated(c: &mut Criterion) {


This is a cool idea -- I don't think it's related to fuzz testing, but it's a nice addition anyway.

alamb · 2025-07-03T19:30:09Z

parquet-variant/tests/variant_interop.rs

+    let corrupt_value = rng.random_bool(0.7);
+
+    if corrupt_metadata && !metadata.is_empty() {
+        let num_corruptions = rng.random_range(1..=(metadata.len().min(5)));


I wonder if we need to corrupt it with more than one corruption 🤔

If you apply this many corruptions, I think you'll basically corrupt the entire thing for small variants (which might be OK).

Maybe it should be min(3) instead, to do up to three corruptions 🤔

carpecodeum

I changed it to exactly 1 corruption. Is that fine?
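The arithmetic behind this thread can be sketched as follows. The helper names are hypothetical and the buffer lengths are chosen only for illustration. The point is that `usize::min` acts as a ceiling on the corruption count, whereas `usize::max` would act as a floor.

```rust
/// Largest corruption count the range `1..=len.min(cap)` can produce.
/// `cap` is 5 in the diff above; 3 is the smaller cap under discussion.
fn max_corruptions(len: usize, cap: usize) -> usize {
    len.min(cap)
}

/// What `len.max(3)` would give instead: a floor of 3, not a ceiling.
fn with_max_three(len: usize) -> usize {
    len.max(3)
}

/// Worst-case share of bytes touched, in percent, assuming every corruption
/// lands on a different byte. `len` must be non-zero (the diff above already
/// guards against an empty buffer).
fn worst_case_percent(len: usize, cap: usize) -> usize {
    100 * max_corruptions(len, cap) / len
}
```

For a hypothetical 4-byte buffer, `worst_case_percent(4, 5)` is 100 and `worst_case_percent(4, 3)` is 75, so a small input can be almost entirely rewritten under either cap. `with_max_three(2)` is 3, which exceeds the buffer length. That is why a ceiling needs `min` rather than `max`.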

carpecodeum · 2025-07-04T02:30:58Z

> Hi, really like the new tests for the malformed variants.
>
> For the fuzzing though, I'm wondering if we considered using something more robust like AFL. AFL does coverage-guided fuzzing, which should be more effective at finding weird edge cases than just flipping bits at random, because it can use code coverage to determine which paths to take when mutating test cases. It also gets us test minimization for free.

I do agree! I initially planned to include something like AFL, and thought maybe I could extend that functionality in this PR with more direction from you and @alamb, but Andrew states a very good point that we should probably make a new ticket and extend that to the crate. If you guys want, I can make a new PR with that setup.

scovich · 2025-07-04T15:06:33Z

Aside: The coverage report exposed the fact that map_try_from_slice_error is dead code. Maybe we should delete it?

alamb · 2025-07-04T19:06:58Z

> Aside: The coverage report exposed the fact that map_try_from_slice_error is dead code. Maybe we should delete it?

I agree

[Variant] Remove dead code, add comments #7861

alamb · 2025-07-05T11:16:16Z

Thanks @carpecodeum

alamb · 2025-07-05T11:17:08Z

> but Andrew states a very good point that we should probably make a new ticket and extend that to the crate. If you guys want, I can make a new PR making that setup

I think it would be worth a ticket and a quick proof of concept

I don't have any experience with the AFL framework, and I don't know how well suited it is for running in CI.

carpecodeum added 2 commits July 2, 2025 08:58

[TESTS] add fuzz tests to check if the variants are valid

4b0257a

[FIX] fix fmt tests

98b9da8

github-actions bot added the "parquet" label (Changes to the parquet crate) Jul 2, 2025

carpecodeum added 2 commits July 2, 2025 16:45

[FIX] fix clippy errors

3ea0c3c

[FIX] fix clippy errors

dcc5e39

friendlymatthew mentioned this pull request Jul 3, 2025

[Variant] Support creating sorted dictionaries #7833

Merged

friendlymatthew reviewed Jul 3, 2025

alamb changed the title ~~[Variant] Add testing for invalid variants - Fuzz testing~~ [Variant] Fuzz testing and benchmarks for validation Jul 3, 2025

alamb approved these changes Jul 3, 2025

[FIX] always do exactly 1 corruption

a695602

alamb merged commit 54e4734 into apache:main Jul 5, 2025
12 checks passed

This was referenced Aug 21, 2025

[Variant] Improve fuzz test for Variant #8198

Closed

[Variant] Improve fuzz test for Variant #8199

Closed
