Skip to content

Conversation

@carpecodeum
Copy link
Contributor

Which issue does this PR close?

Closes #7842 (Add testing for invalid variants)

Rationale for this change

After adding support for both fallible and infallible access to variants, @alamb pointed out that there aren't many tests for the validation system itself.
CC - @scovich @friendlymatthew

What changes are included in this PR?

This change adds the fuzzing @alamb requested: it generates valid variants using the builder, randomly corrupts them by flipping bits, then tests both validation paths (if validation passes, make sure access doesn't crash; if it fails, make sure error handling works properly) across many corruption scenarios plus specific malformed test cases.

A huge thank you to @PinkCrow007, @mprammer, @alamb, and the rest of the CMU variant team for their continued support towards this project.

Are these changes tested?

Yes, passing all the tests currently

We typically require tests for all PRs in order to:

  1. Prevent the code from being accidentally broken by subsequent changes
  2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example, are they covered by existing tests)?

Are there any user-facing changes?

No, tests are to make sure the validation system works fine

@github-actions github-actions bot added the parquet Changes to the parquet crate label Jul 2, 2025
Copy link
Contributor

@friendlymatthew friendlymatthew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, really like the new tests for the malformed variants.

For the fuzzing though, I'm wondering if we considered using something more robust like AFL. Afl does coverage guided fuzzing which should be more effective at finding weird edge cases than just flipping bits at random, because it can use code coverage to determine which paths to take when mutating test cases. It also gets us test minimization for free

@alamb
Copy link
Contributor

alamb commented Jul 3, 2025

For the fuzzing though, I'm wondering if we considered using something more robust like AFL. Afl does coverage guided fuzzing which should be more effective at finding weird edge cases than just flipping bits at random, because it can use code coverage to determine which paths to take when mutating test cases. It also gets us test minimization for free

I recommend we file another ticket to look into a more robust / automated framework for fuzzing. It applies to more than just variant and would make a nice addition to this crate I think (though I also think it will be a non trivial thing to setup, but I haven't actually tried it)

@alamb alamb changed the title [Variant] Add testing for invalid variants - Fuzz testing [Variant] Fuzz testing and benchmarks for vaildation Jul 3, 2025
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @carpecodeum -- I think this is a very nice test addition. I left some comments but I also think we could merge it in as is

I also verified this covers many of the edge cases using llvm-cov

 cargo llvm-cov --html -p parquet-variant --tests -- test_validation_fuzz_integration

And then I reviewed the report (attached: report.zip)

And found that many more error paths are covered 👍

}

// Benchmark validation performance
fn bench_validation_validated_vs_unvalidated(c: &mut Criterion) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a cool idea -- not related to fuzz testing I don't think but a nice addition anyways

let corrupt_value = rng.random_bool(0.7);

if corrupt_metadata && !metadata.is_empty() {
let num_corruptions = rng.random_range(1..=(metadata.len().min(5)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i wonder if we need to corrupt it with more than one corruption 🤔

If you put this many corruptions I think it means you'll basically correct the entire thing for small variants (which might be ok

Maybe it should be max(3) instead to do up to three corruptions 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed to exactly 1 corruption is that fine?

@carpecodeum
Copy link
Contributor Author

Hi, really like the new tests for the malformed variants.

For the fuzzing though, I'm wondering if we considered using something more robust like AFL. Afl does coverage guided fuzzing which should be more effective at finding weird edge cases than just flipping bits at random, because it can use code coverage to determine which paths to take when mutating test cases. It also gets us test minimization for free

I do agree! I initially planned to include something like afl, and thought maybe I could extend that functionality in this PR with more direction from you and @alamb, but Andrew states a very good point that we should probably make a new ticket and extend that to the crate. If you guys want, I can make a new PR making that setup

@scovich
Copy link
Contributor

scovich commented Jul 4, 2025

Aside: The coverage report exposed the fact that map_try_from_slice_error is dead code. Maybe we should delete it?

@alamb
Copy link
Contributor

alamb commented Jul 4, 2025

Aside: The coverage report exposed the fact that map_try_from_slice_error is dead code. Maybe we should delete it?

I agree

@alamb alamb merged commit 54e4734 into apache:main Jul 5, 2025
12 checks passed
@alamb
Copy link
Contributor

alamb commented Jul 5, 2025

Thanks @carpecodeum

@alamb
Copy link
Contributor

alamb commented Jul 5, 2025

but Andrew states a very good point that we should probably make a new ticket and extend that to the crate. If you guys want, I can make a new PR making that setup

I think it would be worth a ticket and a quick proof of concept

I don't have any experience with ths AFL framework and I don't know how well suited it is for runing in CI normally

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

parquet Changes to the parquet crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Variant] Add testing for invalid variants (fuzz testing??)

4 participants