[Variant] Fuzz testing and benchmarks for validation #7849
Conversation
friendlymatthew left a comment
Hi, really like the new tests for the malformed variants.
For the fuzzing though, I'm wondering if we considered using something more robust like AFL. AFL does coverage-guided fuzzing, which should be more effective at finding weird edge cases than just flipping bits at random, because it can use code coverage to determine which paths to take when mutating test cases. It also gets us test-case minimization for free.
I recommend we file another ticket to look into a more robust / automated framework for fuzzing. It applies to more than just variant and I think it would make a nice addition to this crate (though I also expect it will be a non-trivial thing to set up; I haven't actually tried it).
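For reference, a coverage-guided target using the `afl` crate could look roughly like the sketch below. This is a hypothetical harness, not part of this PR; `Variant::try_new` is assumed to be the crate's fallible, validating constructor and may not match the real entry point.

```rust
// Hypothetical cargo-afl fuzz target (e.g. a small binary in the crate).
// Assumption: Variant::try_new(metadata, value) is the fallible, validating
// constructor; swap in whatever validation entry point parquet-variant exposes.
use parquet_variant::Variant;

fn main() {
    afl::fuzz!(|data: &[u8]| {
        if data.is_empty() {
            return;
        }
        // Let the first byte choose a split point so AFL can mutate the
        // metadata and value regions within a single input buffer.
        let split = (data[0] as usize) % data.len();
        let (metadata, value) = data.split_at(split);

        // The property under test: validation may return Err, but must never panic.
        let _ = Variant::try_new(metadata, value);
    });
}
```

Built and run via the `cargo afl build` / `cargo afl fuzz` subcommands; wiring that into CI is the non-trivial part mentioned above.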
alamb left a comment
Thank you @carpecodeum -- I think this is a very nice test addition. I left some comments but I also think we could merge it in as is
I also verified this covers many of the edge cases using llvm-cov
cargo llvm-cov --html -p parquet-variant --tests -- test_validation_fuzz_integration
And then I reviewed the report (attached: report.zip)
And found that many more error paths are covered 👍
}

// Benchmark validation performance
fn bench_validation_validated_vs_unvalidated(c: &mut Criterion) {
This is a cool idea -- not related to fuzz testing, I don't think, but a nice addition anyway
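As a sketch of what such a comparison can look like with Criterion (not the PR's actual benchmark): `VariantBuilder::append_value`, `VariantBuilder::finish`, `Variant::try_new`, and `Variant::new` below are assumptions about the crate's builder, fallible, and infallible APIs.

```rust
// Minimal Criterion sketch, assuming:
//   - VariantBuilder::finish() returns (metadata_bytes, value_bytes)
//   - Variant::try_new is the validated (fallible) constructor
//   - Variant::new is the unvalidated (infallible) constructor
use criterion::{criterion_group, criterion_main, Criterion};
use parquet_variant::{Variant, VariantBuilder};
use std::hint::black_box;

fn bench_validation_validated_vs_unvalidated(c: &mut Criterion) {
    // Build one well-formed variant up front so only validation is measured.
    let mut builder = VariantBuilder::new();
    builder.append_value(42i64);
    let (metadata, value) = builder.finish();

    c.bench_function("validated (try_new)", |b| {
        b.iter(|| black_box(Variant::try_new(&metadata, &value).unwrap()))
    });
    c.bench_function("unvalidated (new)", |b| {
        b.iter(|| black_box(Variant::new(&metadata, &value)))
    });
}

criterion_group!(benches, bench_validation_validated_vs_unvalidated);
criterion_main!(benches);
```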
let corrupt_value = rng.random_bool(0.7);

if corrupt_metadata && !metadata.is_empty() {
    let num_corruptions = rng.random_range(1..=(metadata.len().min(5)));
I wonder if we need to corrupt it with more than one corruption 🤔
If you put in this many corruptions I think it means you'll basically corrupt the entire thing for small variants (which might be ok).
Maybe it should be capped at 3 (`.min(3)`) instead, to do up to three corruptions 🤔
I changed it to exactly 1 corruption; is that fine?
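For reference, "exactly one corruption" can be as small as flipping a single random bit; a minimal sketch using the rand 0.9 API already visible in the diff (`random_range`):

```rust
use rand::Rng;

// Flip exactly one randomly chosen bit in `buf`. Illustration only; the
// test in this PR may pick the byte/bit differently.
fn corrupt_one_bit(buf: &mut [u8], rng: &mut impl Rng) {
    if buf.is_empty() {
        return;
    }
    let byte = rng.random_range(0..buf.len());
    let bit: u32 = rng.random_range(0..8);
    buf[byte] ^= 1u8 << bit;
}
```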
I do agree! I initially planned to include something like AFL, and thought maybe I could extend that functionality in this PR with more direction from you and @alamb, but Andrew makes a very good point that we should probably file a new ticket and extend that to the whole crate. If you want, I can open a new PR with that setup.

Aside: The coverage report exposed the fact that

I agree

Thanks @carpecodeum

I think it would be worth a ticket and a quick proof of concept. I don't have any experience with the AFL framework, and I don't know how well suited it is to running in CI normally.
Which issue does this PR close?
Closes #7842 (Add testing for invalid variants)
Rationale for this change
After adding support for both fallible and infallible access to variants, @alamb pointed out that there aren't many tests for the validation system itself.
CC - @scovich @friendlymatthew
What changes are included in this PR?
This change adds the fuzzing @alamb requested: it generates valid variants using the builder, randomly corrupts them by flipping bits, and then tests both validation paths (if validation passes, make sure access doesn't crash; if it fails, make sure the error is handled properly) across many corruption scenarios, plus specific malformed test cases.
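Roughly, one fuzz round looks like the sketch below. This is an illustration rather than the PR's exact test; `VariantBuilder::append_value`, `VariantBuilder::finish`, and `Variant::try_new` are assumed names for the crate's builder and fallible-constructor APIs.

```rust
use parquet_variant::{Variant, VariantBuilder};
use rand::Rng;

// One illustrative fuzz round: build a valid variant, maybe corrupt it,
// then check that both validation outcomes are handled without panicking.
fn fuzz_round(rng: &mut impl Rng) {
    // 1. Generate a valid variant with the builder (assumed API).
    let mut builder = VariantBuilder::new();
    builder.append_value("some value");
    let (metadata, mut value) = builder.finish();

    // 2. Randomly corrupt it by flipping a bit in the value bytes.
    if rng.random_bool(0.7) && !value.is_empty() {
        let i = rng.random_range(0..value.len());
        let bit: u32 = rng.random_range(0..8);
        value[i] ^= 1u8 << bit;
    }

    // 3. Exercise both validation paths.
    match Variant::try_new(&metadata, &value) {
        // Validation passed: touching the variant must not crash.
        Ok(variant) => {
            let _ = format!("{variant:?}"); // assumes Variant implements Debug
        }
        // Validation failed: the error must surface cleanly, not panic.
        Err(e) => {
            let _ = e.to_string();
        }
    }
}
```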
A huge thank you to @PinkCrow007, @mprammer, @alamb, and the rest of the CMU variant team for their continued support towards this project.
Are these changes tested?
Yes, all the tests are currently passing.
Are there any user-facing changes?
No, the tests only verify that the validation system works correctly.