Description
Is your feature request related to a problem or challenge?
I noticed that SQLancer, an awesome SQL fuzzing framework, can be implemented on DataFusion. It has been able to detect many bugs, even in PostgreSQL and SQLite.
Update:
Implementation is now at datafusion-sqlancer
Supported SQL Features
- `JOIN`s, `ORDER BY`, `WHERE`
- Numeric scalar functions/expression operators
- String scalar functions/expression operators
- Aggregate functions, `HAVING` clause
- Time related data type functions
- Window functions
- Subquery
- Queries from parquet, csv
- Exploit different configurations (change config knobs like `target_partition`, `prefer_hash_join`, etc.)
Supported Test Oracles
Note: most oracles only apply to a subset of the available query types. For advanced SQL features like window functions, we can only generate random queries and report crashes.
More context on the test oracles below is available at https://github.com/sqlancer/sqlancer/tree/main
- NoREC
- TLP
- PQS
- DQP for logic bugs in joins
- EET for logic bugs in joins and subqueries
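As a rough illustration of one oracle from this list, here is a minimal sketch of the TLP (ternary logic partitioning) idea: every row must satisfy exactly one of `p`, `NOT p`, or `p IS NULL`, so the three partitioned queries together should return the same rows as the unfiltered query. This is not SQLancer's code. It uses Python's built-in `sqlite3` as a stand-in engine (an assumption made only so the example runs without a DataFusion connection), and `tlp_check` and the table `t1` are hypothetical names.

```python
import sqlite3
from collections import Counter

def tlp_check(conn, table, predicate):
    """Return True if the three TLP partitions reproduce the unfiltered result."""
    # Original query: every row, no filter.
    base_rows = conn.execute(f"SELECT * FROM {table}").fetchall()
    # Partitioned queries: predicate is TRUE, FALSE, or NULL (unknown).
    partition_rows = []
    for cond in (f"({predicate})", f"NOT ({predicate})", f"({predicate}) IS NULL"):
        partition_rows += conn.execute(
            f"SELECT * FROM {table} WHERE {cond}"
        ).fetchall()
    # Compare as multisets, since row order is not guaranteed.
    return Counter(base_rows) == Counter(partition_rows)

# Stand-in data (hypothetical), including a NULL to exercise the third partition.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (v1 INTEGER)")
conn.executemany("INSERT INTO t1 VALUES (?)", [(-1,), (0,), (3,), (7,), (None,)])

assert tlp_check(conn, "t1", "v1 > 0"), "TLP mismatch: possible logic bug"
```

A real fuzzer would generate the table contents and the predicate randomly and run this check many times.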
How SQLancer works in short
- It's a black box fuzzer, which will be implemented on SQLancer's starter code and connect to DataFusion using `JDBC` to do SQL-level testing
- It will generate random chaotic SQL queries to stress the system and make sure it won't crash
- It also does extra logical consistency checks using randomly generated SQL. `SQLancer` has 5 logic check oracles; one of them works like:
NoREC consistency check oracle
Randomly generated query(Q1):
select * from t1 where v1 > 0;
Mutated query(Q2):
select v1 > 0 from t1;
Consistency check:
result size of Q1 should be equal to the number of `True` in Q2's output
The consistency check above generates Q1 (very likely to be optimized by predicate pushdown) and Q2 (hard to optimize), so this kind of test suite focuses on the correctness of the optimizer. There are 5 similar test oracles available to be implemented, and these carefully designed checks make the testing framework really powerful.
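The NoREC check described above can be sketched in a few lines. This is only an illustration, not the SQLancer implementation: it uses Python's built-in `sqlite3` as a stand-in engine (an assumption, chosen so the example runs without a DataFusion JDBC connection), and `norec_check` is a hypothetical helper name.

```python
import sqlite3

def norec_check(conn, table, predicate):
    """Return (row count of Q1, number of TRUE values in Q2) for one check."""
    # Q1: the predicate sits in WHERE, where the optimizer may push it down.
    q1_rows = conn.execute(f"SELECT * FROM {table} WHERE {predicate}").fetchall()
    # Q2: the predicate is a projected expression, evaluated on every row.
    q2_rows = conn.execute(f"SELECT ({predicate}) FROM {table}").fetchall()
    # SQLite returns booleans as 1/0 and unknown as NULL (None in Python).
    true_count = sum(1 for (value,) in q2_rows if value == 1)
    return len(q1_rows), true_count

# Stand-in data (hypothetical), including a NULL row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (v1 INTEGER)")
conn.executemany("INSERT INTO t1 VALUES (?)", [(-1,), (0,), (3,), (7,), (None,)])

q1_count, q2_count = norec_check(conn, "t1", "v1 > 0")
assert q1_count == q2_count, f"NoREC mismatch: {q1_count} vs {q2_count}"
```

If the two counts ever differ, the engine has evaluated the same predicate inconsistently in the two positions, which points to a logic bug rather than a crash.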
Describe the solution you'd like
I plan to implement SQLancer on DataFusion (starting with a specific test oracle, NoREC, which requires less engineering effort).
For now, a minimal subset of SQL features is implemented. It hasn't detected any logic bugs yet; only 2 bad-input bugs in some scalar functions have shown up.
(Will share the code once it is cleaned up)
If there are any features (SQL clauses / data types / specific functions) you would like to see tested further, I can implement them first :)
Describe alternatives you've considered
SQLsmith looks like another popular choice; I haven't looked into it carefully yet.
But if it only generates random SQL to test whether the system crashes, then SQLancer should be the more comprehensive tool.
Additional context
SQLancer's page has several papers and recordings of YouTube talks available.