Closed
Labels
bug: Something isn't working
Description
Describe the bug
Datadog is building a distributed version of DataFusion, which requires serializing and deserializing queries. While testing with TPC-H queries, we found that Q16 fails during deserialization, potentially due to an issue in the serialization step. I've minimized the query to a smaller form that still reproduces the problem:
SELECT p_size FROM part WHERE p_size IN (14, 6, 5, 31)
To Reproduce
See this PR for the reproducer
Expected behavior
Deserialization of the query should succeed.
Additional context
- You can only reproduce this on actual TPC-H data. See the comments in the repro for the details.
- You won't hit the bug if the number of items in the list is 3 or fewer, e.g. `(14, 6, 5)`.
- The same bug still happens if you replace `SELECT p_size` with `SELECT p_brand`, but the mismatched data type in the error message is then different. It looks to me as though the data type of the list `(14, 6, 5, 31)` was wrongly read from some schema during serialization/deserialization, and that schema depends on the query and the Parquet file.
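
For quick reference, the variants described in the notes above can be written out side by side. This is only a sketch assembled from the description: the three-item and `p_brand` queries are spelled out here as assumptions based on those notes, and per the report the failure only shows up when the plan is serialized and deserialized against actual TPC-H data.

```sql
-- Minimized query from the report: fails during deserialization
-- (four items in the IN list).
SELECT p_size FROM part WHERE p_size IN (14, 6, 5, 31);

-- Assumed three-item variant: per the notes, lists with 3 or fewer
-- items do not hit the bug.
SELECT p_size FROM part WHERE p_size IN (14, 6, 5);

-- Assumed p_brand variant: per the notes, this still fails, but the
-- mismatched data type in the error message is different.
SELECT p_brand FROM part WHERE p_size IN (14, 6, 5, 31);
```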