[Presto] Fix split allocation logic for partition pruning #741
Signed-off-by: baiqiubai <[email protected]>
Signed-off-by: chenxu <[email protected]>
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ec9f993c18
} else {
    break;
try {
    Schema arrowSchema = Schema.fromJSON(tableInfo.getTableSchema());
Preserve non-Arrow schema parsing in split generation
This path now assumes every table_schema is Arrow JSON by calling Schema.fromJSON(...) directly, but LakeSoul also stores Spark StructType JSON for some tables (the metadata path already branches on schema kind). For those tables, split planning will throw before any scan starts, so even simple SELECT queries fail at runtime. Use the same Arrow-vs-StructType handling here instead of unconditionally parsing as Arrow.
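One way to picture the suggested branching is a small dispatch on the schema kind before any parsing happens. The sketch below is only illustrative: the class name `SchemaKindSketch`, the `Kind` enum, and the `detect` method are hypothetical, and the heuristic (Spark's `StructType` JSON starts with a top-level `"type":"struct"` entry, anything else is treated as Arrow JSON) is an assumption. The existing check in LakeSoul's metadata path should be reused in the real fix instead of this regex.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of LakeSoul: decides which parser a
// table_schema string should go to before split planning touches it.
final class SchemaKindSketch {
    enum Kind { SPARK_STRUCT, ARROW }

    // Assumption: Spark StructType JSON begins with {"type":"struct", ...}.
    // Arrow schema JSON is assumed to lack that top-level entry.
    private static final Pattern SPARK_STRUCT_PREFIX =
            Pattern.compile("^\\s*\\{\\s*\"type\"\\s*:\\s*\"struct\"");

    static Kind detect(String tableSchemaJson) {
        return SPARK_STRUCT_PREFIX.matcher(tableSchemaJson).find()
                ? Kind.SPARK_STRUCT
                : Kind.ARROW;
    }
}
```

With a check like this, split generation would call `Schema.fromJSON(...)` only when the result is `Kind.ARROW` and fall back to the StructType conversion otherwise, so tables with Spark-style schemas do not fail during planning.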
}

private static Set<String> extractRowsToSet(ResultSet rs) throws SQLException {
    Set<String> rows = new HashSet<>();
Keep duplicate-row cardinality in benchmark data checks
The verifier now compares Set<String> snapshots of rows, which removes duplicate multiplicity. That means two result sets with different duplicate counts can still be treated as equal (e.g., A,A,B vs A,B,B) as long as total row counts match, so CDC consistency checks can silently miss real mismatches. Represent rows as a multiset (row -> count) or use SQL EXCEPT ALL-style logic to preserve cardinality.
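The multiset option can be sketched as a row-to-count map, so that duplicate multiplicity takes part in the comparison. This is a minimal illustration, not the benchmark's actual code: `RowMultisetSketch`, `toMultiset`, and `sameRows` are hypothetical names, and rows are assumed to be already serialized to strings, as `extractRowsToSet` appears to do.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: compares two row collections as multisets.
final class RowMultisetSketch {
    // Counts how many times each serialized row occurs.
    static Map<String, Integer> toMultiset(Iterable<String> rows) {
        Map<String, Integer> counts = new HashMap<>();
        for (String row : rows) {
            counts.merge(row, 1, Integer::sum);
        }
        return counts;
    }

    // True only if both sides contain the same rows with the same counts,
    // regardless of order.
    static boolean sameRows(Iterable<String> left, Iterable<String> right) {
        return toMultiset(left).equals(toMultiset(right));
    }
}
```

For the example in the comment, `A,A,B` and `A,B,B` produce the maps `{A=2, B=1}` and `{A=1, B=2}`, which compare as unequal, whereas a `Set<String>` would reduce both to `{A, B}`.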
Fix split allocation logic for partition pruning #740