[Presto] Fix split allocation logic for partition pruning #741

Merged: xuchen-plus merged 24 commits into lakesoul-io:main from baiqiubai:temp on Apr 29, 2026

@baiqiubai
Contributor

Fix split allocation logic for partition pruning #740

Signed-off-by: baiqiubai <[email protected]>
@mag1c1an1
Contributor

@codex review

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ec9f993c18

} else {
break;
try {
Schema arrowSchema = Schema.fromJSON(tableInfo.getTableSchema());

P1: Preserve non-Arrow schema parsing in split generation

This path now assumes every table_schema is Arrow JSON by calling Schema.fromJSON(...) directly, but LakeSoul also stores Spark StructType JSON for some tables (the metadata path already branches on schema kind). For those tables, split planning will throw before any scan starts, so even simple SELECT queries fail at runtime. Use the same Arrow-vs-StructType handling here instead of unconditionally parsing as Arrow.
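
A minimal sketch of the kind of branching this comment asks for, under the assumption that the two formats can be told apart by their top-level keys: Spark StructType JSON begins with a "type":"struct" entry, while Arrow schema JSON begins with a "fields" array. SchemaKind and SchemaKindDetector are hypothetical names for illustration, not LakeSoul APIs; the actual fix should reuse whatever check the metadata path already applies.

```java
// Hypothetical helper; names are illustrative and not taken from the LakeSoul code base.
enum SchemaKind { ARROW, SPARK_STRUCT_TYPE }

final class SchemaKindDetector {
    private SchemaKindDetector() {}

    // Assumption: Spark StructType JSON carries a top-level "type":"struct" entry
    // as its first key, while Arrow schema JSON has no top-level "type" key.
    // Anything that does not look like a StructType is treated as Arrow.
    static SchemaKind detect(String tableSchemaJson) {
        // Strip whitespace so pretty-printed and compact JSON are handled alike.
        String compact = tableSchemaJson.replaceAll("\\s+", "");
        return compact.startsWith("{\"type\":\"struct\"")
                ? SchemaKind.SPARK_STRUCT_TYPE
                : SchemaKind.ARROW;
    }
}
```

In split generation, the caller would call Schema.fromJSON(...) only when the result is ARROW and route SPARK_STRUCT_TYPE through the existing StructType conversion instead of failing.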


}

private static Set<String> extractRowsToSet(ResultSet rs) throws SQLException {
Set<String> rows = new HashSet<>();

P2: Keep duplicate-row cardinality in benchmark data checks

The verifier now compares Set<String> snapshots of rows, which removes duplicate multiplicity. That means two result sets with different duplicate counts can still be treated as equal (e.g., A,A,B vs A,B,B) as long as total row counts match, so CDC consistency checks can silently miss real mismatches. Represent rows as a multiset (row -> count) or use SQL EXCEPT ALL-style logic to preserve cardinality.
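
The multiset option can be sketched as follows. This is an illustrative sketch only: RowMultiset and its methods are hypothetical names, and it assumes each row has already been serialized to a String the same way extractRowsToSet does.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; not part of the LakeSoul benchmark code.
final class RowMultiset {
    private RowMultiset() {}

    // Counts how many times each serialized row occurs (row -> count).
    static Map<String, Integer> count(Iterable<String> rows) {
        Map<String, Integer> counts = new HashMap<>();
        for (String row : rows) {
            counts.merge(row, 1, Integer::sum);
        }
        return counts;
    }

    // Two result sets match only if every distinct row occurs
    // the same number of times in both, regardless of order.
    static boolean sameRows(Iterable<String> left, Iterable<String> right) {
        return count(left).equals(count(right));
    }
}
```

With this comparison, the rows A,A,B and A,B,B are reported as different, whereas a Set-based check sees {A,B} on both sides and treats them as equal.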


@xuchen-plus merged commit 50d2801 into lakesoul-io:main on Apr 29, 2026
20 checks passed