@nils-braun
Collaborator

The current OVER implementation works as follows:
If there are OVER calls in a SELECT (aka a projection), they are processed one after the other: first the current index is stored, then the data is partitioned and sorted, then grouped and the window applied, and finally the original index/partitioning/sorting is restored. The reason for this is that if we mix OVER and non-OVER calls in a single projection, they will not fit together if they have different partitioning/sorting/index.
This PR introduces a much simpler implementation: using one of the optimization rules in Calcite, we split up projections into projections without OVER and ones that only contain window operations (which are now called LogicalWindow). This allows us to group and shuffle without the need to restore any ordering or partitioning, as the window is now treated independently.
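
To illustrate the idea (not the actual code in this PR), here is a minimal pandas sketch of a window-only step followed by a plain projection. The function name `apply_window`, the column names and the sample data are all made up for this example, and a running sum stands in for an arbitrary window function:

```python
import pandas as pd


def apply_window(df, partition_by, order_by, column, out_name):
    """Hypothetical stand-in for a window-only step.

    Roughly: SUM(column) OVER (PARTITION BY partition_by ORDER BY order_by),
    assuming no ties in the ordering columns.
    """
    # Sort once by partition and order keys; nothing is restored afterwards.
    sorted_df = df.sort_values(partition_by + order_by)
    # Running sum within each partition, aligned to sorted_df by index.
    running = sorted_df.groupby(partition_by)[column].cumsum()
    return sorted_df.assign(**{out_name: running})


df = pd.DataFrame(
    {
        "user": ["b", "a", "b", "a"],
        "ts": [2, 1, 1, 2],
        "amount": [10, 1, 20, 2],
    }
)

# Step 1: the window step adds its output column to the input columns.
windowed = apply_window(df, ["user"], ["ts"], "amount", "running_sum")

# Step 2: an ordinary projection without OVER picks the final columns.
result = windowed[["user", "ts", "running_sum"]]
```

Because the window step stands alone, the rows can stay in whatever order the partitioning and sorting produced; the projection that follows only selects columns.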

I did not do any benchmarking so far (will do later), but I would expect this to be much faster for larger data sets and many OVER operations.

@nils-braun nils-braun merged commit 7b60e4a into main Aug 18, 2021
@nils-braun nils-braun deleted the feature/more-efficient-window-implementation branch August 18, 2021 21:25