@berkaysynnada berkaysynnada commented Dec 25, 2023

Which issue does this PR close?

This PR continues the work initiated in #8395. It could also be beneficial for #7942.

Rationale for this change

TreeNode implementations of some optimizer rules are challenging to understand and are open to misuse. This refactor standardizes the implementations and eliminates unnecessary payloads.

What changes are included in this PR?

These implementations have been refactored:

  1. DistributionContext
  2. PlanWithCorrespondingSort
  3. PlanWithCorrespondingCoalescePartitions
  4. PipelineStatePropagator
  5. OrderPreservationContext
  6. SortPushDown
  7. ExprOrdering
  8. PlanWithKeyRequirements

The map_children() functions of these implementations are now uniform. Previously, some of the rule logic lived in map_children(), some in utilities such as new_from_children(), and some in the transformer rules themselves. This scattered structure made the code hard to understand and maintain. All of this logic has now been moved into the functions passed as transform arguments on the optimizer side.

Since DataFusion trees generally consist of nodes that store their children, each transform can implicitly have bottom-up transform capability. In some uses of transform_up(), additional logic ran while the updated children were being attached to the self node. This practice has been avoided. Now, all map_children() does is attach the updated children to the default-created self node, without modifying the self node in any other way.
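The uniform shape described above can be sketched roughly as follows. This is a simplified, standalone illustration and not the code in this PR: `Node`, its `flag` payload, `mark_sort`, and `demo` are hypothetical names, and the real implementations wrap `Arc<dyn ExecutionPlan>` rather than a plain struct. The point is only that `map_children` rebuilds the node with a default payload and the rule lives entirely in the function passed to `transform_up`.

```rust
// Hypothetical stand-in for a "plan plus payload" tree node.
#[derive(Debug, Clone, PartialEq)]
struct Node {
    name: String,
    // Payload that a rule computes, e.g. "this subtree contains a Sort".
    flag: bool,
    children: Vec<Node>,
}

impl Node {
    // Default-created node: the payload starts at its default value.
    fn new(name: &str, children: Vec<Node>) -> Self {
        Node { name: name.to_string(), flag: false, children }
    }

    // Uniform map_children: transform each child and reattach the results
    // to a default-created self node. No rule logic lives here.
    fn map_children<F>(self, mut f: F) -> Node
    where
        F: FnMut(Node) -> Node,
    {
        let children: Vec<Node> = self.children.into_iter().map(&mut f).collect();
        Node { name: self.name, flag: false, children }
    }

    // Bottom-up traversal: children first, then the rule on the rebuilt node.
    fn transform_up<F>(self, f: &mut F) -> Node
    where
        F: FnMut(Node) -> Node,
    {
        let rebuilt = self.map_children(|child| child.transform_up(f));
        f(rebuilt)
    }
}

// The rule itself: all payload logic is here, not in map_children.
fn mark_sort(mut node: Node) -> Node {
    node.flag = node.name == "Sort" || node.children.iter().any(|c| c.flag);
    node
}

// Small usage example: Projection -> Filter -> Sort -> Scan.
fn demo() -> Node {
    let plan = Node::new(
        "Projection",
        vec![Node::new(
            "Filter",
            vec![Node::new("Sort", vec![Node::new("Scan", vec![])])],
        )],
    );
    let mut rule = mark_sort;
    plan.transform_up(&mut rule)
}
```

Because the payload is reset in `map_children` and recomputed by the rule, a reader only needs to look at `mark_sort` to understand what the traversal does.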

A similar situation may cause an algorithm that is expected to perform transform_down() to perform an implicit transform_up() if the map_children() implementation of the rule that performs transform_down() includes some logic. Perhaps a more comprehensive tree visitor-transformer design can be planned to address this issue.
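A small standalone sketch of that hazard, under the assumption that the extra logic runs after the children have been rebuilt: the rule is applied top-down, but the logic inside the children-mapping step fires in bottom-up order. The names here (`Node`, `map_children_with_logic`, `visit_order`) are hypothetical and exist only for this illustration.

```rust
use std::cell::RefCell;

#[derive(Debug, Clone)]
struct Node {
    name: &'static str,
    children: Vec<Node>,
}

impl Node {
    // Deliberately "impure" map_children: it records an event after the
    // children have been rebuilt, standing in for any logic placed here.
    fn map_children_with_logic(
        self,
        log: &RefCell<Vec<String>>,
        f: &dyn Fn(Node) -> Node,
    ) -> Node {
        let children: Vec<Node> = self.children.into_iter().map(f).collect();
        log.borrow_mut().push(format!("attach:{}", self.name));
        Node { name: self.name, children }
    }

    // Top-down traversal: apply the rule to the node, then recurse.
    fn transform_down(
        self,
        log: &RefCell<Vec<String>>,
        rule: &dyn Fn(Node) -> Node,
    ) -> Node {
        let node = rule(self);
        node.map_children_with_logic(log, &|child: Node| child.transform_down(log, rule))
    }
}

// Runs the traversal on a -> b -> c and returns the order of events.
fn visit_order() -> Vec<String> {
    let log = RefCell::new(Vec::new());
    let tree = Node {
        name: "a",
        children: vec![Node {
            name: "b",
            children: vec![Node { name: "c", children: vec![] }],
        }],
    };
    let rule = |n: Node| {
        log.borrow_mut().push(format!("rule:{}", n.name));
        n
    };
    let _ = tree.transform_down(&log, &rule);
    log.into_inner()
}
```

In this sketch the `rule:` events appear in the order a, b, c, while the `attach:` events appear in the order c, b, a, which is the implicit bottom-up pass the paragraph above warns about.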

Are these changes tested?

Yes, with existing tests.

Are there any user-facing changes?

@github-actions bot added the physical-expr (Changes to the physical-expr crates) and core (Core DataFusion crate) labels Dec 25, 2023

@ozankabak ozankabak left a comment

I reviewed this carefully and it looks great to me. Now all the recursive structures are crystal clear and it should inform the overall TreeNode refactor/redesign effort nicely.

alamb commented Dec 26, 2023

cc @peter-toth @sadboy @Dandandan @avantgardnerio

alamb commented Dec 26, 2023

I hope to review this PR later today but may not get to it for a day or two

@ozankabak

For ease of review, I suggest looking at struct definitions of tree nodes and the transform functions. All map_children implementations are the same. Hopefully, this will help us design a framework where one doesn't need to define map_children and apply_children over and over again and just provide the transformation logic and a mechanism to access children nodes.
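One possible shape for such a framework, sketched here purely as an illustration and not as a design proposed in this thread: a trait whose implementors supply only child access and reattachment, while `map_children` and `transform_up` come as default methods. The trait name `HasChildren`, its method names, and the `Expr` type are assumptions made for this sketch.

```rust
// Hypothetical trait: implementors describe only how to reach and replace
// their children; the traversal machinery is provided once, by default.
trait HasChildren: Sized {
    // Move the children out of the node, leaving it childless.
    fn take_children(&mut self) -> Vec<Self>;
    // Reattach a (possibly transformed) list of children.
    fn with_new_children(self, children: Vec<Self>) -> Self;

    // Provided: apply `f` to every child and reattach the results.
    fn map_children<F>(mut self, f: F) -> Self
    where
        F: FnMut(Self) -> Self,
    {
        let children: Vec<Self> = self.take_children().into_iter().map(f).collect();
        self.with_new_children(children)
    }

    // Provided: bottom-up traversal built on map_children.
    fn transform_up<F>(self, f: &mut F) -> Self
    where
        F: FnMut(Self) -> Self,
    {
        let rebuilt = self.map_children(|child| child.transform_up(f));
        f(rebuilt)
    }
}

#[derive(Debug, PartialEq)]
struct Expr {
    value: i64,
    children: Vec<Expr>,
}

// The only per-type code: child access and reattachment.
impl HasChildren for Expr {
    fn take_children(&mut self) -> Vec<Self> {
        std::mem::take(&mut self.children)
    }

    fn with_new_children(mut self, children: Vec<Self>) -> Self {
        self.children = children;
        self
    }
}

// Example rule: add the (already accumulated) child values to this node.
fn sum_subtree(mut e: Expr) -> Expr {
    e.value += e.children.iter().map(|c| c.value).sum::<i64>();
    e
}

// Usage: after the bottom-up pass the root holds the sum of the whole tree.
fn demo_sum() -> i64 {
    let leaf = |v| Expr { value: v, children: vec![] };
    let tree = Expr {
        value: 1,
        children: vec![leaf(2), Expr { value: 3, children: vec![leaf(4)] }],
    };
    let mut rule = sum_subtree;
    tree.transform_up(&mut rule).value
}
```

With this arrangement each new tree type needs two short methods, and the transformation logic stays in the function handed to `transform_up`.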

@alamb alamb left a comment

I spot-checked this code and verified that all existing tests pass. Thus, while I don't fully understand the nuances involved, from my point of view it is a nice step forward.

Thank you for the contribution @berkaysynnada and @ozankabak (and @peter-toth who seems to have spurred the current set of improvements)

I can't wait to see what things look like after a few more rounds of improvement ❤️

FYI @jackwener

peter-toth commented Dec 27, 2023

I checked SortPushDown, ExprOrdering and PlanWithKeyRequirements when I opened #8664 on top of this PR.

@ozankabak

Seems like everybody is on board and follow-up work is already getting underway, so I will go ahead and merge this in a few hours

@ozankabak ozankabak merged commit 6403222 into apache:main Dec 27, 2023
appletreeisyellow pushed a commit to appletreeisyellow/datafusion that referenced this pull request Jan 4, 2024
* Refactor TreeNode's

* Update utils.rs

* Final review

* Remove unnecessary clones, more idiomatic Rust

---------

Co-authored-by: Mehmet Ozan Kabak <[email protected]>