@gabotechs gabotechs commented Oct 20, 2025

Which issue does this PR close?

  • No issue.

Rationale for this change

Users might want to create their own PhysicalOptimizerRule implementations, and they might have registered their own SessionConfig.extensions. To read those extensions from inside a custom PhysicalOptimizerRule, the rule needs access to the SessionConfig, not just the ConfigOptions.

This PR is needed for gabotechs#7, but it was factored out as it's an isolated change that could have value on its own.

What changes are included in this PR?

Changes the signature of PhysicalOptimizerRule.optimize to take a SessionConfig rather than a ConfigOptions.

pub trait PhysicalOptimizerRule: Debug {
    /// Rewrite `plan` to an optimized form
    fn optimize(
        &self,
        plan: Arc<dyn ExecutionPlan>,
-       config: &ConfigOptions,
+       config: &SessionConfig,
    ) -> Result<Arc<dyn ExecutionPlan>>;
    ...
}
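
As a rough illustration of the motivating use case, the sketch below shows a custom rule reading a user-defined extension from the config it receives. It does not depend on DataFusion: SessionConfig, ExecutionPlan, and PhysicalOptimizerRule are simplified stand-ins that only mimic the shape of the real types, and MyRuleOptions, TagRule, Scan, and Tagged are hypothetical names invented for this example.

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;
use std::fmt::Debug;
use std::sync::Arc;

// Simplified stand-in for DataFusion's `SessionConfig`: only a typed
// extension map is modeled here (assumption, not the real struct).
#[derive(Default)]
pub struct SessionConfig {
    extensions: HashMap<TypeId, Arc<dyn Any + Send + Sync>>,
}

impl SessionConfig {
    /// Register a user-defined extension, keyed by its concrete type.
    pub fn with_extension<T: Any + Send + Sync>(mut self, ext: Arc<T>) -> Self {
        let ext: Arc<dyn Any + Send + Sync> = ext;
        self.extensions.insert(TypeId::of::<T>(), ext);
        self
    }

    /// Look up a previously registered extension by type.
    pub fn get_extension<T: Any + Send + Sync>(&self) -> Option<Arc<T>> {
        self.extensions
            .get(&TypeId::of::<T>())
            .cloned()
            .and_then(|ext| ext.downcast::<T>().ok())
    }
}

// Stand-in for the real `ExecutionPlan` trait.
pub trait ExecutionPlan: Debug {
    fn name(&self) -> String;
}

#[derive(Debug)]
pub struct Scan;

impl ExecutionPlan for Scan {
    fn name(&self) -> String {
        "Scan".to_string()
    }
}

// Hypothetical plan node that wraps its input with a label.
#[derive(Debug)]
pub struct Tagged {
    pub label: String,
    pub input: Arc<dyn ExecutionPlan>,
}

impl ExecutionPlan for Tagged {
    fn name(&self) -> String {
        format!("Tagged({}, {})", self.label, self.input.name())
    }
}

// Stand-in for the trait with the signature proposed in this PR.
pub trait PhysicalOptimizerRule: Debug {
    fn optimize(
        &self,
        plan: Arc<dyn ExecutionPlan>,
        config: &SessionConfig,
    ) -> Result<Arc<dyn ExecutionPlan>, String>;
}

// Hypothetical user-defined options stored as a session extension.
pub struct MyRuleOptions {
    pub label: String,
}

#[derive(Debug)]
pub struct TagRule;

impl PhysicalOptimizerRule for TagRule {
    fn optimize(
        &self,
        plan: Arc<dyn ExecutionPlan>,
        config: &SessionConfig,
    ) -> Result<Arc<dyn ExecutionPlan>, String> {
        // Without the extension the rule leaves the plan untouched.
        let Some(opts) = config.get_extension::<MyRuleOptions>() else {
            return Ok(plan);
        };
        let tagged: Arc<dyn ExecutionPlan> = Arc::new(Tagged {
            label: opts.label.clone(),
            input: plan,
        });
        Ok(tagged)
    }
}
```

With only a ConfigOptions in hand, a rule like TagRule would have no way to reach MyRuleOptions, which is the gap this signature change is meant to close.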

Are these changes tested?

Yes, by the existing tests.

Are there any user-facing changes?

Yes. PhysicalOptimizerRule.optimize now takes a SessionConfig rather than a ConfigOptions, which is a breaking, non-backwards-compatible change.

@github-actions github-actions bot added optimizer Optimizer rules core Core DataFusion crate datasource Changes to the datasource crate physical-plan Changes to the physical-plan crate labels Oct 20, 2025
@gabotechs gabotechs force-pushed the change-physical-optimizer-rule-signature branch 2 times, most recently from 75e9a56 to 3a445e3 Compare October 20, 2025 10:06
@gabotechs gabotechs force-pushed the change-physical-optimizer-rule-signature branch from 3a445e3 to f722d32 Compare October 20, 2025 10:40
Contributor

@adriangb adriangb left a comment

This seems like a reasonable change to me, but I think we should wait until we are certain we are going to need it for #18172 before we merge it, to avoid breaking the API without good justification.

@adriangb
Contributor

@gabotechs sorry for not coming back to this. I’d like to help get it across the line. Are you okay if I resolve conflicts and push?

@adriangb
Contributor

My only feedback for review is if we should add a new argument with Extensions or make the argument OptimizerContext so that we can make future changes without it being breaking

@gabotechs
Contributor Author

> @gabotechs sorry for not coming back to this. I’d like to help get it across the line. Are you okay if I resolve conflicts and push?

Sure!

@gabotechs
Contributor Author

> My only feedback for review is if we should add a new argument with Extensions or make the argument OptimizerContext so that we can make future changes without it being breaking

I'm trying to look for inspiration in the docs for this, and I found:
https://docs.rs/datafusion/latest/datafusion/execution/context/struct.SessionContext.html#relationship-between-sessioncontext-sessionstate-and-taskcontext

Relationship between SessionContext, SessionState, and TaskContext

The state required to optimize, and evaluate queries is broken into three levels to allow tailoring

The objects are:

SessionContext: Most users should use a SessionContext. It contains all information required to execute queries including high level APIs such as SessionContext::sql. All queries run with the same SessionContext share the same configuration and resources (e.g. memory limits).

SessionState: contains information required to plan and execute an individual query (e.g. creating a LogicalPlan or ExecutionPlan). Each query is planned and executed using its own SessionState, which can be created with SessionContext::state. SessionState allows finer grained control over query execution, for example disallowing DDL operations such as CREATE TABLE.

TaskContext contains the state required for query execution (e.g. ExecutionPlan::execute). It contains a subset of information in SessionState. TaskContext allows executing ExecutionPlans PhysicalExprs without requiring a full SessionState.

Following that same pattern, for the same reason that a TaskContext exists for execution, one could argue that an equivalent OptimizerContext struct for optimization steps could make sense.

WDYT?

@gabotechs
Contributor Author

Also, I found another reason for wanting this:

https://github.com/datafusion-contrib/datafusion-distributed/blob/main/src/distributed_planner/distributed_config.rs#L87-L94

@adriangb
Contributor

> Following that same pattern, for the same reason that a TaskContext exists for execution, one could argue that an equivalent OptimizerContext struct for optimization steps could make sense.

Yep, agreed! We could even add a new method to the trait (optimize_plan or something) that accepts an OptimizerContext and delegates to the existing method by default, to ease the transition and keep it a non-breaking change for now. It will still be tricky to deprecate the original method, since there is no deprecation of trait methods…
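
A minimal sketch of that idea, assuming an OptimizerContext that simply wraps a SessionConfig. Everything here is a simplified stand-in and not the actual DataFusion API: the flat target_partitions field, the Leaf plan node, LegacyRule, and the optimize_plan name are placeholders for illustration.

```rust
use std::fmt::Debug;
use std::sync::Arc;

// Simplified stand-ins; the real DataFusion types carry far more state.
#[derive(Debug, Default)]
pub struct ConfigOptions {
    pub target_partitions: usize, // hypothetical flat field, illustration only
}

#[derive(Debug, Default)]
pub struct SessionConfig {
    options: ConfigOptions,
}

impl SessionConfig {
    pub fn new(options: ConfigOptions) -> Self {
        Self { options }
    }

    pub fn options(&self) -> &ConfigOptions {
        &self.options
    }
}

/// Hypothetical context handed to optimizer rules. New accessors can be
/// added here later without touching the trait signature again.
#[derive(Debug, Default)]
pub struct OptimizerContext {
    session_config: SessionConfig,
}

impl OptimizerContext {
    pub fn new(session_config: SessionConfig) -> Self {
        Self { session_config }
    }

    pub fn session_config(&self) -> &SessionConfig {
        &self.session_config
    }
}

pub trait ExecutionPlan: Debug {
    fn name(&self) -> String;
}

#[derive(Debug)]
pub struct Leaf(pub String);

impl ExecutionPlan for Leaf {
    fn name(&self) -> String {
        self.0.clone()
    }
}

pub type PlanResult = Result<Arc<dyn ExecutionPlan>, String>;

pub trait PhysicalOptimizerRule: Debug {
    /// Existing method: every current implementation already provides it.
    fn optimize(&self, plan: Arc<dyn ExecutionPlan>, config: &ConfigOptions) -> PlanResult;

    /// New method (placeholder name). The default body delegates to
    /// `optimize`, so rules written against the old signature compile unchanged.
    fn optimize_plan(
        &self,
        plan: Arc<dyn ExecutionPlan>,
        context: &OptimizerContext,
    ) -> PlanResult {
        self.optimize(plan, context.session_config().options())
    }
}

/// A rule that only knows about the old signature.
#[derive(Debug)]
pub struct LegacyRule;

impl PhysicalOptimizerRule for LegacyRule {
    fn optimize(&self, plan: Arc<dyn ExecutionPlan>, config: &ConfigOptions) -> PlanResult {
        let renamed = format!("{}[partitions={}]", plan.name(), config.target_partitions);
        let new_plan: Arc<dyn ExecutionPlan> = Arc::new(Leaf(renamed));
        Ok(new_plan)
    }
}
```

The optimizer would call optimize_plan everywhere; rules that need extensions override it, and rules like LegacyRule keep working through the default delegation.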

@adriangb
Contributor

Something like #18739

@gabotechs
Contributor Author

Closing in favor of #18739

@gabotechs gabotechs closed this Dec 3, 2025