Skip to content

Conversation

@adriangb
Copy link
Contributor

No description provided.

@github-actions github-actions bot added optimizer Optimizer rules core Core DataFusion crate execution Related to the execution crate labels Nov 15, 2025
@adriangb
Copy link
Contributor Author

@gabotechs do you think this will work for #18168?

Comment on lines 55 to 58
/// Config options
config_options: Arc<ConfigOptions>,
/// Extensions
extensions: Arc<Extensions>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about just having here something like:

pub struct OptimizerContext {
    session_config: SessionConfig,
}

I imagine this struct to be equivalent to TaskContext but for optimization passes rather than execution, and that struct does have the session_config "as-is" as a field:

https://github.com/apache/datafusion/blob/main/datafusion/execution/src/task.rs#L42-L42

And this way, if in the future changes are made to the SessionConfig, we do not need to come back here and update also OptimizerContext

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it be a private field, i.e. we expose ConfigOptions and Extensions as methods but not the full SessionConfig?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd deliberately copy what TaskContext is doing in order to keep both structure's as consistent as possible, that way the user experience of accessing an OptimizerContext is as close as possible as accessing TaskContext.

There, the field is private, but it can be accessed through a method:

https://github.com/apache/datafusion/blob/main/datafusion/execution/src/task.rs#L96-L100

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense sorry I should have just looked into TaskContext, ty for the code pointer

@gabotechs
Copy link
Contributor

@gabotechs do you think this will work for #18168?

yes! I think so

@adriangb
Copy link
Contributor Author

@gabotechs do you think this will work for #18168?

yes! I think so

Awesome I'll clean it up and we can try to get some reviews! The nice thing about this approach is that it's both backwards compatible now and will make future changes more backwards compatible as well.

adriangb and others added 3 commits November 22, 2025 16:51
…t into optimzer rules

In particular, give access to Extensions

Co-authored-by: Gabriel Musat Mestre <[email protected]>
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a new optimize_plan method to the PhysicalOptimizerRule trait that accepts an OptimizerContext parameter, allowing optimizer rules to access more context beyond just configuration options. The optimize method is deprecated in favor of the new approach.

Key Changes:

  • Added OptimizerContext struct that wraps ConfigOptions and Extensions
  • Created Extensions struct to manage typed extension data in SessionConfig
  • Deprecated the optimize method in favor of optimize_plan with backward compatibility
  • Updated all optimizer rule implementations to use the new optimize_plan method
  • Updated all test code to construct OptimizerContext from SessionConfig

Reviewed changes

Copilot reviewed 33 out of 33 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
datafusion/physical-optimizer/src/optimizer.rs Added OptimizerContext struct and new optimize_plan trait method with default implementation for backward compatibility
datafusion/execution/src/config.rs Added Extensions struct and updated SessionConfig to use Arc<Extensions> instead of AnyMap
datafusion/physical-optimizer/src/*.rs Updated all optimizer rules to implement optimize_plan instead of optimize
datafusion/core/tests/**/*.rs Updated test code to create OptimizerContext and call optimize_plan
datafusion/physical-optimizer/src/lib.rs Exported OptimizerContext publicly

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 95 to 96
/// provides access to the full optimizer context including session configuration,
/// runtime environment, and extensions.
Copy link

Copilot AI Nov 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation mentions 'runtime environment' but the OptimizerContext only provides access to ConfigOptions and Extensions. The runtime environment is not actually included in the context. Consider updating the documentation to accurately reflect what is provided: 'session configuration and extensions' without mentioning runtime environment.

Suggested change
/// provides access to the full optimizer context including session configuration,
/// runtime environment, and extensions.
/// provides access to the full optimizer context, including session configuration
/// and extensions.

Copilot uses AI. Check for mistakes.
}
}

/// `PhysicalOptimizerRule` transforms one ['ExecutionPlan'] into another which
Copy link

Copilot AI Nov 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistent bracket style in documentation reference. Should use backticks with square brackets: ['ExecutionPlan'] should be changed to [ExecutionPlan] for proper rustdoc linking syntax.

Suggested change
/// `PhysicalOptimizerRule` transforms one ['ExecutionPlan'] into another which
/// `PhysicalOptimizerRule` transforms one [`ExecutionPlan`] into another which

Copilot uses AI. Check for mistakes.
@adriangb adriangb marked this pull request as ready for review November 22, 2025 09:14
@adriangb adriangb requested a review from gabotechs November 22, 2025 09:14
Copy link
Contributor

@gabotechs gabotechs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯 +1 from me

@adriangb
Copy link
Contributor Author

Thanks @gabotechs ! I'll leave this open for a day or two to allow other feedback.

Copy link
Contributor

@Samyak2 Samyak2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this!

Left a minor comment, looks good otherwise.

Comment on lines 70 to 82
/// Return a reference to the configuration options
///
/// This is a convenience method that returns the [`ConfigOptions`]
/// from the [`SessionConfig`].
pub fn options(&self) -> &Arc<ConfigOptions> {
self.session_config.options()
}

/// Return a reference to the extensions
pub fn extensions(&self) -> &Arc<Extensions> {
self.session_config.extensions()
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two methods are kind of redundant after the addition of session_config() method. Might be a nice QoL addition, but we should add the same in TaskContext to keep the API consistent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay will update to match.

Personally I would have kept the SessionConfig hidden until someone asks for it to minimize the dependency on such a large object. I.e. I'm not sure matching TaskContext 1:1 is the best approach here. But it seems that's just me 😄.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense actually. I only need extensions from the session config. We could expose just the required fields from the inner session_config. Unless, of course, there's already a requirement for the session config.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gabotechs wdyt? I guess the question is:

  • Do we want to be 1:1 with TaskContext?
    or
  • Do we follow the principle of exposing the smallest API surface area?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I would have kept the SessionConfig hidden until someone asks for it to minimize the dependency on such a large object

✋ asking here, I specifically could use both session_config.extensions and session_config.options.extensions, so just having something like what we have in TaskContext is enough:

    /// Return the SessionConfig associated with this [OptimizerContext]
    pub fn session_config(&self) -> &SessionConfig {
        &self.session_config
    }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK will keep it as is for now and remove the extra methods

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alamb agreed that we should just expose SessionConfig

Comment on lines +116 to +117
#[derive(Clone, Debug)]
pub struct Extensions {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs docstrings!

@adriangb
Copy link
Contributor Author

adriangb commented Dec 1, 2025

Filed a ticket to trach

* [Add upgrade guide for PhysicalOptimizerRule::optimize_plan #19029](https://github.com/apache/datafusion/issues/19029)

I opened #19030

@milenkovicm
Copy link
Contributor

Actually, I may have few follow up questions, as PR does not have any associated issue nor it has any description or motivation explanation, which I don't think is especially good practice for changing one of the core interfaces:

  • Why have we introduced OptimizerContext when it only wraps SessionConfig, what else do we expect to have in OptimizerContext?
  • If changing interface could we just change method parameter and use SessionConfig instead of ConfigOptions, instead of adding new method and deprecating? Would it be cleaner approach?
  • Whats the difference between current and previous Extensions ?

@adriangb
Copy link
Contributor Author

adriangb commented Dec 1, 2025

  • Why have we introduced OptimizerContext when it only wraps SessionConfig, what else do we expect to have in OptimizerContext?

A pattern we've seen time and time again in this project is to assume that we won't want to make any changes in the future and then realizing we do. Having a specialized struct lets us do that with very little overhead.

  • If changing interface could we just change method parameter and use SessionConfig instead of ConfigOptions, instead of adding new method and deprecating? Would it be cleaner approach?

We considered doing this but decided to go this route instead because of (1) the future extensibility mentioned above and (2) changing the signature of the current method would be an immediate breaking change that makes upgrades difficult (there is no deprecation period), so you end up wanting to add a new method, if you're adding a new method you might as well make it extensible in the future.

Whats the difference between current and previous Extensions ?

Nothing really. Before there was no Extensions, you got/set them against SessionConfig directly and it stored a HashMap. I created a wrapper around that HashMap to have a named thing to hand out a reference for and establish a controlled API surface area.

@milenkovicm
Copy link
Contributor

  • Why have we introduced OptimizerContext when it only wraps SessionConfig, what else do we expect to have in OptimizerContext?

A pattern we've seen time and time again in this project is to assume that we won't want to make any changes in the future and then realizing we do. Having a specialized struct lets us do that with very little overhead.

Right, we might need it eventually so we’ve decided to make it as wide as possible to allow for a change that might never be necessary.

Nothing really. Before there was no Extensions, you got/set them against SessionConfig directly and it stored a HashMap. I created a wrapper around that HashMap to have a named thing to hand out a reference for and establish a controlled API surface area.

Ok, so we added new struct to the public API which wasn’t really necessary.

@adriangb
Copy link
Contributor Author

adriangb commented Dec 2, 2025

  • Why have we introduced OptimizerContext when it only wraps SessionConfig, what else do we expect to have in OptimizerContext?

A pattern we've seen time and time again in this project is to assume that we won't want to make any changes in the future and then realizing we do. Having a specialized struct lets us do that with very little overhead.

Right, we might need it eventually so we’ve decided to make it as wide as possible to allow for a change that might never be necessary.

Yep exactly 👍🏻 , this avoids more breaking changes in the future.

Nothing really. Before there was no Extensions, you got/set them against SessionConfig directly and it stored a HashMap. I created a wrapper around that HashMap to have a named thing to hand out a reference for and establish a controlled API surface area.

Ok, so we added new struct to the public API which wasn’t really necessary.

That introduction felt helpful to me. Looking at the code it was not clear to me what Extensions were, the code to set/retrieve them was mixed in with other methods of SessionConfig. Having a named struct with the (relatively complex methods; using types as keys is quite specialized) helps in discoverability and maintainability. The public surface area is really not much bigger, we just moved the methods. If you disagree with this I'm happy to revert it before next release: I agree that this change wasn't required. I don't think necessary/unnecessary is a good way to look at a change, often (as in this case) a change isn't necessary but is still beneficial.

@milenkovicm
Copy link
Contributor

Lets leave extensions, the change is not very disruptive, I've totally missed part that previous API is still there

Would you consider using OptimizerContext instead? It’s a very defensive API approach and might not be necessary. It’s been quite stable for a long time, do you expect it to change soon?

@gabotechs
Copy link
Contributor

It’s been quite stable for a long time, do you expect it to change soon?

I personally would love to try extending this OptimizerContext struct for giving a shot to #18296. Made already an attempt in gabotechs#7 that was not very satisfactory as just having access to the SessionConfig during an optimization pass forces the approach to inject further things in SessionConfig.extensions as opaque Any types, but having an OptimizerContext that can be extended with new fields opens the door to a more elegant solution.

@milenkovicm
Copy link
Contributor

Thanks for clarification @gabotechs , just few follow up question:

  1. Is the physical optimiser rule in the right place for such functionality, or would it break separation of concerns and make optimiser do too many things ? For quite some time optimiser functionality had very stable API, maybe there was a reason why it did not change much for long time.

  2. Would you be able to provide more details what do you mean by "more elegant solution", I guess if there is PR description we would be able to understand it from there. Leading to my last question,

  3. Was there a genuine need for such a rush that the PR was merged without any associated issue or description? Breaking down the problem into an issue or at least a description usually helps with understanding it better and potentially prevents changes to functionality that already works as you need it to. As mentioned in the discussion about extensions in Add PhysicalOptimizerRule::optimize_plan to allow passing more context into optimizer rules #18739 (comment), maybe it would prevent unnecessary API change? Also it helps others understanding the the problem you have, and maybe come up good suggestions.

Please have in mind that there are product using datafusion for very long time with quite big codebases. So far datafusion managed to balance api stability and benefits of breaking the api very well. I believe we as the community should continue to maintain such a high standard in the future

@gabotechs
Copy link
Contributor

Is the physical optimiser rule in the right place for such functionality, or would it break separation of concerns and make optimiser do too many things ?

I'd say so yeah. It's during the optimization where dynamic filters are created, which leads me to think that if their state was to be registered in a global place for outside access, it's at that moment when it should happen.

Would you be able to provide more details what do you mean by "more elegant solution", I guess if there is PR description we would be able to understand it from there

There's still a lot of things I should flesh out before giving a proper answer to this, but TL;DR is the second solution exposed in #18296.

Was there a genuine need for such a rush that the PR was merged without any associated issue or description?

I would not say there was a rush, the original PR #18168 has been out there for more than month ago. It's true that the context is a bit scattered across PR comments rather than being properly centralized in one issue.

For @adriangb and I this change felt natural as this is something we've been discussing publicly in the last months, but we might be biased as we were the ones discussing, so if you think this needs more thought we are on time to revert whatever part of this PR you think needs more discussion.

@milenkovicm
Copy link
Contributor

Is the physical optimiser rule in the right place for such functionality, or would it break separation of concerns and make optimiser do too many things ?

I'd say so yeah. It's during the optimization where dynamic filters are created, which leads me to think that if their state was to be registered in a global place for outside access, it's at that moment when it should happen.

Could this be accomplished providing your own query planner? Or keeping state in your own optimiser rule which, rules are "arced" so you could keep reference to it and access it as needed?

Would you be able to provide more details what do you mean by "more elegant solution", I guess if there is PR description we would be able to understand it from there

There's still a lot of things I should flesh out before giving a proper answer to this, but TL;DR is the second solution exposed in #18296.

That is my feeling as well, this PR is merged without all the things are flashed out.

Was there a genuine need for such a rush that the PR was merged without any associated issue or description?

I would not say there was a rush, the original PR #18168 has been out there for more than month ago. It's true that the context is a bit scattered across PR comments rather than being properly centralized in one issue.

yes, comments do not help understanding the context, and problem breakdown in sense of PR/Issue description helps everybody.

For @adriangb and I this change felt natural as this is something we've been discussing publicly in the last months, but we might be biased as we were the ones discussing,

An external perspective can be invaluable in developing a more effective design.

I personally believe introduction of OptimizerContext nor change of Extensions bring any benefits to ballista at this point. None of them have any strong reason to be introduced.

Switching to accept SessionConfig somewhat might make sense. However, I’m wondering if the optimiser rule can retain the state needed to extract from the SessionConfig extension. Since both are tied to SessionContext, I don’t see a strong reason to keep it in two places.

so if you think this needs more thought we are on time to revert whatever part of this PR you think needs more discussion.

I would leave up to you to make such decision, you're the one approving the PR; I personally struggle to see value of this PR at this moment.

@adriangb
Copy link
Contributor Author

adriangb commented Dec 6, 2025

@milenkovicm in order to help determine the best path forward it would be helpful to understand where the pushback against this change is coming from. Will this change cause a lot of code churn for Ballista in a way that is not easily automatable, or is this API incompatible with something you would like to do?

@milenkovicm
Copy link
Contributor

Hey @adriangb
my comments are not focused on ballista at all. Ballista has, I believe, one rule which is forked from datafusion.
But, be aware that there are other multi year codebases built on top of datafusion.

I have been involved in a code base which has been built since DataFusion version 7, and have absorbed a lot of backward-incompatible changes over the years, and most of them had clean benefit(s) to the community. This one does not, at least it has not demonstrated, yet.

My concern here is that the API has been broken without any valid reasons. Two new structures have been introduced to a public API; one of them, as you said in your comment, was clearly not necessary; the other does not demonstrate any valid reasons to be there as well, yet we're all expected to take them. Let's not take the path of least resistance and make changes to the public API if they are not really needed.

So back to change,

  • As you have to provide a physical optimiser rule, can't you bring all other necessary "things" as rule properties? So your optimiser has internally whatever you need in OptimiserContext.
  • Even SessionConfig as an optimiser parameter is hard to sell, as an optimiser rule has the same lifecycle as session state/session config, and optimiser properties may be the location to keep extra data.
  • If you need to keep a reference to an optimiser, can it be kept as a session config extension (optimiser rule may not be seen as a config, maybe we need a session state extension instead to keep a reference to)?

I hope this helps ,

@adriangb
Copy link
Contributor Author

adriangb commented Dec 7, 2025

I opened a PR to make Extensions private, I think that addresses 1/2 of the concern: #19180

My view on the OptimizerContext change is that it's a pretty straightforward change, and doesn't force any code to be changed by users of DataFusion. All old code continues to work until and only if we decide to deprecate the old method. We also added the change into the upgrade guide. I think all of that combined with the wide availability of AI tools means that this sort of API change is easy to absorb compared to DataFusion 7 -> 8. That said I of course agree that a we shouldn't be breaking APIs if there's no reason to do so, with the asterisk that what constitutes a valid reason to break an API or not is a subjective matter.

@gabotechs what do you think of @milenkovicm's proposed solution? If I understand correctly the idea is that the optimizer takes a reference to the SessionConfig via it's constructor. I think this would make things more verbose for these sorts of optimizers, and it doesn't let us unify the API with the logical optimizer (which does have an OptimizerContext), which was one of the goals of this change.

@milenkovicm
Copy link
Contributor

I don't think you understand me correctly,

SessionConfig has two properties, ConfigOptions and extensions. ConfigOptions is already parameter of optimiser method, so whats the reason to extend the method parameter to SessionConfig? If you need something else than ConfigOptions introduce it as part of the your rule constructor, you can make reference to your rule or your rule property, or whatever you want. If you need access to session config extensions, they are arced, they can be introduced using rule constructor as well (rule and extensions are created at the same time).
I’m constantly asking what else is needed in OptimiserContext to justify its existence but I haven’t received any answers.

Also note, you have marked old method for removal (Btw, there is difference, between logical optimiser and physical, logical accepts trait not a struct, which makes difference)

My point isn’t whether it’s easy to update after a change. Writing code is easy these days but we should maintain a much higher standard for ourselves than just writing a code.

@milenkovicm
Copy link
Contributor

to make it easier to understand, please have a look, it illustrates my point (this is datafusion 50),

use datafusion::config::{ConfigEntry, ConfigExtension, ConfigOptions, ExtensionOptions};
use datafusion::execution::SessionStateBuilder;
use datafusion::physical_optimizer::PhysicalOptimizerRule;
use datafusion::physical_plan::ExecutionPlan;
use datafusion::prelude::{SessionConfig, SessionContext};
use std::any::Any;
use std::fmt::Debug;
use std::sync::Arc;

#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
    // you may want to keep your rule_context
    // accessible or not, im not sure what you want
    let rule_context = Arc::new(MyRuleContext::default());
    let my_rule = Arc::new(MyRule::new(rule_context.clone()));
    // if for any reason you need to reconfigure your rule
    // use can use SQL SET or directly on session config 
    let my_rule_config = MyRuleConfig::default();

    // assign your rule context to a session config
    // so it gets dropped when your session gets dropped
    // use of session extension
    let session_config = SessionConfig::new()
        .with_extension(rule_context.clone())
        .with_option_extension(my_rule_config);

    let session_state = SessionStateBuilder::new_with_default_features()
        .with_config(session_config)
        .with_physical_optimizer_rule(my_rule)
        .build();

    let ctx = SessionContext::new_with_state(session_state);
    // this should trigger rule
    ctx.sql("select 1").await?.show().await?;

    Ok(())
}

#[derive(Debug, Default)]
struct MyRuleContext {
    // probably your stage
    // need some kind of week hash
    // map or similar
    // if you want to track physical expressions
}
#[derive(Debug)]
struct MyRule {
    context: Arc<MyRuleContext>,
}

impl MyRule {
    fn new(context: Arc<MyRuleContext>) -> Self {
        Self { context }
    }
}

impl PhysicalOptimizerRule for MyRule {
    fn optimize(
        &self,
        plan: Arc<dyn ExecutionPlan>,
        config: &ConfigOptions,
    ) -> datafusion::common::Result<Arc<dyn ExecutionPlan>> {
        // configure / reconfigure your rule
        // any way you need
        let my_config: Option<&MyRuleConfig> = config.extensions.get();

        println!("MyRule config: {:?}", my_config);
        println!("MyRule context: {:?}", self.context);
        Ok(plan)
    }

    fn name(&self) -> &str {
        "MyRule"
    }

    fn schema_check(&self) -> bool {
        false
    }
}

#[derive(Debug, Default, Clone)]
struct MyRuleConfig {
    // whatever you need to configure your
    // rule goes here
}

impl ExtensionOptions for MyRuleConfig {
    fn as_any(&self) -> &dyn Any {
        self
    }

    fn as_any_mut(&mut self) -> &mut dyn Any {
        self
    }

    fn cloned(&self) -> Box<dyn ExtensionOptions> {
        Box::new(self.clone())
    }

    fn set(&mut self, _key: &str, _value: &str) -> datafusion::common::Result<()> {
        todo!("to be implemented later")
    }

    fn entries(&self) -> Vec<ConfigEntry> {
        todo!("to be implemented later")
    }
}

impl ConfigExtension for MyRuleConfig {
    const PREFIX: &'static str = "my_rule";
}

it should print:

MyRule config: Some(MyRuleConfig)
MyRule context: MyRuleContext
+----------+
| Int64(1) |
+----------+
| 1        |
+----------+

@gabotechs
Copy link
Contributor

ConfigOptions is already parameter of optimiser method, so whats the reason to extend the method parameter to SessionConfig?

From #18168, which is linked at the beginning of this PR:

"""
Users might want to create their own PhysicalOptimizerRule implementations. Users might have their own SessionConfig.extensions. Users might want to use their own SessionConfig.extensions in their custom PhysicalOptimizerRule implementations.
"""

This is the case that we (DataDog) have faced several times already in several of our optimization rules

If you need access to session config extensions, they are arced, they can be introduced using rule constructor as well (rule and extensions are created at the same time).

I'd say not necessarily, a rule that is a simple empty struct might be constructed statically. Following this same reasoning, we might also decide that it's better to let PhysicalOptimizerRule::optimize take no arguments (not even ConfigOptions) as those can also be passed at construction time.

I’m constantly asking what else is needed in OptimiserContext to justify its existence but I haven’t received any answers.

As we have a legitimate need for changing PhysicalOptimizerRule::optimize to take SessionConfig instead of ConfigOptions (reason at the beginning of this message), we might as well find more legitimate needs in the future for other new fields, hence we want to commit to API stability by changing the PhysicalOptimizerRule::optimize argument only once, as we need to break it now anyways.

@gabotechs
Copy link
Contributor

gabotechs commented Dec 7, 2025

I would leave up to you to make such decision, you're the one approving the PR; I personally struggle to see value of this PR at this moment.

If it was only up to me, I'd say the change in this PR is appropriate and I'm good with moving forward with it. It's not only up to me though, we should also weight opinions of other contributors like you, and specially yours a as long tenured contributor to DataFusion.

Would you be fine with a more contained change that uses SessionConfig instead of ConfigOptions as PhysicalOptimizerRule::optimize argument?

@adriangb
Copy link
Contributor Author

adriangb commented Dec 7, 2025

Would you be fine with a more contained change that uses SessionConfig instead of ConfigOptions as PhysicalOptimizerRule::optimize argument?

I'll note that this would be an immediate breaking change and a much worse experience for downstream users. We'd at the very least need to do the dance of adding a new method, deprecating the old one, etc.

@gabotechs
Copy link
Contributor

I'll note that this would be an immediate breaking change and a much worse experience for downstream users. We'd at the very least need to do the dance of adding a new method, deprecating the old one, etc.

Yeah, definitely. Did not explicitly bake that into my explanation but I was pricing that in

@milenkovicm
Copy link
Contributor

I would leave up to you to make such decision, you're the one approving the PR; I personally struggle to see value of this PR at this moment.

If it was only up to me, I'd say the change in this PR is appropriate and I'm good with moving forward with it. It's not only up to me though, we should also weight opinions of other contributors like you, and specially yours a as long tenured contributor to DataFusion.

Would you be fine with a more contained change that uses SessionConfig instead of ConfigOptions as PhysicalOptimizerRule::optimize argument?

Users do create their own rules, and that is very common. I have created quite few of them, hence my adamance we do not need this change.

There is nothing you mentioned in #18168 that can't be done with the previous API. Requirements of #18168 are very vague as well. If you took Extension which @adriangb introduced, as your constructor parameter you'd be able to add your extensions dynamically, or bend it in any way you like. There is nothing in this change that deserves to be generalised and moved to core library function.

If you like, we can take this at regular datafusion meeting, I have no problem with that. We can also take discussion about format of this PR as well, lack description or justification of breaking public API, or we can just revert the change and you do your specific use case things in your own code.

Please let me know the which direction you would like to take,

@gabotechs
Copy link
Contributor

There is nothing you mentioned in #18168 that can't be done with the previous API

This is true, for what is worth, we can leave the PhysicalOptimizerRule::optimize with empty arguments and we'd still be able to prompt users to pass everything in the constructors.

I'm fine in rolling back this change and finish the discussion here, are you fine with that @adriangb?

@adriangb
Copy link
Contributor Author

adriangb commented Dec 7, 2025

We can also take discussion about format of this PR as well, lack description or justification of breaking public API

The lack of description in this PR is my fault, I apologize for that. I don't think there's discussion needed, a good PR description is clearly helpful and not having one in this case was frustrating to you and possibly other community members. This is not a justification but this was a follow up / alternative to #18168 and although it is linked to it in various ways (it's the first link that appears under the discussion) I should have copied the link into the description and added some more background. That's a mistake on my end and I understand why that might be frustrating, but I don't think that invalidates the entire change or requires the PR to be reverted.

#18168 (comment) (from the PR mentioned above) links to code currently being used to work around this that is very similar to what you are proposing:

impl ConfigExtension for DistributedConfig {
    const PREFIX: &'static str = "distributed";
}

While it technically works, it is not ergonomic. My view and understanding of the project is that wanting to do something with DataFusion that ends up being hard to figure out / not ergonomic is generally a valid reason for making tweaks to a public API. Beyond that I think the details of what constitutes unergonomic and what is a large or small API change are somewhat subjective. Gabriel and I do not work for the same organization, so there is no you here. We looked at the problem, had a PR with a good description, justification and discussion open for over a month and then came to the conclusion together that this would be the best solution. I understand that is maybe irrelevant if it's easy to miss because it's not linked to from this PRs description, which once again I am sorry if resulted in a waste of your time working backwards to understand why the change was being made.

If you still are not comfortable with the change I think we should just revert it and start discussion anew with more formal information in the PR description itself or just live with the current status quo. It doesn't seem worth burdening the wider community with. Would you like me to revert the PR?

@adriangb
Copy link
Contributor Author

adriangb commented Dec 7, 2025

I'm fine in rolling back this change and finish the discussion here, are you fine with that @adriangb?

Yes that's fine by me. I'll prepare a revert.

adriangb added a commit to pydantic/datafusion that referenced this pull request Dec 7, 2025
…e context into optimizer rules (apache#18739)"

This reverts commit b990987.
adriangb added a commit to pydantic/datafusion that referenced this pull request Dec 8, 2025
…e context into optimizer rules (apache#18739)"

This reverts commit b990987.
adriangb added a commit to pydantic/datafusion that referenced this pull request Dec 8, 2025
…e context into optimizer rules (apache#18739)"

This reverts commit b990987.
github-merge-queue bot pushed a commit that referenced this pull request Dec 8, 2025
This reverts #18739 by
reverting b990987 and
7b4593f.

Due to feedback in #18739 we've
decided to revert it at least for the time being. Since this change has
not been released this is not be a breaking API change.
@milenkovicm
Copy link
Contributor

milenkovicm commented Dec 8, 2025

left you a comment in your original repo @gabotechs, not sure if that will help you

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api change Changes the API exposed to users of the crate core Core DataFusion crate execution Related to the execution crate optimizer Optimizer rules

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants