Skip to content

Conversation

@jgoday
Copy link
Contributor

@jgoday jgoday commented Aug 2, 2021

Which issue does this PR close?

Closes #572.

Rationale for this change

Allows to use window functions from python bindings (inside a select expression).

df = df.select(
  f.col("a"),
  f.alias(f.window("lead", [f.col("b")], order_by=[f.order_by(f.col("b"))]), "a_next"),
)

What changes are included in this PR?

Are there any user-facing changes?

Changes to python crate, new exported functions (sort_by, alias and window).

@github-actions github-actions bot added datafusion sql SQL Planner labels Aug 2, 2021
@jgoday
Copy link
Contributor Author

jgoday commented Aug 2, 2021

There is some pending issues:

  1. Currently, only a generic way of calling window functions by name is exported. Should we export all window functions individually ?
    fn window(
        name: &str,
        args: Vec<expression::Expression>,
        partition_by: Option<Vec<expression::Expression>>,
        order_by: Option<Vec<expression::Expression>>,
    ) -> PyResult<expression::Expression> {
  2. There is some functionality copied from internal datafusion::sql::utils crate,
    to find window expressions and wrap a logical plan into a LogicalPlan::Window.
    I have changed python crate datafusion dependency (to local project).
    datafusion = { path = "../datafusion" }
    Obviously this change is completely wrong and temporal,
    just to allow using internal datafusion::sql::utils from python crate (find_window_exprs and group_window_expr_by_sort_keys).
    • (Edit) I have reverted it in the last update (to allowing running the test without changing the python crate dependencies)
    • Would it be better to duplicate the code so as not to change the visibility of that internal crate and functions currently public(crate) ?

@houqp
Copy link
Member

houqp commented Aug 3, 2021

cc @jimexist

Comment on lines 112 to 115
// sort by sort_key len descending, so that more deeply sorted plans gets nested further
// down as children; to further mimic the behavior of PostgreSQL, we want stable sort
// and a reverse so that tieing sort keys are reversed in order; note that by this rule
// if there's an empty over, it'll be at the top level
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i wonder if there can be a way to reuse this part of the code

@jimexist
Copy link
Member

FYI #1167 added some support in DataFrame API

@jgoday jgoday force-pushed the python_window_functions_bindings branch from 749e5f7 to 0057285 Compare October 27, 2021 21:14
@jgoday jgoday closed this Oct 27, 2021
@jgoday jgoday force-pushed the python_window_functions_bindings branch from 0057285 to f99e9c8 Compare October 27, 2021 21:33
jgoday added a commit to jgoday/arrow-datafusion that referenced this pull request Oct 27, 2021
@jgoday jgoday reopened this Oct 27, 2021
@jgoday
Copy link
Contributor Author

jgoday commented Oct 27, 2021

FYI #1167 added some support in DataFrame API

Thank you @jimexist, I have updated the code to avoid code duplication and therefore use the DataFrame API as you pointed me.

@jimexist
Copy link
Member

hi @jgoday do you mind rebasing first?

jgoday added a commit to jgoday/arrow-datafusion that referenced this pull request Nov 17, 2021
@jgoday jgoday force-pushed the python_window_functions_bindings branch from 6993939 to 2cf2694 Compare November 17, 2021 21:41
@jgoday jgoday force-pushed the python_window_functions_bindings branch from 2cf2694 to d24b886 Compare November 17, 2021 21:45
@jgoday
Copy link
Contributor Author

jgoday commented Nov 17, 2021

Hi @jimexist, rebasing done.
Now, thanks to the last changes from master, there's no need to adapt the PyDataFrame select expressions (there is no changes in python/src/dataframe.rs now), much clearer now !

Comment on lines +106 to +114
#[pyfunction]
fn alias(expr: PyExpr, name: &str) -> PyResult<PyExpr> {
Ok(PyExpr {
expr: datafusion::logical_plan::Expr::Alias(
Box::new(expr.expr),
String::from(name),
),
})
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you think alias should be a method on the expression or column object?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

expression already has an alias method, but perhaps we can support both APIs.

Copy link
Member

@jimexist jimexist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 😎🤘

@jimexist jimexist changed the title Draft: python bindings for window functions Python bindings for window functions Nov 23, 2021
@jimexist jimexist merged commit 743a81f into apache:master Nov 23, 2021
unkloud pushed a commit to unkloud/datafusion that referenced this pull request Mar 23, 2025
…ache#819)

* new benchmark

* remove assertion from average agg, add stddev microbenchmark

* disable stddev by default

* format
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

sql SQL Planner

Projects

None yet

Development

Successfully merging this pull request may close these issues.

python bindings for window functions

3 participants