@nuno-faria
Contributor

@nuno-faria nuno-faria commented Sep 23, 2025

@alamb
Contributor

alamb commented Sep 23, 2025

Amazing! Thank you @nuno-faria -- I will review this PR today or tomorrow

Contributor

@alamb alamb left a comment

Thank you @nuno-faria -- I think this post looks great. Thank you so much for writing it

I am working on getting some performance numbers and will update the post when completed.

I pushed a commit that changes the date to next Monday and adds the new committers and the post that @timsaucer published yesterday: https://datafusion.apache.org/blog/2025/09/21/custom-types-using-metadata/

I also have a few other suggestions I am working on that I will post shortly

Contributor

@alamb alamb left a comment

Here are a few more suggestions.

The only other thing I noticed is that the performance section is somewhat duplicated by the new features section (the dynamic filters are mentioned twice, for example)

I am going to take a pass at making that a bit better, but I don't think I can pull it off as GitHub suggestions. I'll make a suggestion PR instead

@alamb
Contributor

alamb commented Sep 25, 2025

@nuno-faria would it be ok if I pushed some edits directly to this branch to avoid back/forth and PRs? Then you could review the changes commit by commit

Contributor

@alamb alamb left a comment

Thanks again @nuno-faria and @adriangb and @2010YOUY01

I made some non-trivial suggestions here:

Let me know what you think

Suggestions for DataFusion 50 blog post
@nuno-faria
Contributor Author

> @nuno-faria would it be ok if I pushed some edits directly to this branch to avoid back/forth and PRs? Then you could review the changes commit by commit

Please, feel free to do so. Later today I will also take a look at the suggestions.

@nuno-faria
Contributor Author

Suggestions applied.

@vegarsti
Contributor

Great post! Somehow I get an error trying to comment in review mode, but for the section on the cache:

Maybe this is too much hedging, but: this sounds like we get orders of magnitude gains always. But that probably depends on the query!

@alamb
Contributor

alamb commented Sep 26, 2025

> Maybe this is too much hedging, but: this sounds like we get orders of magnitude gains always. But that probably depends on the query!

yeah, I think specifically it really helps:

  1. When the files are entirely remote (on object_store)
  2. When the queries are relatively low latency (10s of ms), as parsing the footer can then be a substantial part of the overall query processing time

I'll try and add that detail to the post
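
To make the point above concrete, here is a minimal back-of-the-envelope sketch. It is not taken from the post or from DataFusion itself. It assumes a cache hit removes the footer fetch-and-parse cost entirely and leaves the rest of the query unchanged; the function name `cache_speedup` and all millisecond figures are hypothetical and chosen only for illustration.

```python
def cache_speedup(footer_ms: float, rest_ms: float) -> float:
    """Estimated speedup when a metadata cache hit skips the footer work.

    footer_ms: time spent fetching and parsing the Parquet footer (hypothetical).
    rest_ms:   time spent on everything else in the query (hypothetical).
    Assumption: a cache hit reduces the footer cost to zero.
    """
    if footer_ms < 0 or rest_ms <= 0:
        raise ValueError("footer_ms must be >= 0 and rest_ms must be > 0")
    return (footer_ms + rest_ms) / rest_ms


# Low-latency query dominated by footer work: large gain.
print(cache_speedup(footer_ms=9.0, rest_ms=1.0))              # 10.0

# Long-running query with the same footer cost: almost no gain.
print(round(cache_speedup(footer_ms=9.0, rest_ms=991.0), 3))  # 1.009
```

Under these assumptions, the same absolute saving yields a large relative gain only when the footer work dominates the query, which matches the two conditions listed above.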

@alamb
Contributor

alamb commented Sep 26, 2025

Thanks @nuno-faria. I think we need to include a Known issues section and point users to the upcoming hotfix release and what's in it.

Just point to apache/datafusion#17594

Maybe I can also make a more general "patch set" section that talks about how we have been stabilizing the releases recently by releasing patches as the community upgrades and finds issues 🤔

@alamb
Contributor

alamb commented Sep 26, 2025

Also, is it ok if I put contributors names next to the features as we have done in past releases? I think that is a nice acknowledgment to the community as well as serves as additional motivation for future contributors

nuno-faria and others added 2 commits September 26, 2025 19:38
Co-authored-by: Matt Butrovich
@nuno-faria
Contributor Author

> Thanks @nuno-faria. I think we need to include a Known issues section and point users to the upcoming hotfix release and what's in it.
>
> Just point to apache/datafusion#17594

Agreed, added.

@nuno-faria
Contributor Author

> Also, is it ok if I put contributors names next to the features as we have done in past releases? I think that is a nice acknowledgment to the community as well as serves as additional motivation for future contributors

Sounds good to me.

Co-authored-by: Vegard Stikbakke
@nuno-faria
Contributor Author

> > Maybe this is too much hedging, but: this sounds like we get orders of magnitude gains always. But that probably depends on the query!
>
> yeah, I think specifically it really helps:
>
> 1. When the files are entirely remote (on object_store)
> 2. When the queries are relatively low latency (10s of ms), as parsing the footer can then be a substantial part of the overall query processing time
>
> I'll try and add that detail to the post

I added a small clarification when mentioning the speedup.

@alamb
Contributor

alamb commented Sep 29, 2025

I am going to take one final pass to incorporate the feedback here and get this post published!

[ticket](https://github.com/apache/datafusion/pull/16971)). This optimization
is production ready and enabled by default (more details in the
[Epic](https://github.com/apache/datafusion/issues/17000)).
Thanks to [Nuno Faria], [Jonathan Chen], [Shehab Amin], [Oleks V], [Tim Saucer], and [Blake Orth] for delivering this feature.

fyi @nuno-faria @jonathanc-n, @shehabgamin, @comphead, @timsaucer and @BlakeOrth as you are mentioned here

More information can be found in the respective
[ticket](https://github.com/apache/datafusion/pull/16445) and the next step will be to
[extend the dynamic filters to other types of joins](https://github.com/apache/datafusion/issues/16973), such as `LEFT` and
`RIGHT` outer joins. Thanks to [Adrian Garcia Badaracco], [Qi Zhu], [xudong963], [Daniël Heres], and [Lía Adriana]

fyi @adriangb @zhuqi-lucas , @xudong963 @Dandandan and @LiaCastaneda as you are mentioned here

of multi-level merge sorts (more details in the respective
[ticket](https://github.com/apache/datafusion/pull/15700)). It is now
possible to execute almost any sorting query that would have previously triggered *out-of-memory*
errors, by relying on disk spilling. Thanks to [Raz Luvaton], [Yongting You], and

fyi @rluvaton, @2010YOUY01 and @ding-young as you are mentioned here


Although it is not part of the SQL standard (yet), it has been gaining
adoption in several SQL analytical systems such as DuckDB, Snowflake, and
BigQuery. Thanks to [Huaijin] and [Jonah Gao] for delivering this feature.

FYI @haohuaijin and @jonahgao as you are mentioned here

FROM table
```

Thanks to [Geoffrey Claude] and [Jeffrey Vo] for delivering this feature.

FYI @geoffreyclaude and @Jefffrey as you are mentioned here

behavior that varies based on runtime state; for example, time UDFs can use the
session-specified time zone instead of just UTC.

Thanks to [Bruce Ritchie], [Piotr Findeisen], [Oleks V], and [Andrew Lamb] for delivering this feature.

FYI @comphead and @findepi as you are mentioned here

Contributor

@alamb alamb left a comment

Ok, I think this blog post is looking good so let's publish it. We can make a follow on PR with any edits that are needed

Thanks again everyone!

@alamb alamb merged commit 286c09f into apache:main Sep 29, 2025
1 check passed
@alamb
Contributor

alamb commented Sep 29, 2025

And the blog is live: https://datafusion.apache.org/blog/2025/09/29/datafusion-50.0.0/

Successfully merging this pull request may close these issues.

Blog post for the DataFusion 50.0.0 release

7 participants