fix: Support nested dtypes from Python in search_sorted() #22633
itamarst wants to merge 26 commits into pola-rs:main
|
>>> pl.Series([[1], [2], [3]]).search_sorted([2])
1
>>> pl.Series([[1], [2], [3]]).search_sorted([3])
2
>>> pl.Series([[1], [2], [3]]).search_sorted([2, 3])
2
Update: Never mind, I forgot this isn't an exact search; it's "where would the value go to be inserted in correct sorted order". |
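The insertion-point semantics described in the update above can be illustrated with Python's standard-library `bisect` module rather than Polars itself. This is only a sketch: it assumes nested list values compare lexicographically, as plain Python lists do, and the helper name `insertion_index` is made up for this example.

```python
import bisect


def insertion_index(sorted_values, needle):
    """Return the leftmost index where needle could be inserted
    while keeping sorted_values in ascending order."""
    return bisect.bisect_left(sorted_values, needle)


haystack = [[1], [2], [3]]
print(insertion_index(haystack, [2]))     # 1
print(insertion_index(haystack, [3]))     # 2
print(insertion_index(haystack, [2, 3]))  # 2, because [2] < [2, 3] < [3]
```

Under that assumption the three results agree with the outputs quoted above, including the `[2, 3]` case that is not an element of the series.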
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #22633 +/- ##
==========================================
+ Coverage 80.35% 80.40% +0.04%
==========================================
Files 1682 1682
Lines 223237 223289 +52
Branches 2804 2804
==========================================
+ Hits 179389 179532 +143
+ Misses 43179 43089 -90
+ Partials 669 668 -1
|
|
@coastalwhite I've ported the code, and am now looking at addressing your comments.
The obvious solution to get consistent signatures is to always interpret lists as literals. But that is not backwards compatible. Some options:
Option 1. What's in the initial PR
In Polars 2, where backwards compatibility could be dropped, one would then change the function so that lists are always interpreted as literals, and you always have to pass
The problem with this is inconsistent type signatures. I do not see any way to fix this. The benefit is that it is fully backwards compatible, and gives nicer usage for
This may be best demonstrated with some examples:
# For non-nested types, both lists and Series result in Series (multi-search):
>>> pl.Series([17, 5]).search_sorted([17, 2, 5])
shape: (3,)
Series: '' [u32]
[
2
0
0
]
>>> pl.Series([17, 5]).search_sorted(pl.Series([17, 2, 5]))
shape: (3,)
Series: '' [u32]
[
2
0
0
]
>>> pl.Series([17, 5]).search_sorted(5)
0
# For a nested type, a list results in an int, but a Series results in a Series (multi-search):
>>> pl.Series([[17], [5]]).search_sorted([5])
0
>>> pl.Series([[17], [5]]).search_sorted(pl.Series([5]))
shape: (1,)
Series: '' [u32]
[
0
]
Option 2. Lists mean multi-search for
|
|
I think this should get a similar treatment to For .implode() on .is_in().
If you do this and remove the |
|
So I was already doing Is there a reason to do it on the Rust side? |
You cannot do this on the Python side, because you do not know the dtype there. The Python trick does not work for arbitrary nested datatypes, nor for arbitrary expressions. |
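To make the dtype dependence concrete, here is a rough sketch of the Option 1 dispatch on plain Python lists. It is not the PR's implementation: `nesting_depth` and `search_sorted_sketch` are hypothetical names, the haystack is assumed to be sorted ascending, and the standard-library `bisect` module stands in for the real search. The point is only that deciding whether a list is one literal or many needles requires knowing how deeply the haystack is nested.

```python
import bisect


def nesting_depth(values):
    """Hypothetical helper: list nesting depth, judged from the first element."""
    depth = 0
    while isinstance(values, list):
        depth += 1
        if not values:
            break
        values = values[0]
    return depth


def search_sorted_sketch(haystack, needle):
    """Hypothetical Option 1 dispatch: int for one needle, list for many."""
    if not isinstance(needle, list):
        # Scalar needle: single search.
        return bisect.bisect_left(haystack, needle)
    if nesting_depth(needle) == nesting_depth(haystack) - 1:
        # The list has the shape of one haystack element: treat it as a literal.
        return bisect.bisect_left(haystack, needle)
    # Otherwise treat the list as several needles (multi-search).
    return [bisect.bisect_left(haystack, n) for n in needle]


print(search_sorted_sketch([5, 17], [17, 2, 5]))  # [1, 0, 0]
print(search_sorted_sketch([[5], [17]], [5]))     # 0
```

The same list `[5]` is a multi-search against a flat haystack but a single literal against a nested one, so the answer cannot be decided from the needle alone. In this sketch the depth is inferred from the data, which breaks down for empty lists and is not available at all when the haystack is an expression rather than concrete values.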
|
OK, Rust it is. |
|
To what extent could the existing code in |
Well, it is similar but not the same ( |
|
Next I will be merging this forward and addressing the review comments. |
|
@coastalwhite ready for review again, hopefully. |
|
@coastalwhite merged forward, hopefully ready for review again. |
|
Can't figure out how to get rid of the "changes requested" bit in the UI 😞 |
|
Can simplify this once #23458 is merged, so turning back into draft. Also might want to get categorical working while I'm at it. |
|
And after further thought I think switching |
ddf5907 to d0914d4
90ceb7b to e9fce55
Fixes #21100
Also: