…ted options are used.
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #856 +/- ##
==========================================
+ Coverage 66.68% 68.96% +2.27%
==========================================
Files 186 191 +5
Lines 66795 67170 +375
Branches 9507 9523 +16
==========================================
+ Hits 44543 46321 +1778
+ Misses 19572 18027 -1545
- Partials 2680 2822 +142
Is there a better way to flag the tests that have to_datetime in them as now requiring JIT?
bodo/pandas/base.py
Outdated
# Declare function to be compiled to run to_datetime over series.
func = "def bodo_to_datetime(x):\n"
# Embed format string as constant in function.
func += f" return pd.to_datetime(x, format='{in_kwargs['format']}')\n"
# Create the function from string.
to_datetime_func = bodo_spawn_exec(func, {"pd": pd}, {}, __name__)
return arg.map(to_datetime_func)
Would it be possible to use a C++ kernel for this, maybe using Arrow, instead of the compiler? Not saying it needs to be done now, but maybe add a comment for a follow-up.
Not that I can think of? Maybe we could change the test command to run each test module as a separate session? (I think
scott-routledge2 left a comment
Thanks @DrTodd13, LGTM.
bodo/pandas/base.py
Outdated
# Embed format string as constant in function.
func += f" return pd.to_datetime(x, format='{in_kwargs['format']}')\n"
# Create the function from string.
to_datetime_func = bodo_spawn_exec(func, {"pd": pd}, {}, __name__)
Maybe this needs bodo.jit(func, cache=True) here?
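For readers following the snippet under review, here is a minimal plain-Python sketch of the same pattern: build a function from source text with the format baked in as a constant, then map it over the values. It is not the Bodo implementation. `make_parser` and `parse_one` are hypothetical names, plain `exec` stands in for `bodo_spawn_exec`, and `datetime.strptime` stands in for `pd.to_datetime` so the sketch needs only the standard library. One difference from the reviewed code is that the format is embedded with `!r` rather than hand-written quotes, on the assumption that a format string might itself contain a quote character.

```python
from datetime import datetime


def make_parser(fmt: str):
    """Build a one-argument parser with `fmt` baked in as a constant.

    Hypothetical helper for illustration only; not part of Bodo.
    """
    src = "def parse_one(x):\n"
    # {fmt!r} emits a valid Python string literal, so quotes or backslashes
    # inside the format cannot break the generated source.
    src += f"    return datetime.strptime(x, {fmt!r})\n"
    namespace = {"datetime": datetime}
    exec(src, namespace)  # plain exec stands in for bodo_spawn_exec
    return namespace["parse_one"]


# Map the generated function over a list, analogous to arg.map(...).
parse_date = make_parser("%Y-%m-%d")
parsed = [parse_date(s) for s in ["2024-01-02", "2024-03-04"]]

# A format containing a single quote still yields valid generated source.
parse_quoted = make_parser("%d'%m'%Y")
quoted = parse_quoted("02'01'2024")
```

Whether the same quoting concern applies to the PR depends on which format strings reach that code path, which this page does not show.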
Changes included in this PR
Traverse right-side of cross-product so CTE can find duplicate computations.
Run to_datetime as cfunc over a map instead of Python.
Testing strategy
run_ci
User facing changes
None
Checklist
[run CI] in your commit message.