Speed up TPCH. by DrTodd13 · Pull Request #856 · bodo-ai/Bodo

DrTodd13 · 2025-09-30T22:29:51Z

Changes included in this PR

Traverse the right side of a cross product so that CTE detection can find duplicate computations.
Run to_datetime as a cfunc over a map instead of in Python.
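
To illustrate the first change, here is a minimal sketch of why skipping one side of a cross product can hide a repeated subplan from common-subexpression detection. `PlanNode`, `count_subplans`, and `duplicated_subplans` are hypothetical names invented for this example; they are not Bodo's plan classes or its actual CTE pass.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PlanNode:
    """Hypothetical logical-plan node (assumption, not Bodo's real class)."""
    op: str
    left: Optional["PlanNode"] = None
    right: Optional["PlanNode"] = None


def count_subplans(root, visit_cross_product_right=True):
    """Count how often each non-leaf subplan occurs in the tree."""
    counts = Counter()

    def visit(node):
        if node is None:
            return
        # Frozen dataclasses hash structurally, so equal subplans collide.
        if node.left is not None or node.right is not None:
            counts[node] += 1
        visit(node.left)
        # If the right side of a cross product is skipped, duplicates that
        # live only on that side are never counted.
        if node.op != "cross_product" or visit_cross_product_right:
            visit(node.right)

    visit(root)
    return counts


def duplicated_subplans(root, visit_cross_product_right=True):
    """Return the set of subplans that occur more than once."""
    counts = count_subplans(root, visit_cross_product_right)
    return {node for node, n in counts.items() if n > 1}


# The same aggregate appears on both sides of the cross product.
agg = PlanNode("aggregate", left=PlanNode("scan:lineitem"))
plan = PlanNode("cross_product", left=agg, right=PlanNode("filter", left=agg))

found_with_right = duplicated_subplans(plan)
found_without_right = duplicated_subplans(plan, visit_cross_product_right=False)
```

With the right side traversed, the shared aggregate is reported as a duplicate; with it skipped, nothing is found, so the computation would run twice.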

Testing strategy

run_ci

User facing changes

None

Checklist

Pipelines passed before requesting review. To run CI you must include [run CI] in your commit message.
I am familiar with the Contributing Guide
I have installed + ran pre-commit hooks.


codecov · 2025-09-30T23:57:26Z

Codecov Report

❌ Patch coverage is 30.43478% with 16 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.96%. Comparing base (c33fbb5) to head (03ad320).
⚠️ Report is 64 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #856      +/-   ##
==========================================
+ Coverage   66.68%   68.96%   +2.27%     
==========================================
  Files         186      191       +5     
  Lines       66795    67170     +375     
  Branches     9507     9523      +16     
==========================================
+ Hits        44543    46321    +1778     
+ Misses      19572    18027    -1545     
- Partials     2680     2822     +142

DrTodd13 · 2025-10-01T02:18:57Z

Is there a better way to flag the tests that have to_datetime in them as now requiring JIT?

IsaacWarren

Thanks @DrTodd13

IsaacWarren · 2025-10-01T14:50:36Z

bodo/pandas/base.py

+        # Declare function to be compiled to run to_datetime over series.
+        func = "def bodo_to_datetime(x):\n"
+        # Embed format string as constant in function.
+        func += f"    return pd.to_datetime(x, format='{in_kwargs['format']}')\n"
+        # Create the function from string.
+        to_datetime_func = bodo_spawn_exec(func, {"pd": pd}, {}, __name__)
+        return arg.map(to_datetime_func)


Would it be possible to use a C++ kernel for this, maybe using Arrow, instead of the compiler? Not saying it needs to be done now, but maybe add a comment for a follow-up.

scott-routledge2 · 2025-10-01T15:06:31Z

> Is there a better way to flag the tests that have to_datetime in them as now requiring JIT?

Not that I can think of? Maybe we could change the test command to run each test module as a separate session? (I think runtests.py does this.) That way we could tell the first function to import JIT in each file at least.
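
A rough sketch of the "one session per test module" idea, assuming pytest is the test runner. `per_module_commands` and `run_each_module` are hypothetical helpers written for this example; this is not a description of what runtests.py actually does.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def per_module_commands(test_dir, extra_args=()):
    """Return one pytest command line per test module under test_dir."""
    modules = sorted(Path(test_dir).rglob("test_*.py"))
    return [
        [sys.executable, "-m", "pytest", str(module), *extra_args]
        for module in modules
    ]


def run_each_module(test_dir, extra_args=()):
    """Run every module in its own interpreter; return the failing modules."""
    failed = []
    for cmd in per_module_commands(test_dir, extra_args):
        # A fresh process per module means anything one module imports
        # (such as a JIT compiler) cannot leak into the next session.
        if subprocess.run(cmd).returncode != 0:
            failed.append(cmd[3])
    return failed


# Demonstrate command construction only; nothing is executed here.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("test_a.py", "test_b.py", "helpers.py"):
        (Path(tmp) / name).touch()
    commands = per_module_commands(tmp, extra_args=("-q",))
    selected = [Path(cmd[3]).name for cmd in commands]
```

Because each module starts in a clean interpreter, the first JIT import within a module is attributable to that module alone, at the cost of paying interpreter and import start-up once per file.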

scott-routledge2

Thanks @DrTodd13, LGTM.

scott-routledge2 · 2025-10-01T15:07:52Z

bodo/pandas/base.py

+        # Embed format string as constant in function.
+        func += f"    return pd.to_datetime(x, format='{in_kwargs['format']}')\n"
+        # Create the function from string.
+        to_datetime_func = bodo_spawn_exec(func, {"pd": pd}, {}, __name__)


Maybe this needs bodo.jit(func, cache=True) here?
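
For context on the snippet under review: interpolating the format into a hand-quoted literal (`format='{...}'`) produces invalid source if the format itself contains a single quote. The sketch below contrasts two alternatives, using the standard library's `datetime.strptime` as a stand-in for `pd.to_datetime`. The helper names are hypothetical, `bodo_spawn_exec` is not reproduced, and this is not necessarily how the PR resolved it.

```python
from datetime import datetime


def make_parser_from_source(fmt):
    """Build the parser from generated source, embedding fmt as a literal."""
    # {fmt!r} emits a correctly escaped Python string literal, so quotes
    # inside fmt cannot break the generated function.
    source = (
        "def parse(value):\n"
        f"    return datetime.strptime(value, {fmt!r})\n"
    )
    namespace = {"datetime": datetime}
    exec(source, namespace)
    return namespace["parse"]


def make_parser_from_closure(fmt):
    """Capture fmt in a closure; no source generation is needed."""
    def parse(value):
        return datetime.strptime(value, fmt)
    return parse


# A format containing single quotes, which naive '...' quoting would break.
fmt = "%Y-%m-%d 'at' %H:%M"
text = "1998-12-01 'at' 07:30"
expected = datetime(1998, 12, 1, 7, 30)

from_source = make_parser_from_source(fmt)(text)
from_closure = make_parser_from_closure(fmt)(text)
```

The closure variant avoids string-built code entirely, which may also make it easier for a compiler cache to recognize the function across calls, though that depends on how the JIT keys its cache.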

Todd A. Anderson added 9 commits September 26, 2025 16:17

Add caching to loading functions.

226f52d

Add ability for jit tpch to not run all queries.

e6f2e24

Merge branch 'main' into todd/tpch_testing

7e743da

Fix CTE for cross product.

f695748

Dataframe lib to_datetime will run as cfunc over a map if only suppor…

8430325

…ted options are used.

Add comments.

f442b14

[run CI]

0769c1a

Adjust cte counts.

dede844

[run CI]

7f252e5

Todd A. Anderson added 2 commits September 30, 2025 19:17

Try adding explicit jit_dependency marker for tests with to_datetime.

89dfbad

[run CI]

5055393

DrTodd13 requested review from IsaacWarren and scott-routledge2 October 1, 2025 02:18

IsaacWarren approved these changes Oct 1, 2025


scott-routledge2 approved these changes Oct 1, 2025


Todd A. Anderson added 10 commits October 1, 2025 10:52

Use captured format instead of dynamically generated func.

81e2b06

[run CI]

391b212

mark test as needing jit.

e59c01f

[run CI]

06c55aa

Debugging for Scott.

c911c35

Remove debugging code.

3769cda

Copy nullable when converting datetime64.

4d80dd0

[run CI]

6cee4c7

Reset_index and sort in test.

fc0b98c

[run CI]

03ad320

DrTodd13 merged commit 2d66959 into main Oct 3, 2025
25 of 26 checks passed

DrTodd13 deleted the todd/tpch_testing branch October 3, 2025 18:58

DrTodd13 merged 21 commits into main from todd/tpch_testing