Convert to BodoDataFrame/BodoSeries on fallback by scott-routledge2 · Pull Request #855 · bodo-ai/Bodo

scott-routledge2 · 2025-09-30T21:27:34Z

Changes included in this PR

Convert results to BodoSeries and BodoDataFrames after falling back to Pandas for an unsupported function. Also adds extra error checking to from_pandas.

Testing strategy

User facing changes

Better error messages in from_pandas. Results of fallback methods are now returned as BodoDataFrames/BodoSeries.
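A minimal sketch of what converting fallback results could look like, assuming a result may be a single pandas object or a tuple, list or dict containing them. This convert_to_bodo and its to_bodo parameter are hypothetical: to_bodo stands in for the real conversion (for example bd.from_pandas for DataFrames), and the actual helper added in this PR may handle more or fewer cases.

```python
from typing import Any, Callable

import pandas as pd


def convert_to_bodo(obj: Any, to_bodo: Callable[[Any], Any]) -> Any:
    """Hypothetical sketch: recursively convert pandas results using `to_bodo`.

    `to_bodo` is an assumed stand-in for the real pandas -> Bodo conversion.
    """
    if isinstance(obj, (pd.DataFrame, pd.Series)):
        return to_bodo(obj)
    # Walk common containers so that e.g. a tuple of Series is also converted.
    if isinstance(obj, tuple):
        return tuple(convert_to_bodo(x, to_bodo) for x in obj)
    if isinstance(obj, list):
        return [convert_to_bodo(x, to_bodo) for x in obj]
    if isinstance(obj, dict):
        return {k: convert_to_bodo(v, to_bodo) for k, v in obj.items()}
    # Scalars and other objects are returned unchanged.
    return obj
```

Passing the conversion function in keeps the sketch runnable without Bodo installed; a real implementation would call the library conversion directly.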

Checklist

Pipelines passed before requesting review. To run CI you must include [run CI] in your commit message.
I am familiar with the Contributing Guide
I have installed + ran pre-commit hooks.

scott-routledge2 · 2025-10-01T21:01:22Z

bodo/pandas/utils.py

+
+                    # Convert objects to Bodo before returning them to the user.
+                    if FallbackContext.is_top_level():
+                        return convert_to_bodo(py_res)


My observation was that Pandas methods call a lot of internal functions we do not support (for example, xs and copy), so we can keep the DataFrame as Pandas for those internal calls and only convert when returning to the user.
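A minimal sketch of the "convert only at the top level" idea, assuming the nesting depth of fallback calls is tracked in a contextvars.ContextVar. The FallbackContext below, its enter() method, and the run_fallback and convert names are hypothetical illustrations; the real FallbackContext in bodo/pandas/utils.py may be implemented differently.

```python
import contextlib
import contextvars
from collections.abc import Iterator

# Hypothetical: depth of nested fallback calls in the current thread/task.
_fallback_depth = contextvars.ContextVar("fallback_depth", default=0)


class FallbackContext:
    """Hypothetical sketch: tracks how deeply nested we are in pandas fallbacks."""

    @staticmethod
    @contextlib.contextmanager
    def enter() -> Iterator[None]:
        token = _fallback_depth.set(_fallback_depth.get() + 1)
        try:
            yield
        finally:
            # Restore the previous depth even if the fallback raised.
            _fallback_depth.reset(token)

    @staticmethod
    def is_top_level() -> bool:
        # Depth 1 means we are inside the outermost fallback call only.
        return _fallback_depth.get() == 1


def run_fallback(func, *args, convert=lambda r: ("converted", r)):
    """Hypothetical wrapper: run `func`, converting the result only at top level."""
    with FallbackContext.enter():
        result = func(*args)
        if FallbackContext.is_top_level():
            return convert(result)
        return result
```

Nested fallbacks (for example a pandas method that internally calls copy) see a depth greater than 1 and return plain pandas objects, so only the value handed back to the user is converted. Using a ContextVar rather than a plain class counter keeps the depth separate per thread and per async task.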

This is also currently hiding a small bug in many of the tests that I haven't figured out yet, but it seems minor:

import pandas as pd
import bodo.pandas as bd

df = pd.DataFrame({"A": [1, 2, 3], "B": ["a", "b", "c"]}, index=[1, 2, 3])
bdf = bd.from_pandas(df)
bdf1 = bdf.rename_axis("index123")
bdf2 = bdf1.copy()
print("bodo result: ", bdf2.index.name)
pdf1 = df.rename_axis("index123")
pdf2 = pdf1.copy()
print("pandas result: ", pdf2.index.name)

Pandas result: "index123"; Bodo result: None (the index name isn't propagated in some places).

codecov · 2025-10-01T21:40:53Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 68.98%. Comparing base (c33fbb5) to head (8f9e8f2).
⚠️ Report is 66 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #855      +/-   ##
==========================================
+ Coverage   66.68%   68.98%   +2.30%     
==========================================
  Files         186      191       +5     
  Lines       66795    67217     +422     
  Branches     9507     9531      +24     
==========================================
+ Hits        44543    46373    +1830     
+ Misses      19572    18021    -1551     
- Partials     2680     2823     +143

scott-routledge2 · 2025-10-03T18:37:53Z

bodo/tests/test_lazy/test_bodo_series.py

    """Tests that slicing returns the correct value and does not trigger data fetch unnecessarily"""
    lazy_manager, pandas_manager = single_pandas_managers

+    if pandas_manager == SingleArrayManager:


These seem to be existing issues that were exposed by this PR now that more conversion between Pandas and Bodo is happening. I can open a follow-up to investigate, but in my opinion it is not a high priority since BlockManager is the default and ArrayManager will be removed in Pandas 3.0.

DrTodd13

Thanks Scott. Looks pretty good.

DrTodd13 · 2025-10-03T19:06:18Z

bodo/pandas/base.py

+    for c in df.columns:
+        if isinstance(df[c], pd.DataFrame):
+            raise BodoLibNotImplementedException(
+                f"from_pandas(): Duplicate column name: '{c}'."


How does df[c] ever become itself a dataframe and why is that labelled as a duplicate column?

In the case of duplicate column names, df[c] returns a DataFrame containing all the columns named c.
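The pandas behavior behind this check can be seen directly: selecting a duplicated label returns a DataFrame, while a unique label returns a Series. The column names below are illustrative only.

```python
import pandas as pd

# Two columns share the label "A"; pandas allows duplicate column labels.
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["A", "A", "B"])

dup = df["A"]     # DataFrame holding both "A" columns
single = df["B"]  # Series, since "B" is unique

print(type(dup).__name__, dup.shape)        # DataFrame (2, 2)
print(type(single).__name__, single.shape)  # Series (2,)
```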

ehsantn

Thanks @scott-routledge2!

ehsantn · 2025-10-03T19:16:56Z

bodo/tests/test_spawn/test_spawn_mode.py

    assert sub.returncode == 0


+@pytest.mark.skip("TODO: Fix flakey test on CI.")


Let's open an issue and put it on the on-call board so we don't forget.

ehsantn · 2025-10-03T20:23:11Z

bodo/pandas/base.py

+        )
+    new_columns = []
+    for c in df.columns:
+        if isinstance(df[c], pd.DataFrame):


Using df.columns.has_duplicates is simpler and more reliable. columns is an Index, which is set-like and should already track this internally, I think.
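A minimal sketch of the suggested check, assuming it runs at the start of from_pandas. DuplicateColumnError and check_unique_columns are hypothetical names; the PR itself raises BodoLibNotImplementedException, which is not imported here so the sketch runs without Bodo.

```python
import pandas as pd


class DuplicateColumnError(ValueError):
    """Hypothetical stand-in for BodoLibNotImplementedException."""


def check_unique_columns(df: pd.DataFrame) -> None:
    """Raise if any column label appears more than once (sketch of the suggestion)."""
    if df.columns.has_duplicates:
        # duplicated() marks every repeat after the first occurrence of a label.
        dups = df.columns[df.columns.duplicated()].unique().tolist()
        raise DuplicateColumnError(
            f"from_pandas(): Duplicate column name(s): {dups}."
        )
```

This avoids selecting each column just to inspect the type of the result, and lets the error message list every duplicated label at once.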

scott-routledge2 added 7 commits September 30, 2025 12:44

add convert_to_bodo func

964094a

Merge branch 'main' into scott/make_bodo_on_fallback

35edb21

add a test

53cdff4

fix some tests

abca6a3

use context manager for fallback

36a45b1

add more error checking to from_pandas

fdeffcb

[run ci]

c5d792d

scott-routledge2 commented Oct 1, 2025

scott-routledge2 added 7 commits October 2, 2025 10:52

fix some tests [run ci]

b491e9f

Merge branch 'main' into scott/make_bodo_on_fallback

553e453

Merge branch 'main' into scott/make_bodo_on_fallback

21c53cd

fix some more tests [run ci]

4580bc2

skip some tests [run ci]

3da3fb8

fix slice test

ade9ee9

skip SingleArrayManager test due to isses

393303e

scott-routledge2 commented Oct 3, 2025

[run ci]

b5ebaad

scott-routledge2 requested review from DrTodd13 and ehsantn October 3, 2025 18:38

DrTodd13 approved these changes Oct 3, 2025

scott-routledge2 marked this pull request as ready for review October 3, 2025 19:28

fix test_slice check

dc25355

ehsantn approved these changes Oct 3, 2025

scott-routledge2 added 2 commits October 3, 2025 17:00

minor fixes [run ci]

92490c8

reset index in slice test [run ci]

8f9e8f2

scott-routledge2 merged commit 8f2c217 into main Oct 6, 2025
26 checks passed

scott-routledge2 deleted the scott/make_bodo_on_fallback branch October 6, 2025 13:19

scott-routledge2 merged 18 commits into main from scott/make_bodo_on_fallback

3 participants
