
Support Pandas 3 #1009

Merged
ehsantn merged 518 commits into main from ehsan/pd_3_rc2
Feb 17, 2026

Conversation

@ehsantn ehsantn (Collaborator) commented Jan 22, 2026

Changes included in this PR

As title. Major changes include:

  • Datetime/timedelta arrays default to microsecond precision instead of nanosecond in Pandas 3. Made the nullable datetime array the default type for Series/DataFrame datetime data to normalize units.
  • Several new string data types cause comparison mismatch issues in testing.
  • The pandas comparison functions are now stricter about different NA sentinels (np.nan vs None), so we had to make many changes in our tests.
  • Setting values of a Pandas object inplace indirectly no longer works (e.g. df[df.A > 3]["B"] = 3).
  • Many API removals and changes.
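The chained-assignment change above can be illustrated with a minimal sketch (hypothetical data, not from this PR): under Pandas 3's Copy-on-Write behavior, `df[df.A > 3]["B"] = 3` writes to an intermediate copy and leaves `df` unchanged, while a single `.loc` call still works.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 5], "B": [0, 0, 0]})

# Chained assignment like df[df.A > 3]["B"] = 3 writes to an intermediate
# copy under Copy-on-Write, so the original DataFrame is not modified.
# Writing through a single .loc call modifies df directly:
df.loc[df.A > 3, "B"] = 3
print(df["B"].tolist())  # [0, 0, 3]
```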

Testing strategy

Existing unit tests.

User facing changes

None.

Checklist

  • Pipelines passed before requesting review. To run CI you must include [run CI] in your commit message.
  • I am familiar with the Contributing Guide
  • I have installed and run pre-commit hooks.

@ehsantn ehsantn changed the title from "Support Pandas 3rc2" to "Support Pandas 3" on Jan 22, 2026

codecov bot commented Jan 22, 2026

Codecov Report

❌ Patch coverage is 65.33333% with 104 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.45%. Comparing base (c33fbb5) to head (668318f).
⚠️ Report is 209 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1009      +/-   ##
==========================================
+ Coverage   66.68%   68.45%   +1.77%     
==========================================
  Files         186      195       +9     
  Lines       66795    68055    +1260     
  Branches     9507     9705     +198     
==========================================
+ Hits        44543    46589    +2046     
+ Misses      19572    18603     -969     
- Partials     2680     2863     +183     

Copilot AI (Contributor) left a comment
Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@DrTodd13 DrTodd13 (Collaborator) left a comment

Thanks!

arr_type = bodo.types.boolean_array_type

if arr_type == types.Array(types.NPDatetime("us"), 1, "C"):
# Make sure datetime64 arrays are ns
Collaborator
How is this making sure they are ns?

Collaborator Author

DatetimeArrayType normalizes the unit to nanosecond during unboxing (through Arrow).

arr_type = bodo.types.DatetimeArrayType(None)

# Make sure timedelta64 arrays are ns
if isinstance(arr_type, types.Array) and isinstance(
Collaborator

elif?

Collaborator Author

This is a specific case to timedelta.


# We make all Series data arrays contiguous during unboxing to avoid type errors
# see test_df_query_stringliteral_expr
if isinstance(arr_type, types.Array):
Collaborator

Maybe do one check for isinstance(arr_type, types.Array) and, inside that, the additional checks above with an else for this current category? That would eliminate 3 isinstance checks.

Collaborator Author

Done. Refactored this section to have only one types.Array type check.

normalize=False,
name=None,
closed=None,
inclusive="both",
Collaborator

So, Pandas 3 only once this is merged?

Collaborator Author

Yes; we can't practically keep many versions of different APIs. These minor differences shouldn't matter much anyway.
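
As an example of such an API difference, the `closed=None` parameter shown in the diff was replaced by `inclusive` in pd.date_range. A sketch, assuming pandas ≥ 2.0 (where `closed` has been removed):

```python
import pandas as pd

# `inclusive` controls whether the range endpoints are included,
# replacing the removed `closed` parameter.
both = pd.date_range("2026-01-01", "2026-01-05", inclusive="both")
neither = pd.date_range("2026-01-01", "2026-01-05", inclusive="neither")
print(len(both), len(neither))  # 5 3
```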

not isinstance(on_data_type, types.Array)
or on_data_type.dtype != bodo.types.datetime64ns
):
) and not on_data_type == bodo.types.DatetimeArrayType(None):
Collaborator

!= instead of not ==?

Collaborator Author

Done.

n_int64 = bodo.hiframes.datetime_timedelta_ext.cast_numpy_timedelta_to_int(dt64)
return pd.Timedelta(n_int64)
def convert_numpy_timedelta64_to_pd_timedelta(td64): # pragma: no cover
return td64
Collaborator

Only needs conversion in jitted code?

Collaborator Author

Not called in non-jitted code. This is just a placeholder.

Collaborator

Should throw an exception then?

Collaborator Author

Probably. We have a lot of these in the code base; not worth going through them right now, I think.

func_text += " out_arr[i] = ts." + field + "\n"
else:
func_text += f" out_arr[i] = arr[i].{field}\n"
call_parans = "()" if field == "weekday" else ""
Collaborator

params?

Collaborator Author

Done.

func_text += " min_val = bodo.libs.array_ops.array_op_min(arr)\n"
func_text += " max_val = bodo.libs.array_ops.array_op_max(arr)\n"
if dtype == bodo.types.datetime64ns:
if dtype == bodo.types.datetime64ns or isinstance(
Collaborator

one isinstance with two possible targets?

Collaborator Author

There is a line of code in between.

if isinstance(values, (pa.Array, pa.ChunkedArray)) and (
pa.types.is_string(values.type) or _is_string_view(values.type)
# Bodo change: allow dictionary-encoded string arrays
# or (
Collaborator

Remove dead code?

Collaborator Author

Keeping the disabled code around for documentation and context to help with later upgrades.

@@ -239,6 +246,7 @@ nccl = ">=2.18"
numba = ">=0.60,<0.62.0"
pyarrow = "21.0.*"
libarrow = "21.0.*"
pandas = ">=2.2.0"
Collaborator

What are the implications of 2.2 here?

Contributor

I think it's for compatibility with older pyarrow dependency in the GPU env?

Collaborator Author

This is just moved here in this PR.

@DrTodd13 DrTodd13 requested a review from Copilot February 16, 2026 18:48
Copilot AI (Contributor) left a comment
Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@scott-routledge2 scott-routledge2 (Contributor) left a comment

Thanks @ehsantn LGTM!

assert generated_ctes == 1


@pytest.mark.jit_dependency
Contributor

Why does this test require JIT?

Collaborator Author

It has apply in it. I don't know how it worked before.



@pytest.mark.parametrize("filter", ["IS_NULL", "IS_NOT_NULL", "IS_IN"])
# TODO: fix Pandas 3 issues with IS_NULL and IS_NOT_NULL
Contributor

Open a followup issue?

Collaborator Author

Done.

def test_pq_read_types(fname, datapath, memory_leak_check):
def test_impl(fname):
return pd.read_parquet(fname)
return pd.read_parquet(fname, dtype_backend="pyarrow")
Contributor

Do we need to update our docs/examples to reflect parameter changes, like requiring dtype_backend="pyarrow" in read_csv/read_parquet calls?

Collaborator Author

We don't require this parameter. This is just for testing to make sure data types match and we don't run into unnecessary issues.

@@ -1641,10 +1767,12 @@ def _test_equal(
reset_index,
)
elif py_out is pd.NaT:
assert py_out is bodo_out
# TODO: return pd.NaT for pd.to_datetime(None) and pd.to_timedelta(ts_val)
Contributor

Followup issue?

Collaborator Author

Done.


@ehsantn ehsantn merged commit 5768abf into main Feb 17, 2026
35 of 50 checks passed
@ehsantn ehsantn deleted the ehsan/pd_3_rc2 branch February 17, 2026 16:29