Refactor Guides Docs by scott-routledge2 · Pull Request #882 · bodo-ai/Bodo

scott-routledge2 · 2025-10-16T16:53:58Z

Changes included in this PR

Adds Bodo DataFrames Guide to docs/guides page and notebook to examples/#Tutorials

Refactors guides to live under the docs/guides folder. Moves all JIT guides into a subdirectory of guides and fixes the affected links.

The important changes are in docs/docs/guides/dataframes/dataframes_intro.md and in docs/mkdocs.yml, which defines the updated sidebar.

Testing strategy

Ran notebook in pixi env

User facing changes

Docs

Checklist

Pipelines passed before requesting review. To run CI you must include [run CI] in your commit message.
I am familiar with the Contributing Guide
I have installed + ran pre-commit hooks.

docs/docs/guides/dataframes/dataframes_intro.md

DrTodd13 · 2025-10-17T17:15:12Z

I wonder if we should mention the ability to execute an expensive plan resulting in little data that we know will be reused later?

scott-routledge2 · 2025-10-17T17:39:09Z

docs/docs/dataframe_library/index.md

-bodo.pandas {#bodopandas}
-===========
-
-Bodo.pandas is an optimized and distributed dataframe library that is a


I think this page was unreachable. Is there any info here we want to add to another page/section?

scott-routledge2 · 2025-10-17T18:06:20Z

examples/_Getting-Started/dataframes.ipynb

The #'s in the link were leading to the wrong page.

scott-routledge2 · 2025-10-17T18:10:59Z

> I wonder if we should mention the ability to execute an expensive plan resulting in little data that we know will be reused later?

Is this referring to CTE? I think that would be nice to have. Do you have a good example?

DrTodd13 · 2025-10-17T18:58:18Z

> > I wonder if we should mention the ability to execute an expensive plan resulting in little data that we know will be reused later?
>
> Is this referring to CTE? I think that would be nice to have. Do you have a good example?

We can detect and optimize CTEs within one query. The issue is repeated computation across queries, where our CTE optimization cannot help. This is the reason for `persist` in Dask. So, something like this:

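import bodo.pandas as bd

df = bd.read_parquet("bigdata.parquet")
# `other` was not defined in the original snippet; assume a second table
# with a "key" column (the file name here is hypothetical).
other = bd.read_parquet("other.parquet")

# Expensive transformation (e.g. multi-join + groupby)
expensive = df.merge(other, on="key").groupby("category").agg({"value": "sum"})

# Materialize the result once so that later uses do not re-run the plan.
expensive.execute_plan()

# Without execute_plan above, this would trigger execution of expensive.
print(expensive + 7)

# Without execute_plan above, this would also trigger a second execution of expensive.
# some_other_computation is a placeholder for any downstream function.
result = some_other_computation(expensive[expensive["value"] > 1000])
print(result)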
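
To make the re-execution issue concrete without needing Bodo installed, here is a minimal toy model in plain Python. The `LazyFrame` class, its `map` and `collect` methods, and `expensive_scan` are all hypothetical names invented for illustration; this is not how Bodo implements `execute_plan`, only a sketch of why materializing a shared intermediate result once avoids repeated work.

```python
from typing import Callable, Optional


class LazyFrame:
    """Toy stand-in for a lazily evaluated dataframe (not Bodo's implementation)."""

    def __init__(self, compute: Callable[[], list]):
        self._compute = compute               # deferred work, i.e. the "plan"
        self._result: Optional[list] = None   # filled in once materialized

    def execute_plan(self) -> "LazyFrame":
        # Run the plan once and keep the result for later consumers.
        if self._result is None:
            self._result = self._compute()
        return self

    def collect(self) -> list:
        # Reuse the materialized result if present, otherwise recompute.
        return self._result if self._result is not None else self._compute()

    def map(self, fn: Callable) -> "LazyFrame":
        # Derived plans stay lazy and read from their parent when collected.
        return LazyFrame(lambda: [fn(x) for x in self.collect()])


runs = {"count": 0}


def expensive_scan() -> list:
    runs["count"] += 1  # counts how many times the expensive plan executes
    return [1, 2, 3]


# Without materializing, each downstream consumer re-runs the scan.
lazy = LazyFrame(expensive_scan)
lazy.map(lambda x: x + 7).collect()
lazy.map(lambda x: x * 2).collect()
runs_without = runs["count"]

# Materializing first means downstream consumers share one execution.
runs["count"] = 0
cached = LazyFrame(expensive_scan).execute_plan()
plus_seven = cached.map(lambda x: x + 7).collect()
doubled = cached.map(lambda x: x * 2).collect()
runs_with = runs["count"]

print(runs_without, runs_with)
```

In this toy model the two downstream consumers trigger two scans when nothing is materialized, and a single scan once `execute_plan` has been called on the shared parent.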

ehsantn

Thanks @scott-routledge2! This is a big improvement. See minor comments below.

docs/mkdocs.yml

examples/_Tutorials/dataframes_intro.ipynb

docs/docs/guides/dataframes/dataframes_intro.md

examples/_Tutorials/dataframes_intro.ipynb

DrTodd13

Thanks for the detailed work. Looks good.

DrTodd13 · 2025-10-20T16:41:19Z

docs/docs/guides/dataframes/dataframes_intro.md

+
+`pd.read_parquet` and `pd.read_iceberg` are lazy APIs, meaning that no actual data is read until needed in a subsequent operation.
+
+You can also create BodoDataFrames from a Pandas DataFrame using the `from_pandas` function, which is useful when working with third party libraries that return Pandas DataFrames.


I tend to do bodo.pandas.DataFrame(pandas_dataframe) for conversion. Do we want to mention this option?

scott-routledge2

I thought `bodo.pandas.DataFrame` wasn't stable, but if it is just a wrapper around `from_pandas` then maybe we should. @ehsantn thoughts?

It might be simpler to just use `bodo.pandas.DataFrame` in some of the examples that construct dataframes, so I'll include it.

ehsantn

It just calls `from_pandas`, so whichever looks better to you should be fine.

scott-routledge2 added 7 commits October 14, 2025 16:25

add links and JIT warmup cell

5f1d293

Add Bodo DataFrames guide tutorial notebook

ce282ad

run all cells

48829e5

Reorganize guides in docs

6fde447

Add as notebook

ec76329

fix formatting

ce898c3

fix title

f827682

scott-routledge2 and others added 5 commits October 16, 2025 14:12

clarify JIT fallback is not Pandas fallback

74e442e

[run CI]

da1d795

fix links

77664cc

refactor integration guides

bd35267

Merge branch 'scott/refactor_docs_guides' of https://github.com/bodo-…

dfd8c5f

…ai/Bodo into scott/refactor_docs_guides

fix more links and small changes

a61b9e4

scott-routledge2 requested review from DrTodd13 and ehsantn October 17, 2025 18:11

scott-routledge2 marked this pull request as ready for review October 17, 2025 18:11

scott-routledge2 added 2 commits October 17, 2025 17:15

add execute_plan section

18f6e5d

remove warnings

3f28055

ehsantn approved these changes Oct 20, 2025

add script, address minor comments

6802d4e

DrTodd13 approved these changes Oct 20, 2025

scott-routledge2 added 3 commits October 20, 2025 16:01

use bodo.pandas.DataFrame instead of from_pandas

54162b4

fix grammar

73dfc73

[skip]

905f7f2

scott-routledge2 added 2 commits October 20, 2025 16:13

insert 'can' [skip]

6792de0

insert 'can' [skip]

7a6cfc5

scott-routledge2 merged commit dd2dcbb into main Oct 20, 2025
13 checks passed

scott-routledge2 deleted the scott/refactor_docs_guides branch October 20, 2025 21:26

