I wonder if we should mention the ability to execute an expensive plan resulting in little data that we know will be reused later?
bodo.pandas {#bodopandas}
===========
Bodo.pandas is an optimized and distributed dataframe library that is a
I think this page was unreachable. Is there any info here we want to add to another page/section?
The #'s in the link were leading to the wrong page.
Is this referring to CTEs? I think that would be nice to have. Do you have a good example?
We can detect and optimize CTEs within one query. The issue is computation repeated across queries, where our CTE optimization cannot help. This is the reason for `persist` in Dask. So, something like this:
ehsantn left a comment
Thanks @scott-routledge2! This is a big improvement. See minor comments below.
DrTodd13 left a comment
Thanks for the detailed work. Looks good.
`pd.read_parquet` and `pd.read_iceberg` are lazy APIs, meaning that no actual data is read until needed in a subsequent operation.
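To illustrate what "lazy" means here, the following is a toy sketch in plain Python. `LazyTable`, `fake_read`, and `collect` are hypothetical names invented for this illustration; they are not Bodo APIs and say nothing about how Bodo implements its plans. The point is only that building the pipeline reads nothing, and the read happens when a result is requested.

```python
# Toy illustration of lazy evaluation; all names here are hypothetical,
# not part of Bodo or Pandas.

read_calls = []  # records every time the "file" is actually read


def fake_read():
    """Stand-in for an expensive read such as a Parquet scan."""
    read_calls.append(1)
    return [
        {"id": 1, "fare": 5.0},
        {"id": 2, "fare": 12.5},
        {"id": 3, "fare": 30.0},
    ]


class LazyTable:
    """Holds a deferred loader plus a list of recorded steps (the plan)."""

    def __init__(self, loader, steps=()):
        self._loader = loader
        self._steps = tuple(steps)

    def filter(self, predicate):
        # Only record the step; no data is read here.
        def step(rows):
            return [r for r in rows if predicate(r)]

        return LazyTable(self._loader, self._steps + (step,))

    def collect(self):
        # The read and all recorded steps run only when a result is needed.
        rows = self._loader()
        for step in self._steps:
            rows = step(rows)
        return rows


table = LazyTable(fake_read)
expensive = table.filter(lambda r: r["fare"] > 10)
assert read_calls == []  # nothing has been read yet

result = expensive.collect()
print([r["id"] for r in result])
```

In this sketch every call to `collect` repeats the read, which is the kind of repeated work that materializing a small, reused result once would avoid.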
You can also create BodoDataFrames from a Pandas DataFrame using the `from_pandas` function, which is useful when working with third-party libraries that return Pandas DataFrames.
I tend to do bodo.pandas.DataFrame(pandas_dataframe) for conversion. Do we want to mention this option?
I thought bodo.pandas.DataFrame wasn't stable, but if it is just a wrapper around from_pandas then maybe we should mention it. @ehsantn thoughts?
It might be simpler to just use bodo.pandas.DataFrame in some of the examples that construct dataframes, so I'll include it.
It just calls from_pandas so whichever looks better to you should be fine.
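For reference, a minimal sketch of the two conversion routes discussed in this thread. It assumes Bodo is installed and that `from_pandas` lives in the `bodo.pandas` namespace (an assumption; the guide text only names the function). The sample data is made up, and the snippet falls back to plain Pandas so it still runs without Bodo.

```python
import pandas as pd

try:
    # Assumption: Bodo is installed and exposes the bodo.pandas module.
    import bodo.pandas as bd
except ImportError:
    bd = None  # fall back so the snippet still runs without Bodo

# A regular Pandas DataFrame, e.g. one returned by a third-party library.
pdf = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

if bd is not None:
    # Explicit conversion helper named in the guide.
    bdf = bd.from_pandas(pdf)
    # Constructor form from the discussion; per the thread it just calls from_pandas.
    bdf_ctor = bd.DataFrame(pdf)
    print(type(bdf).__name__, type(bdf_ctor).__name__)
else:
    print("Bodo not installed; only the Pandas DataFrame was created")
```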
Changes included in this PR
Adds Bodo DataFrames Guide to docs/guides page and notebook to examples/#Tutorials
Refactors guides to be in docs/guides folders. Moves all JIT guides to a subdirectory of guides, fixes links, etc.
The important changes are in docs/docs/guides/dataframes/dataframes_intro.md and mkdocs.yaml for the updated side bar.
Testing strategy
Ran notebook in pixi env
User facing changes
Docs
Checklist
[run CI] in your commit message.