@dhruvak001
Contributor

@dhruvak001 dhruvak001 commented Dec 3, 2025

Support create_index: bool in to_dataframe() to skip creating MultiIndex.

This PR adds a new create_index parameter to both Dataset.to_dataframe() and DataArray.to_dataframe() methods, allowing users to skip the potentially expensive MultiIndex creation and use a simple RangeIndex instead.
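To make the description concrete, here is a minimal sketch of the two index styles using only released xarray and pandas calls. The dataset contents and variable names are invented for illustration. Because `create_index` is the parameter proposed in this PR and may not exist in an installed xarray, the flat frame is approximated with `reset_index(drop=True)`. That shows the shape of the result but does not avoid the cost of building the MultiIndex, and the thread does not say whether coordinates are kept as columns when `create_index=False`.

```python
import numpy as np
import pandas as pd
import xarray as xr

# A small illustrative dataset: one variable on a 2 x 3 grid.
ds = xr.Dataset(
    {"temp": (("x", "y"), np.arange(6.0).reshape(2, 3))},
    coords={"x": [10, 20], "y": ["a", "b", "c"]},
)

# Default behaviour: rows are indexed by the Cartesian product of x and y,
# so the frame carries a two-level MultiIndex.
df_multi = ds.to_dataframe()

# Approximation of the proposed output with released APIs only.
# With this PR applied, the call would presumably be
#     ds.to_dataframe(create_index=False)
# (assumption: taken from the PR description, not verified here).
df_flat = df_multi.reset_index(drop=True)

print(type(df_multi.index).__name__)
print(type(df_flat.index).__name__)
```

The row order of `df_flat` matches `df_multi`; only the index differs.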

Copilot AI review requested due to automatic review settings December 3, 2025 21:50

Copilot AI left a comment

Pull request overview

This pull request adds a create_index parameter to both Dataset.to_dataframe() and DataArray.to_dataframe() methods, allowing users to bypass the potentially expensive MultiIndex creation and use a simple RangeIndex instead. This is a performance optimization feature that addresses issue #10912.

  • Adds optional create_index: bool = True parameter to maintain backward compatibility
  • When create_index=False, uses pd.RangeIndex instead of constructing a MultiIndex from coordinates
  • Preserves data integrity and ordering while avoiding the MultiIndex overhead
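The trade-off in the bullets above can be sketched in plain pandas and NumPy, without calling xarray. The helper `frame_from_grid` is hypothetical and is not the PR's implementation. It only mirrors the described behaviour: flatten the data in row-major order, then either build a MultiIndex from the Cartesian product of the coordinates or attach a RangeIndex.

```python
import numpy as np
import pandas as pd


def frame_from_grid(values, coords, create_index=True):
    """Hypothetical helper mirroring the behaviour described in the PR.

    values: N-dimensional array whose axes follow the order of `coords`.
    coords: dict mapping dimension name -> list of coordinate labels.
    """
    flat = np.asarray(values).reshape(-1)  # row-major flattening
    if create_index:
        # Cartesian product of all coordinate labels; this is the step
        # that can become expensive for large grids.
        index = pd.MultiIndex.from_product(
            list(coords.values()), names=list(coords.keys())
        )
    else:
        # Positional index only; no coordinate product is built.
        index = pd.RangeIndex(flat.size)
    return pd.DataFrame({"value": flat}, index=index)


coords = {"x": [10, 20], "y": ["a", "b", "c"]}
values = np.arange(6).reshape(2, 3)

with_index = frame_from_grid(values, coords)
without_index = frame_from_grid(values, coords, create_index=False)
```

Both frames hold the same values in the same order. Only `with_index` supports label lookups such as `with_index.loc[(20, "b"), "value"]`.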

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| xarray/core/dataset.py | Adds create_index parameter to to_dataframe() and _to_dataframe() methods, implementing RangeIndex creation when parameter is False |
| xarray/core/dataarray.py | Adds create_index parameter to to_dataframe() method and passes it through to Dataset's _to_dataframe() |
| xarray/tests/test_dataset.py | Adds comprehensive tests for the new create_index parameter on Dataset, including default behavior, RangeIndex creation, data integrity, and interaction with dim_order |
| xarray/tests/test_dataarray.py | Adds comprehensive tests for the new create_index parameter on DataArray, including default behavior, RangeIndex creation, data integrity, and interaction with additional coordinates |

Comments suppressed due to low confidence (1)

xarray/core/dataset.py:7277

  • The docstring states "The DataFrame is indexed by the Cartesian product of this dataset's indices" but this is no longer true when create_index=False. Consider updating the first paragraph of the docstring to clarify this behavior, e.g., "When create_index=True (default), the DataFrame is indexed by the Cartesian product of this dataset's indices. When create_index=False, a simple RangeIndex is used instead."
        Non-index variables in this dataset form the columns of the
        DataFrame. The DataFrame is indexed by the Cartesian product of
        this dataset's indices.

Contributor

@dcherian dcherian left a comment

Looks great, can you add a note to whats-new.rst please

@dcherian dcherian changed the title Support create_index: bool in to_dataframe() to skip creating MultiIndex Support create_index: bool in to_dataframe() to skip creating MultiIndex Dec 4, 2025
Member

@benbovy benbovy left a comment

Thanks @dhruvak001. I just left one small comment on the docstrings.

@dcherian
Contributor

dcherian commented Dec 4, 2025

Can you make this change for to_dask_dataframe too please?

DHRUVA KUMAR KAUSHAL added 2 commits December 28, 2025 00:25
@dhruvak001
Contributor Author

I was busy for the past few days. I have now resolved all comments. Kindly review.

Development

Successfully merging this pull request may close these issues.

Support create_index: bool in to_dataframe to skip creating MultiIndex