Migration guide for users of old datatree repo #9598
Merged: TomNicholas merged 20 commits into pydata:main from TomNicholas:datatree_migration_guide on Oct 15, 2024
Commits:
5b52e61 sketch of migration guide (TomNicholas)
c39b04c [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
1eff8fe whatsnew (TomNicholas)
74e4f68 Merge branch 'main' into datatree_migration_guide (TomNicholas)
05dcf50 add date (TomNicholas)
b0621ce spell out API changes in more detail (TomNicholas)
8c5f41b details on backends integration (TomNicholas)
5ce2d26 explain alignment and open_groups (TomNicholas)
1e8b04e explain coordinate inheritance (TomNicholas)
9ee394e Merge branch 'datatree_migration_guide' of https://github.com/TomNich… (TomNicholas)
8b8624c Merge branch 'main' into datatree_migration_guide (TomNicholas)
b8eeaaf [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
366e755 re-trigger CI (TomNicholas)
1c751dd remove bullet about map_over_subtree (TomNicholas)
68b2ab6 Markdown formatting for important warning block (TomNicholas)
7d560cd Reorder changes in order of importance (TomNicholas)
23d0e44 Clearer wording on setting relationships (TomNicholas)
85d9c99 remove "technically" (TomNicholas)
41791ec Merge branch 'main' into datatree_migration_guide (TomNicholas)
5c47583 Merge branch 'main' into datatree_migration_guide (TomNicholas)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,62 @@ | ||
| # Migration guide for users of `xarray-contrib/datatree` | ||
|
|
||
| _15th October 2024_ | ||
|
|
||
| This guide is for previous users of the prototype `datatree.DataTree` class in the `xarray-contrib/datatree` repository. That repository has now been archived and will not be maintained. This guide is intended to help smooth your transition to the new, updated `xarray.DataTree` class. | ||
|
|
||
| > [!IMPORTANT] | ||
| > There are breaking changes! You should not expect that code written with `xarray-contrib/datatree` will work without any modifications. | ||
| > At the absolute minimum you will need to change the top-level import statement, but there are other changes too. | ||
|
|
||
| We have made various changes compared to the prototype version. These fall into three categories: data model changes, which affect the hierarchical structure itself; integration with xarray's IO backends; and minor API changes, which mostly consist of renaming methods to be more self-consistent. | ||
|
||
|
|
||
| ### Data model changes | ||
|
|
||
| The most important changes are to the data model of `DataTree`. Whilst previously data in different nodes was unrelated and therefore unconstrained, trees now have "internal alignment", meaning that dimensions and indexes in child nodes must exactly align with those in their parents. | ||
|
|
||
| These alignment checks happen at tree construction time, so some netCDF4 files and Zarr stores that could previously be opened as `datatree.DataTree` objects using `datatree.open_datatree` can no longer be opened as `xr.DataTree` objects using `xr.open_datatree`. For these cases we added a new opener function, `xr.open_groups`, which returns a `dict[str, Dataset]`. It is intended as a fallback for tricky cases: you can still open the entire contents of the file using `open_groups`, edit the `Dataset` objects, and then construct a valid tree from the edited dictionary using `DataTree.from_dict`. | ||
|
||
|
|
||
| The alignment checks allowed us to add "Coordinate Inheritance", a much-requested feature where indexed coordinate variables are now "inherited" down to child nodes. This allows you to define common coordinates in a parent group that are then automatically available on every child node. The distinction between a locally defined coordinate variable and an inherited coordinate that was defined on a parent node is reflected in the `DataTree.__repr__`. If you prefer not to have these variables inherited, you can get behaviour closer to the old `datatree` package by removing indexes from coordinates, as this prevents inheritance. | ||
|
|
||
| For further documentation see the page in the user guide on Hierarchical Data. | ||
|
|
||
| ### Integrated backends | ||
|
|
||
| Previously `datatree.open_datatree` used a different codepath from `xarray.open_dataset`, and was hard-coded to only support opening netCDF files and Zarr stores. | ||
| Now xarray's backend entrypoint system has been generalized to include `open_datatree` and the new `open_groups`. | ||
| This means we can now extend other xarray backends to support `open_datatree`! If you are the maintainer of an xarray backend we encourage you to add support for `open_datatree` and `open_groups`! | ||
|
|
||
| Additionally: | ||
| - A `group` kwarg has been added to `open_datatree` for choosing which group in the file should become the root group of the created tree. | ||
| - Various performance improvements have been made, which should help when opening netCDF files and Zarr stores with large numbers of groups. | ||
| - We anticipate further performance improvements being possible for datatree IO. | ||
|
|
||
| ### API changes | ||
|
|
||
| A number of other API changes have been made, which should only require minor modifications to your code: | ||
| - The top-level import has changed, from `from datatree import DataTree, open_datatree` to `from xarray import DataTree, open_datatree`. Alternatively you can now just use the `import xarray as xr` namespace convention for everything datatree-related. | ||
| - The `DataTree.ds` property has been changed to `DataTree.dataset`, though `DataTree.ds` remains as an alias for `DataTree.dataset`. | ||
| - Similarly the `ds` kwarg in the `DataTree.__init__` constructor has been replaced by `dataset`, i.e. use `DataTree(dataset=...)` instead of `DataTree(ds=...)`. | ||
| - The method `DataTree.to_dataset()` still exists but now has different options for controlling which variables are present on the resulting `Dataset`, e.g. `inherit=True/False`. | ||
| - The `DataTree.parent` property is now read-only. To set a parent-child relationship, assign the child through the `.children` property of the parent node, which remains settable. | ||
|
||
| - Similarly the `parent` kwarg has been removed from the `DataTree.__init__` constructor. | ||
| - DataTree objects passed to the `children` kwarg in `DataTree.__init__` are now shallow-copied. | ||
| - `DataTree.as_array` has been replaced by `DataTree.to_dataarray`. | ||
| - A number of methods which were not well tested have been (temporarily) disabled. In general we have tried to only keep things that are known to work, with the plan to increase API surface incrementally after release. | ||
|
|
||
| ## Thank you! | ||
|
|
||
| Thank you for trying out `xarray-contrib/datatree`! | ||
|
|
||
| We welcome contributions of any kind, including good ideas that never quite made it into the original datatree repository. Please also let us know if we have forgotten to mention a change that should have been listed in this guide. | ||
|
|
||
| Sincerely, the datatree team: | ||
|
|
||
| Tom Nicholas, | ||
| Owen Littlejohns, | ||
| Matt Savoie, | ||
| Eni Awowale, | ||
| Alfonso Ladino, | ||
| Justus Magin, | ||
| Stephan Hoyer | ||