Support DataTree for organizing Datasets by type of level #327

@jthielen

As discussed in xarray-contrib/datatree#195, it would be wonderful (and relatively straightforward) to add support for DataTree in cfgrib. This would allow the different datasets that are currently returned as a list from cfgrib.open_datasets() to be organized in a single data collection.

As for implementation, I would propose refactoring the existing open_datasets() into something like:

def open_datatree(path, backend_kwargs={}, **kwargs):
    # type: (str, T.Dict[str, T.Any], T.Any) -> datatree.DataTree
    """
    Open a GRIB file grouping incompatible hypercubes into different datasets via simple heuristics.
    """
    squeeze = backend_kwargs.get("squeeze", True)
    backend_kwargs = backend_kwargs.copy()
    backend_kwargs["squeeze"] = False
    datasets = open_variable_datasets(path, backend_kwargs=backend_kwargs, **kwargs)

    type_of_level_datasets = {}  # type: T.Dict[str, T.List[xr.Dataset]]
    for ds in datasets:
        for _, da in ds.data_vars.items():
            type_of_level = da.attrs.get("GRIB_typeOfLevel", "undef")
            type_of_level_datasets.setdefault(type_of_level, []).append(ds)

    return datatree.DataTree.from_dict(type_of_level_datasets)

Then, open_datasets could be re-implemented as something like:

def open_datasets(path, backend_kwargs={}, **kwargs):
    # type: (str, T.Dict[str, T.Any], T.Any) -> T.List[xr.Dataset]
    # Read the caller's squeeze preference here; open_datatree always disables it internally.
    squeeze = backend_kwargs.get("squeeze", True)
    type_of_level_datasets = open_datatree(path, backend_kwargs=backend_kwargs, **kwargs)
    merged = []  # type: T.List[xr.Dataset]
    for type_of_level in sorted(type_of_level_datasets):
        for ds in merge_datasets(type_of_level_datasets[type_of_level], join="exact"):
            merged.append(ds.squeeze() if squeeze else ds)
    return merged

(These snippets were edited quickly in between conference sessions; there is no guarantee that I didn't miss something or that they work properly as-is.)
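
To make the grouping heuristic concrete without needing a GRIB file, here is a minimal sketch that uses plain Python stand-ins instead of real xarray objects. `FakeVariable` and `group_by_type_of_level` are hypothetical names invented for this illustration (they are not part of cfgrib or datatree); only the `GRIB_typeOfLevel` attribute key and the `"undef"` fallback come from the snippet above.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FakeVariable:
    """Hypothetical stand-in for a single-variable dataset with GRIB attributes."""
    name: str
    attrs: Dict[str, str] = field(default_factory=dict)


def group_by_type_of_level(variables: List[FakeVariable]) -> Dict[str, List[str]]:
    """Group variable names by their GRIB_typeOfLevel attribute.

    Variables without the attribute fall into an "undef" group,
    mirroring the default used in the proposal above.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for var in variables:
        type_of_level = var.attrs.get("GRIB_typeOfLevel", "undef")
        groups[type_of_level].append(var.name)
    # Return a plain dict with keys in sorted order for a stable layout.
    return {key: groups[key] for key in sorted(groups)}


variables = [
    FakeVariable("t", {"GRIB_typeOfLevel": "isobaricInhPa"}),
    FakeVariable("t2m", {"GRIB_typeOfLevel": "heightAboveGround"}),
    FakeVariable("u", {"GRIB_typeOfLevel": "isobaricInhPa"}),
    FakeVariable("mystery"),  # no typeOfLevel attribute
]
grouped = group_by_type_of_level(variables)
print(grouped)
```

In a real implementation, each group would presumably need to be merged into a single Dataset (or split into child nodes) before building the tree, since, as far as I know, DataTree.from_dict expects one dataset-like object per path rather than a list.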

This all being said, discussions would likely need to happen to decide whether this should be supported before or after integration of DataTree into xarray proper (xref pydata/xarray#7418).

cc @TomNicholas, @blaylockbk
