Releases: mllam/mllam-data-prep
v0.7.0
v0.6.1
This release contains bugfixes that update the tests to use a newer version of pre-commit and the correct Python version, and remove uses of incompatible typing notation.
Fixes
- use old union typing notation compatible with all required python versions #77 @SimonKamuk
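For context, a minimal illustration of the notation involved (the function below is invented for illustration and is not the code touched in #77): the PEP 604 `X | Y` union syntax is only available from Python 3.10 onwards, while the `typing.Union` / `typing.Optional` spellings work on all supported versions.

```python
from typing import Optional, Union

# Hypothetical example: on Python < 3.10 an annotation like
# `Optional[int]` must be used instead of `int | None` whenever the
# annotation is evaluated at runtime (as dataclass-based config
# parsers typically do).
def pick_chunk_size(requested: Optional[int] = None) -> Union[int, str]:
    # Fall back to "auto" when no explicit chunk size is requested.
    return requested if requested is not None else "auto"
```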
Maintenance
- update pre-commit action to v3.0.1 #77 @SimonKamuk
- fix tests to use expected python version from test matrix #77 @SimonKamuk
v0.6.0
This release adds the ability to slice input data by any coordinate, derive variables from input datasets, and store the config in created datasets. It also adds support for zarr 3.0.0 and above, and a mypy typing action to the pre-commit hooks. In addition, a number of bugs were fixed related to unwanted dimensions being added to the dataset, chunk-size estimates, and derived functions. The release also includes a number of maintenance updates, including updating the DANRA test dataset to v0.2.0 (which is smaller, leading to faster test execution) and updating the `dataclass-wizard` dependency to at least v0.29.2.
Added
- add functionality to slice input data by any coordinate #55, @matschreiner
- add ability to derive variables from input datasets #34, @ealerskans, @mafdmi
- add github PR template to guide development process on github #44, @leifdenby
- add support for zarr 3.0.0 and above #51, @kashif
- warn if the user tries to load a non-YAML file #50, @j6k4m8
- add mypy typing action to pre-commit hooks #67, @observingClouds
- add support for storing config in created datasets and an option to only overwrite the zarr dataset if the config has changed #64, @leifdenby
Fixes
- fix bug which adds unwanted dimensions to the dataset #60, @ealerskans, @observingClouds
- correct chunk size estimate #59, @ealerskans
- fix bug arising when variables provided to derived functions are renamed #56, @leifdenby
- ensure config fields defaulting to `None` are typed as `Optional` and fields defaulting to `{}` are given a default factory so that serialization with default values works correctly #63, @leifdenby
- fix reading of exported config files #67, @observingClouds
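The pattern behind fix #63 can be sketched generically (the dataclass below is invented for illustration, not one of the actual config classes):

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

@dataclass
class SectionSketch:
    # A field that defaults to None must be typed Optional[...] so
    # that serializers accept the None value for the declared type.
    description: Optional[str] = None
    # A field that defaults to {} needs a default factory: a bare
    # `attributes: Dict[str, Any] = {}` would raise a ValueError when
    # the dataclass is defined (mutable default values are rejected).
    attributes: Dict[str, Any] = field(default_factory=dict)

section = SectionSketch()
```

Each instance gets its own fresh dict, so a default-constructed config serializes and round-trips correctly.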
Maintenance
- update DANRA test dataset to v0.2.0 which uses a smaller cropped domain #62, @leifdenby
- update `dataclass-wizard` dependency to at least v0.29.2, allowing for use of `Union` types together with the check for unmatched keys in config YAML #73, @leifdenby
v0.5.0
This release adds support for an optional `extra` section in the config file (for user-defined extra information that is ignored by mllam-data-prep) and fixes a few minor issues. Note that to use the `extra` section in the config file, the schema version in the config file must be increased to v0.5.0.
Added
- Add optional section called `extra` to the config file to allow for user-defined extra information that is ignored by `mllam-data-prep` but can be used by downstream applications, @leifdenby
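A minimal sketch of what this might look like in a config file (the keys under `extra` are invented for illustration; only the `extra` key itself and the schema version come from these release notes):

```yaml
schema_version: v0.5.0
# ... the usual mllam-data-prep sections (inputs, output, ...) ...
extra:
  # free-form, user-defined content: ignored by mllam-data-prep,
  # but available to downstream applications that read the config
  project: my-lam-experiment
  contact: someone@example.com
```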
Changed
- remove f-string from `name_format` in config examples #35
- replace global config for `dataclass_wizard` on `mllam_data_prep.config.Config` with config specific to that dataclass (to avoid conflicts with other uses of `dataclass_wizard`) #36
- Schema version bumped to `v0.5.0` to match the release version that supports the optional `extra` section in config #18
v0.4.0
This release adds support for defining the output path in the command line interface and addresses bugs around optional dependencies for `dask.distributed`.
Added
Changed
v0.3.0
v0.2.0
Added
- add support for creating dataset splits (e.g. train, validation, test) through the `output.splitting` section in the config file, and support for optionally computing statistics for a given split (with `output.splitting.splits.{split_name}.compute_statistics`)
- include `units` and `long_name` attributes for all stacked variables as `{output_variable}_units` and `{output_variable}_long_name`
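A hedged sketch of a splitting section built from the keys quoted above (the split names, coordinate ranges, and the `dim` and `ops` keys are assumptions for illustration only):

```yaml
output:
  splitting:
    dim: time                    # assumption: dimension to split along
    splits:
      train:
        start: 1990-09-03T00:00  # invented range
        end: 2018-12-31T23:00
        compute_statistics:
          ops: [mean, std]       # assumption: which statistics to compute
      test:
        start: 2019-01-01T00:00
        end: 2020-12-31T23:00
```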
Changed
- split dataset creation and storage to zarr into separate functions `mllam_data_prep.create_dataset(...)` and `mllam_data_prep.create_dataset_zarr(...)` respectively
- changes to spec from v0.1.0:
  - the `architecture` section has been renamed `output` to make it clearer that this section defines the properties of the output of `mllam-data-prep`
  - `sampling_dim` removed from the `output` (previously `architecture`) section of the spec, as this is not needed to create the training data
  - the variables (and their dimensions) of the output definition have been renamed from `architecture.input_variables` to `output.variables`
  - coordinate value ranges for the dimensions of the output (i.e. what the architecture expects as input) have been renamed from `architecture.input_ranges` to `output.coord_ranges` to make their use clearer
  - selection on variable coordinate values is now set with `inputs.{dataset_name}.variables.{variable_name}.values` rather than `inputs.{dataset_name}.variables.{variable_name}.sel`
  - when the dimension-mapping method `stack_variables_by_var_name` is used, the formatting string for the new variable is now called `name_format` rather than `name`
  - when dimension-mapping is done by simply renaming a dimension, this must now be set by providing the named method (`rename`) explicitly through the `method` key, i.e. rather than `{to_dim}: {from_dim}` it is now `{to_dim}: {method: rename, dim: {from_dim}}`, to match the signature of the other dimension-mapping methods
  - the `inputs.{dataset_name}.name` attribute has been removed, as with the key `dataset_name` it is superfluous
- relax minimum python version requirement to `>3.8` to simplify downstream usage
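The change to dimension-mapping by renaming can be sketched with an invented dimension pair (output dimension `grid_index` renamed from source dimension `x`):

```yaml
# before (v0.1.0 spec): a bare `{to_dim}: {from_dim}` implied a rename
dim_mapping:
  grid_index: x
---
# after (v0.2.0 spec): the `rename` method is given explicitly
dim_mapping:
  grid_index:
    method: rename
    dim: x
```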
v0.1.0
First tagged release of mllam-data-prep, which includes functionality to declaratively describe (in a yaml config file) how the variables and coordinates of a set of zarr-based source datasets are mapped to a new set of variables with new coordinates to form a single training dataset, and to write this resulting dataset to a new zarr dataset. This explicit mapping gives the flexibility to target different model architectures (which may require different inputs with different shapes between architectures).