Add ability to derive variables and add selected derived forcings#34

Merged
ealerskans merged 100 commits into mllam:main from ealerskans:feature/derive_forcings
Jan 27, 2025
@ealerskans
Contributor

@ealerskans ealerskans commented Nov 6, 2024

Implements the ability to derive fields from the input datasets, as discussed in Deriving forcings #29.

At the moment, I have only added the possibility to derive the following forcings:

  • top-of-atmosphere radiation
  • hour of day (cyclically encoded)
  • day of year (cyclically encoded)
  • time of year (cyclically encoded)

Additional variables, such as boundary and land-sea masks, should also be added, but I think that is for another PR.
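For readers unfamiliar with the term, "cyclically encoded" usually means mapping a periodic quantity onto the unit circle so that the end of one period sits next to the start of the next. The following is only a minimal sketch of that idea, not the code in this PR: the helper name `cyclic_encode`, the example timestamp, and the fixed 366-day period are all assumptions made for illustration.

```python
import math
from datetime import datetime
from typing import Tuple


def cyclic_encode(value: float, period: float) -> Tuple[float, float]:
    """Map a periodic value onto the unit circle as a (sin, cos) pair."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


# Hypothetical timestamp, used only for illustration.
t = datetime(2024, 11, 6, 18, 0)

# Hour of day: with a 24-hour period, 00:00 and 24:00 map to the same point.
hour_sin, hour_cos = cyclic_encode(t.hour + t.minute / 60.0, 24.0)

# Day of year: the fixed 366-day period is an assumption of this sketch;
# the PR may treat leap years differently.
day_sin, day_cos = cyclic_encode(t.timetuple().tm_yday, 366.0)
```

Using both the sine and the cosine component keeps the encoding unambiguous: a single sine value would be shared by two different times within each period.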

- Update the configuration file so that we list the dependencies
and the method used to calculate the derived variable instead
of having a flag to say that the variables should be derived.
This approach is temporary and might be revised soon.
- Add a new class in mllam_data_prep/config.py for derived
variables to distinguish them from non-derived variables.
- Updates to mllam_data_prep/ops/loading.py to distinguish
between derived and non-derived variables.
- Move all functions related to forcing derivations to a new
and renamed module (mllam_data_prep/ops/forcings.py).
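To make the configuration change described above concrete, here is a rough sketch of how a derived-variable entry that lists its dependencies and method could be told apart from a plain variable. This is not the class added in mllam_data_prep/config.py: the names `DerivedVariable`, `dependencies`, `method`, and `split_variables`, as well as the example variable names, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class DerivedVariable:
    """Hypothetical config entry for a variable that has to be computed."""

    dependencies: List[str]  # input variables the calculation needs
    method: str  # name of the function that performs the calculation


VariableSpec = Optional[DerivedVariable]  # None marks a plain (non-derived) variable


def split_variables(
    variables: Dict[str, VariableSpec],
) -> Tuple[List[str], Dict[str, DerivedVariable]]:
    """Separate plain variables from derived ones, as a loader would need to."""
    plain = [name for name, spec in variables.items() if spec is None]
    derived = {name: spec for name, spec in variables.items() if spec is not None}
    return plain, derived


# Illustrative input only; the variable and method names are made up.
variables = {
    "t2m": None,
    "toa_radiation": DerivedVariable(
        dependencies=["time", "lat", "lon"],
        method="calculate_toa_radiation",
    ),
}
plain, derived = split_variables(variables)
```

Compared with a boolean "derive this" flag, an entry of this shape carries everything needed to compute the variable, so the loading step does not need per-variable special cases.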
@leifdenby leifdenby mentioned this pull request Nov 18, 2024
13 tasks
@leifdenby leifdenby modified the milestones: v0.4.0, v0.6.0 Nov 18, 2024
@ealerskans ealerskans changed the title WIP: Add selected derived forcings Add ability to derive variables and add selected derived forcings Dec 10, 2024
@joeloskarsson
Contributor

Hi @ealerskans, just sneaking in a question about this. Me and @sadamov are eager to add on derived forcing inputs to some experiments. I see that there is still discussion ongoing above about the code structure and tests, but is all the actual functionality in place here? I.e. could we check out this branch and use it in the meantime to create the forcings we need?

@ealerskans
Contributor Author

I have tried to update the PR now based on all the great suggestions and feedback I have gotten. What I think is missing now are

  • functionality of how to wrap a derived variable as an xr.DataArray if it is e.g. a np.ndarray
  • add more tests

With respect to tests, I think that the following would be nice to test, but I am not sure how to do it:

  • That _check_and_get_required_attributes in mllam_data_prep.ops.derive_variable.main works as it should (i.e. how we handle attributes)
  • That _get_derived_variable_function in mllam_data_prep.ops.derive_variable.main works as it should (i.e. that we get the correct function, and that it also works for functions outside of mllam_data_prep)

@mafdmi do you perhaps have any good ideas on 1) if it would be good to test them, and 2) how to do it?

Is the functionality to wrap in xr.DataArray something that should be included in this PR?

Then there is also the task of adding other derived fields, such as creating a land-sea mask or a boundary mask (https://github.com/mllam/neural-lam/blob/feature_calculate_forcings/create_forcings.py). But I guess these can be added in another PR.

@ealerskans
Contributor Author

> Hi @ealerskans, just sneaking in a question about this. Me and @sadamov are eager to add on derived forcing inputs to some experiments. I see that there is still discussion ongoing above about the code structure and tests, but is all the actual functionality in place here? I.e. could we check out this branch and use it in the meantime to create the forcings we need?

Hi @joeloskarsson and @sadamov. If you are only interested in adding the datetime forcings (hour of day and day of year) and top-of-atmosphere radiation, then you should be able to use this branch as it is now. I believe that at one point I also tested training neural-lam on a small test dataset that included derived variables. However, no guarantees ;)

@mfroelund
Contributor

> I have tried to update the PR now based on all the great suggestions and feedback I have gotten. What I think is missing now are
>
> * functionality of how to wrap a derived variable as an `xr.DataArray` if it is e.g. a `np.ndarray`
> * add more tests
>
> With respect to tests, I think that the following would be nice to test, but I am not sure how to do it:
>
> * That `_check_and_get_required_attributes` in `mllam_data_prep.ops.derive_variable.main` works as it should (i.e. how we handle attributes)
> * That `_get_derived_variable_function` in `mllam_data_prep.ops.derive_variable.main` works as it should (i.e. that we get the correct function, and that it also works for functions outside of `mllam_data_prep`)
>
> @mafdmi do you perhaps have any good ideas on 1) if it would be good to test them, and 2) how to do it?
>
> Is the functionality to wrap in `xr.DataArray` something that should be included in this PR?
>
> Then there is also the task of adding other derived fields, such as creating a land-sea mask or a boundary mask (https://github.com/mllam/neural-lam/blob/feature_calculate_forcings/create_forcings.py). But I guess these can be added in another PR.

Yes, I think we can test them. For _check_and_get_required_attributes I would test that it returns the correct attributes in the different cases, e.g. whether or not the attribute is in field.attrs, and that the attrs are correctly updated. For _get_derived_variable_function we can set up tests to check that it can import various functions from the standard library, and test what happens with a non-existing module. Do you want me to write a proposal for the tests, or do you want to give it a try from this?
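As a rough illustration of the kind of tests proposed above, the sketch below uses two stand-in helpers rather than the real `_get_derived_variable_function` and `_check_and_get_required_attributes`, whose signatures and behaviour are not shown in this thread. The names `get_function` and `fill_required_attributes`, the required attribute names, and the rule that supplied attributes take precedence are all assumptions.

```python
import importlib
from typing import Any, Callable, Dict, Mapping, Sequence


def get_function(dotted_path: str) -> Callable[..., Any]:
    """Stand-in: resolve 'package.module.function' to the function object."""
    module_name, _, function_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"Expected 'module.function', got {dotted_path!r}")
    module = importlib.import_module(module_name)  # ModuleNotFoundError if absent
    return getattr(module, function_name)  # AttributeError if absent


def fill_required_attributes(
    field_attrs: Mapping[str, Any],
    supplied_attrs: Mapping[str, Any],
    required: Sequence[str] = ("units", "long_name"),
) -> Dict[str, Any]:
    """Stand-in: return attrs with every required attribute set.

    Assumed policy: supplied values win; otherwise the existing value is
    kept; if neither exists, a KeyError is raised.
    """
    updated = dict(field_attrs)
    for name in required:
        if name in supplied_attrs:
            updated[name] = supplied_attrs[name]
        elif updated.get(name) is None:
            raise KeyError(f"Required attribute {name!r} is missing")
    return updated


def test_get_function_from_standard_library() -> None:
    assert get_function("math.sqrt")(9.0) == 3.0
    assert get_function("os.path.basename")("a/b.txt") == "b.txt"


def test_get_function_missing_module() -> None:
    try:
        get_function("no_such_module_xyz.func")
    except ModuleNotFoundError:
        return
    raise AssertionError("expected ModuleNotFoundError")


def test_fill_required_attributes() -> None:
    # Attribute missing from the field: taken from the supplied values.
    attrs = fill_required_attributes({"units": "W m-2"}, {"long_name": "TOA radiation"})
    assert attrs == {"units": "W m-2", "long_name": "TOA radiation"}
    # Attribute missing everywhere: an error is expected.
    try:
        fill_required_attributes({"units": "W m-2"}, {})
    except KeyError:
        return
    raise AssertionError("expected KeyError")
```

The test functions use plain `assert` statements, so they can be collected by pytest or called directly.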

@ealerskans
Contributor Author

ealerskans commented Jan 20, 2025

> Yes, I think we can test them. For _check_and_get_required_attributes I would test that it returns the correct attributes in the different cases, e.g. whether or not the attribute is in field.attrs, and that the attrs are correctly updated. For _get_derived_variable_function we can set up tests to check that it can import various functions from the standard library, and test what happens with a non-existing module. Do you want me to write a proposal for the tests, or do you want to give it a try from this?

If you have time it would be really helpful if you could give it a go. Then I can try and learn from that :)

@mfroelund
Contributor

> > Yes, I think we can test them. For _check_and_get_required_attributes I would test that it returns the correct attributes in the different cases, e.g. whether or not the attribute is in field.attrs, and that the attrs are correctly updated. For _get_derived_variable_function we can set up tests to check that it can import various functions from the standard library, and test what happens with a non-existing module. Do you want me to write a proposal for the tests, or do you want to give it a try from this?
>
> If you have time it would be really helpful if you could give it a go. Then I can try and learn from that :)

I've opened ealerskans#1 with tests added :)

mfroelund and others added 7 commits January 21, 2025 16:06
@ealerskans
Contributor Author

@leifdenby I have implemented what we talked about in 0ecfcca.

Contributor

@mfroelund mfroelund left a comment

Looks great!! Only minor suggestions/fixes

@sadamov

sadamov commented Jan 26, 2025

> Hi @joeloskarsson and @sadamov. If you are only interested in adding the datetime forcings (hour of day and day of year) and top-of-atmosphere radiation, then you should be able to use this branch as it is now. I believe that at one point I also tested training neural-lam on a small test dataset that included derived variables. However, no guarantees ;)

This is brilliant, worked like a charm for my Swiss dataset! Thank you 🙏

Member

@leifdenby leifdenby left a comment

I just have one suggestion for the docstring for that lat/lon function. Otherwise this is ready to merge 🥳

oh and the change in the README needs changing back I think :)

README.md Outdated

  config_path = "example.danra.yaml"
- config = mdp.Config.from_yaml_file(config_path)
+ config = mdp.Config.load_config(config_path)
Member

I think this example in the README still needs changing back :)

@ealerskans ealerskans merged commit ecd535c into mllam:main Jan 27, 2025
8 checks passed
5 participants