
Add support for cropping domain with padded lat/lon convex hull#45

Merged
leifdenby merged 36 commits into mllam:main from
leifdenby:feat/crop-with-other-dataset
Jan 13, 2026


@leifdenby
Member

Describe your changes

The aim of this PR is to add support for cropping one dataset using the convex hull of lat/lon coordinates of grid-points in another dataset.

This is motivated by the need, in LAM (limited-area model) based forecasting, to define forcing input in a padding region around the limited area. For example, when training on the DANRA reanalysis one might want to force on the boundary with the ERA5 reanalysis. This work builds on @TomasLandelius's proof-of-concept implementation (mllam/weather-model-graphs#30), which works in three steps: it takes the convex hull of the lat/lon coordinates of grid-points in the limited-area (e.g. DANRA) domain; it splits the points in the boundary domain by whether they are interior or exterior to this convex hull (again using the lat/lon coordinates); and it defines points in the boundary region as those that are within a given distance (an angle in radians, measured from Earth's center) of the closest point within the convex hull.
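As a rough illustration of the last step, here is a minimal sketch of selecting boundary points by angular distance. It is not the code from the proof-of-concept or this PR: the function names and coordinates are hypothetical, the interior/exterior split is assumed to have been done already, and the distance is taken to the nearest interior grid point rather than to the hull itself.

```python
import math


def to_unit_vector(lat_deg, lon_deg):
    """Convert lat/lon in degrees to a 3D unit vector on the sphere."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (
        math.cos(lat) * math.cos(lon),
        math.cos(lat) * math.sin(lon),
        math.sin(lat),
    )


def angular_distance(p, q):
    """Angle in radians, seen from Earth's center, between two unit vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    # Clamp to [-1, 1] to guard against floating-point round-off
    return math.acos(max(-1.0, min(1.0, dot)))


def select_boundary_points(exterior_latlon, interior_latlon, max_angle_rad):
    """Keep exterior points within max_angle_rad of the closest interior point."""
    interior_xyz = [to_unit_vector(lat, lon) for lat, lon in interior_latlon]
    selected = []
    for lat, lon in exterior_latlon:
        p = to_unit_vector(lat, lon)
        if min(angular_distance(p, q) for q in interior_xyz) <= max_angle_rad:
            selected.append((lat, lon))
    return selected


# Hypothetical coordinates (degrees), purely for illustration
interior = [(55.0, 10.0), (56.0, 11.0)]
exterior = [(57.0, 11.0), (80.0, 100.0)]
boundary = select_boundary_points(exterior, interior, max_angle_rad=0.2)
```

With these made-up inputs only the nearby point at (57.0, 11.0) is kept. The brute-force minimum over all interior points is only meant to show the idea, not to be efficient on real grids.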

So far I have taken the implementation from @TomasLandelius and structured it into a notebook (using DANRA and ERA5 zarr datasets in cloud storage).

Below I have made an example where ERA5 sample points are selected that are within 0.2 radians of the convex hull of grid-points in DANRA:

[Image: ERA5 sample points selected within 0.2 radians of the DANRA convex hull]

The lat/lon convex hull calculations require the spherical-geometry package, and the plotting requires matplotlib and cartopy. I am thinking of putting the visualisation-related packages in an optional [visualisation] extra, but I am not sure whether it makes sense to always require the spherical-geometry package.

Issue Link

Implements mllam/weather-model-graphs#30

Type of change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • 📖 Documentation (Addition or improvements to documentation)

Checklist before requesting a review

  • My branch is up-to-date with the target branch - if not update your fork with the changes from the target branch (use pull with --rebase option if possible).
  • I have performed a self-review of my code
  • For any new/modified functions/classes I have added docstrings that clearly describe its purpose, expected inputs and returned values
  • I have placed in-line comments to clarify the intent of any hard-to-understand passages of my code
  • I have updated the documentation to cover introduced code changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have given the PR a name that clearly describes the change, written in imperative form (context).
  • I have requested a reviewer and an assignee (assignee is responsible for merging)

Checklist for reviewers

Each PR comes with its own improvements and flaws. The reviewer should check the following:

  • the code is readable
  • the code is well tested
  • the code is documented (including return types and parameters)
  • the code is easy to maintain

Author checklist after completed review

  • I have added a line to the CHANGELOG describing this change, in a section
    reflecting type of change (add section where missing):
    • added: when you have added new functionality
    • changed: when default behaviour of the code has been changed
    • fixes: when your contribution fixes a bug

Checklist for assignee

  • PR is up to date with the base branch
  • the tests pass
  • author has added an entry to the changelog (and designated the change as added, changed or fixed)
  • Once the PR is ready to be merged, squash commits and merge the PR.

@joeloskarsson joeloskarsson added this to the v0.7.0 (proposed) milestone Dec 10, 2024
@leifdenby
Member Author

This is getting there. I have added some tests now so I can ensure the different steps of the algorithm work. Here's a nice plot of the convex-hull interior/exterior mask:

[Image: convex-hull interior/exterior mask]

Separating out the mask calculation will also make it possible to add a cropping method that just crops using inside/outside the convex hull in the future.

@TomasLandelius

TomasLandelius commented Dec 10, 2024 via email

@joeloskarsson
Contributor

This is looking great! What's your idea for how to specify the dataset to base the mask on (i.e. where to get the coordinates of the interior region from)? Would that be an already processed mdp zarr, or any arbitrary zarr? Should the path to that be specified in the mdp config file for the boundary dataset?

@leifdenby
Member Author

leifdenby commented Dec 12, 2024

This is looking great! What's your idea for how to specify the dataset to base the mask on (i.e. where to get the coordinates of the interior region from)? Would that be an already processed mdp zarr, or any arbitrary zarr? Should the path to that be specified in the mdp config file for the boundary dataset?

Thank you :)

What I have done so far is adapt the output part of the config to make it possible to point to the path of a config for a dataset that should be treated as the interior domain. Like so:

output:
  ...
  domain_cropping:
    margin_width_degrees: 0.2
    interior_dataset_config_path: example.danra.yaml

https://github.com/mllam/mllam-data-prep/pull/45/files#diff-33f36098ed98a22bbf317f860675cc821a3ccbb56e1650f087c8cc4b3b245038R30

I adapted @TomasLandelius's code so that one can define the angle in degrees instead of radians (since I think that is a bit more intuitive to reason about). I've also left room for other cropping methods to be defined later (for example, one could use the convex hull to make two overlapping domains). I am very open to comments on the changes above though :)
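To make the degrees-versus-radians point concrete, here is a minimal sketch of how such a config section could be represented and converted. The `DomainCropping` class and its `margin_width_radians` property are hypothetical names for illustration only, not necessarily what the PR implements; only the two field names are taken from the YAML keys shown above.

```python
import math
from dataclasses import dataclass


@dataclass
class DomainCropping:
    # Field names mirror the YAML keys above; the class itself is hypothetical
    margin_width_degrees: float
    interior_dataset_config_path: str

    @property
    def margin_width_radians(self) -> float:
        """Margin as an angle in radians, as used by spherical-distance math."""
        return math.radians(self.margin_width_degrees)


# Stand-in for the parsed `output.domain_cropping` section of the config
config_section = {
    "margin_width_degrees": 0.2,
    "interior_dataset_config_path": "example.danra.yaml",
}
cropping = DomainCropping(**config_section)
print(cropping.margin_width_radians)
```

Note that a margin of 0.2 degrees is far narrower than a margin of 0.2 radians (about 11.5 degrees), so the unit matters when comparing against earlier examples.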

Also, the test for DANRA + weatherbench ERA5 currently fails because #33 isn't complete yet. I think it would be best to get that one in first. But other than that it is complete 🥳

Contributor


This file is great, but it mainly documents the implementation of domain cropping rather than how a user would do it in practice. It is still good to have, but I think it would be good to clarify in the file that the user should not do something like what's shown there, and should instead just specify a line in the config.

Member Author


Yes, good point. I've mostly left this in here because I was using it during development and I thought the examples might be useful later. Maybe calling it documentation isn't quite right. In general though it would be nice to have some documentation on how to use mllam_data_prep not from the command line but by importing the package. I haven't had a chance to write that yet, but this notebook was supposed to be a start along those lines.

Contributor


Document also the new config fields in the README

@TomasLandelius

TomasLandelius commented Jan 9, 2025 via email

@TomasLandelius

TomasLandelius commented Jan 9, 2025 via email

@joeloskarsson
Contributor

My changes are all merged in here already @observingClouds :) It was the ideas from this discussion, #45 (comment), that pretty much resulted in https://github.com/leifdenby/mllam-data-prep/blob/e9244f935415892d353ca4323fe2b3f8a482ce98/mllam_data_prep/ops/cropping.py#L128-L194. However, I didn't manage to parallelize it over the arcs that make up the convex hull, so for now it does https://github.com/leifdenby/mllam-data-prep/blob/e9244f935415892d353ca4323fe2b3f8a482ce98/mllam_data_prep/ops/cropping.py#L264-L270, sadly. While there are typically not that many such arcs, there could be in some cases, and this is definitely parallelizable.

@joeloskarsson
Contributor

I read over this again today, and I think my last comment here summarizes things well. We could speed things up by parallelizing over the arcs of the polygon. I think this might be very slow if you end up with many arcs in the convex hull of your interior points.

I am quite sure there are algorithmic improvements possible as well, but that is probably quite an involved exercise to dive into. Although it is a fun problem in computational spherical geometry 😄
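For readers who want to see what a computation over the arcs looks like, here is a minimal sketch of the angular distance from a point to a polygon of great-circle arcs. It is not the code in `cropping.py`: all function names are hypothetical, points are assumed to be 3D unit vectors, and every arc is assumed to have distinct, non-antipodal endpoints and to span less than 180 degrees.

```python
import math


def cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def angle(a, b):
    """Angle in radians between two unit vectors."""
    return math.acos(max(-1.0, min(1.0, dot(a, b))))


def distance_to_arc(p, a, b):
    """Angular distance from unit vector p to the minor great-circle arc a -> b."""
    n = cross(a, b)
    norm = math.sqrt(dot(n, n))
    n = tuple(x / norm for x in n)  # unit normal of the arc's great circle
    # p projects onto the arc itself if it lies in the wedge between a and b
    if dot(cross(a, p), n) >= 0 and dot(cross(p, b), n) >= 0:
        return abs(math.asin(max(-1.0, min(1.0, dot(p, n)))))
    # Otherwise the closest point on the arc is one of its endpoints
    return min(angle(p, a), angle(p, b))


def distance_to_polygon(p, vertices):
    """Minimum over all arcs; each arc is independent of the others."""
    arcs = zip(vertices, vertices[1:] + vertices[:1])
    return min(distance_to_arc(p, a, b) for a, b in arcs)


# Hypothetical hull: one octant of the sphere; point at 30S, 45E
hull = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
lat, lon = math.radians(-30.0), math.radians(45.0)
point = (math.cos(lat) * math.cos(lon), math.cos(lat) * math.sin(lon), math.sin(lat))
print(distance_to_polygon(point, hull))
```

Because each call to `distance_to_arc` depends only on its own arc, the minimum could in principle be computed in parallel or as a single vectorized array operation over all arcs, which is the kind of speed-up discussed here.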

Contributor

@observingClouds observingClouds left a comment


Alright, I have now checked what I believe is a more performant version of the cropping method, as it uses shapely instead of https://github.com/spacetelescope/spherical_geometry and vectorises the "does hull contain point X" question.

I believe the current implementation leads to correct results though. So from that perspective I give a 🟢

@TomasLandelius

TomasLandelius commented Sep 22, 2025 via email

@observingClouds
Contributor

Are you sure about the workings of shapely in 3D? I read that "The return value is a strictly two-dimensional geometry. All Z coordinates of the original geometry will be ignored." /Tomas

Good catch @TomasLandelius. Shapely does indeed seem to be able to create 3D polygons, but functions like contains (silently) ignore the third dimension. Thanks for pointing it out.

shapely does not account for z coordinate
@observingClouds
Contributor

We could speed things up by parallelizing over the arcs of the polygon. I think this might be very slow if you end up with many arcs in the convex hull of your interior points.

@joeloskarsson I can confirm that both of your points are valid. I encountered quite a slow-down with over 1000 arcs and was able to vectorize the function and drop compute time significantly. Based on the last dev meeting we will do such optimizations in a separate PR once this is merged, right?

@joeloskarsson
Contributor

Agreed, and thanks for opening issue #83 to track that optimization! If every issue from above is resolved this should be good to go. There is only #45 (comment) that is not resolved, but I think maybe that is already fixed?

@TomasLandelius

TomasLandelius commented Sep 27, 2025 via email

@leifdenby
Member Author

leifdenby commented Oct 29, 2025

Are we in agreement that this should be merged in as-is, @observingClouds and @joeloskarsson? If so, I think all that is left is 1) add a changelog entry and 2) I will add a README.md in docs/ explaining that that directory contains work-in-progress notebooks that will eventually form documentation for using the Python API of mllam-data-prep (in contrast to using the command line interface). Sound ok?

@observingClouds
Contributor

LGTM from my side.

@joeloskarsson
Contributor

#45 (comment) still looks unresolved to me, please have a look at that thread. Otherwise I agree, after changelog + a docs readme this should be good to go 😄

@leifdenby
Member Author

leifdenby commented Nov 3, 2025

#45 (comment) still looks unresolved to me, please have a look at that thread. Otherwise I agree, after changelog + a docs readme this should be good to go 😄

You're right, sorry! I lose track of things in these comments on GitHub sometimes. So the full TODO is:

@leifdenby
Member Author

Ok, apart from the discussion on #45 (comment) this is ready to go now :) I just need to see if what I've done to resolve that discussion makes sense to you.

@joeloskarsson
Contributor

#45 (comment) is fixed now, yes. I just realized while testing it that there might be a problem with the package requirements (see above). But after sorting that out I agree that this is ready to be merged.

@leifdenby leifdenby merged commit 18c3779 into mllam:main Jan 13, 2026
6 checks passed


4 participants