Add support for cropping domain with padded lat/lon convex hull #45
leifdenby merged 36 commits into mllam:main
|
This is getting there. I have added some tests now so I can ensure that the different steps of the algorithm work. Here's a nice plot for the convex-hull interior/exterior mask: billede.png (https://github.com/user-attachments/assets/087309b0-7ffb-4e13-afd7-1054d01f4330). Separating out the mask calculation will also make it possible to add a cropping method that just crops using inside/outside the convex hull in future. |
|
Looks great!
/Tomas
|
|
This is looking great! What's your idea for how to specify the dataset to base the mask on (i.e. where to get the coordinates of the interior region from)? Would that be an already processed mdp zarr, or any arbitrary zarr? Should the path to that be specified in the mdp config file for the boundary dataset? |
Thank you :) What I have done so far is adapt the output:
...
domain_cropping:
  margin_width_degrees: 0.2
  interior_dataset_config_path: example.danra.yaml
I adapted @TomasLandelius's code so that one can define the angle in degrees instead of radians (since I think that is a bit more intuitive to reason about). And I've left space for the possibility of other cropping methods being defined later (for example, one could use the convex hull to make two overlapping domains). I am very open to comments on the changes above though :) Also, the test for DANRA + weatherbench ERA5 currently fails because #33 isn't complete yet. I think it would be good to get that one in first. But other than that it is complete 🥳 |
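To get a feel for the size of the `margin_width_degrees` setting above: an angular margin on a sphere corresponds to a great-circle distance of radius times angle (in radians). The sketch below is only illustrative; the helper name and the mean Earth radius of 6371 km are assumptions and not part of mllam-data-prep.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius, for illustration only


def margin_degrees_to_km(margin_width_degrees: float) -> float:
    """Great-circle length (km) spanned by an angular margin given in degrees."""
    return EARTH_RADIUS_KM * math.radians(margin_width_degrees)


margin_km = margin_degrees_to_km(0.2)
print(f"0.2 degrees is roughly {margin_km:.1f} km")

# For comparison, a margin of 0.2 radians expressed in degrees.
margin_0p2_rad_in_degrees = math.degrees(0.2)
print(f"0.2 radians is roughly {margin_0p2_rad_in_degrees:.1f} degrees")
```

This makes the difference between the two units concrete: a margin of 0.2 degrees is a narrow band of a few tens of kilometres, whereas 0.2 radians is more than eleven degrees wide.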
This file is great, however it is mainly a documentation of the implementation of domain-cropping, rather than a documentation of how to do it in practice (for a user). It is still good to have, but I think this might be good to clarify in here, that the user should not do something like what's in the file, but actually just specify a line in the config.
Yes, good point. I've mostly left this in here because I was using it during development and I thought the examples might be useful later. Maybe calling it documentation isn't quite right. In general though it would be nice with some documentation on how to use mllam_data_prep not from the command line but by importing the package. I haven't had a chance to write that yet, but this notebook was supposed to be a start along those lines.
Document also the new config fields in the README
|
You need to calculate the distance to the polygon, not just its edges. It should be possible by considering all lines (great arcs) between the points making up the polygon.
-------- Original message --------
From: Joel Oskarsson
Date: 2025-01-09 18:08 (GMT+01:00)
Subject: Re: [mllam/mllam-data-prep] Add support for cropping domain with padded lat/lon convex hull (PR #45)
@joeloskarsson commented on this pull request.
In mllam_data_prep/ops/cropping.py:
+ def calc_dist(da_pt_xyz):
+     dotproduct = np.dot(da_xyz_ref, da_pt_xyz.T)
+     val = np.min(np.arccos(dotproduct))
+     return val
+
+ da_mindist_to_ref = xr.apply_ufunc(
+     calc_dist,
+     da_xyz,
+     input_core_dims=[["xyz"]],
+     output_core_dims=[[]],
+     vectorize=True,
+ )
Extracting the points that make up the convex hull polygon in spherical_geometry and computing the distance to those points does not work. For the DANRA domain, as an example, that polygon only has points at the corners (blue), giving a weird cropping of lat-lon ERA data (red).
chull_points.png (view on web): https://github.com/user-attachments/assets/87cecb56-b7a9-49cd-b4b9-e9a08690e7ee
(It did speed up the cropping 10000 times though 😆). So we would need to also compute the distance to the actual edges of the polygon.
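To illustrate the difference between distance to the hull's corner points and distance to its edges, here is a minimal NumPy sketch of the distance from points to a single great-circle arc. It is not the code in this PR: the function names are hypothetical, points are assumed to be unit vectors, and the arc is assumed to be shorter than half a great circle.

```python
import numpy as np


def latlon_to_xyz(lat_deg, lon_deg):
    """Convert lat/lon in degrees to unit vectors, shape (..., 3)."""
    lat, lon = np.deg2rad(lat_deg), np.deg2rad(lon_deg)
    return np.stack(
        [np.cos(lat) * np.cos(lon), np.cos(lat) * np.sin(lon), np.sin(lat)], axis=-1
    )


def angle_between(u, v):
    """Angular distance in radians; clip guards arccos against rounding."""
    return np.arccos(np.clip(np.sum(u * v, axis=-1), -1.0, 1.0))


def distance_to_arc(pts, a, b):
    """Distance (radians) from each unit vector in pts (N, 3) to the minor arc a-b."""
    n = np.cross(a, b)
    n = n / np.linalg.norm(n)  # unit normal of the great-circle plane
    sin_to_plane = pts @ n
    # Projection of each point onto the plane; its direction is the closest
    # point on the full great circle.
    q = pts - sin_to_plane[:, None] * n
    # The projection falls on the arc if it lies between a and b.
    on_arc = (np.cross(a, q) @ n >= 0) & (np.cross(q, b) @ n >= 0)
    d_circle = np.abs(np.arcsin(np.clip(sin_to_plane, -1.0, 1.0)))
    d_ends = np.minimum(angle_between(pts, a), angle_between(pts, b))
    return np.where(on_arc, d_circle, d_ends)


# An arc along the equator from 10W to 10E, and two test points.
a, b = latlon_to_xyz(0.0, -10.0), latlon_to_xyz(0.0, 10.0)
pts = latlon_to_xyz(np.array([5.0, 0.0]), np.array([0.0, 20.0]))
dist_deg = np.rad2deg(distance_to_arc(pts, a, b))
corner_only_deg = np.rad2deg(np.minimum(angle_between(pts, a), angle_between(pts, b)))
```

For the first test point (5N, 0E) the edge distance is 5 degrees, while the corner-only distance is more than 11 degrees, which is the kind of discrepancy that produces the odd cropping described above when the hull only has vertices at its corners.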
|
|
Sorry, you had it right. Mixed up corners and edges...
|
|
My changes are all merged in here already @observingClouds :) It was the ideas resulting from this discussion #45 (comment), which pretty much resulted in https://github.com/leifdenby/mllam-data-prep/blob/e9244f935415892d353ca4323fe2b3f8a482ce98/mllam_data_prep/ops/cropping.py#L128-L194. However, I didn't manage to parallelize it over the arcs that make up the convex hull, so now it does https://github.com/leifdenby/mllam-data-prep/blob/e9244f935415892d353ca4323fe2b3f8a482ce98/mllam_data_prep/ops/cropping.py#L264-L270 sadly. While there are typically not that many such arcs, there could be in some cases and this is definitely parallelizable. |
|
Read over this again today, and I think my last comment here summarizes things well. We could speed things up by parallelizing over the arcs of the polygon. I think this might be very slow if you end up with many arcs in your convex hull of interior points. I am quite sure there are algorithmic improvements possible as well, but that is probably quite an involved exercise to dive into. Although it is a fun problem in computational spherical geometry 😄 |
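One possible way to avoid a Python loop over the arcs is to broadcast over all points and all hull edges at once. The following is a sketch only, not the implementation in this PR: the function names are hypothetical, the hull vertices are assumed to be unit vectors ordered around the hull, and every edge is assumed to be shorter than half a great circle. The intermediate arrays have shape (N, M, 3) for N points and M edges, so for large inputs the points would need to be processed in chunks.

```python
import numpy as np


def latlon_to_xyz(lat_deg, lon_deg):
    """Convert lat/lon in degrees to unit vectors, shape (..., 3)."""
    lat, lon = np.deg2rad(lat_deg), np.deg2rad(lon_deg)
    return np.stack(
        [np.cos(lat) * np.cos(lon), np.cos(lat) * np.sin(lon), np.sin(lat)], axis=-1
    )


def min_distance_to_hull_edges(pts, verts):
    """Min angular distance (radians) from pts (N, 3) to any edge of verts (M, 3)."""
    a, b = verts, np.roll(verts, -1, axis=0)  # edge j runs from a[j] to b[j]
    n = np.cross(a, b)
    n = n / np.linalg.norm(n, axis=1, keepdims=True)  # (M, 3) edge-plane normals
    s = pts @ n.T  # (N, M): sine of the angle between each point and each plane
    q = pts[:, None, :] - s[..., None] * n[None]  # (N, M, 3) in-plane projections
    # The projection lies on edge j if it sits between a[j] and b[j].
    on_arc = (np.einsum("nmk,mk->nm", np.cross(a[None], q), n) >= 0) & (
        np.einsum("nmk,mk->nm", np.cross(q, b[None]), n) >= 0
    )
    d_circle = np.abs(np.arcsin(np.clip(s, -1.0, 1.0)))
    d_verts = np.arccos(np.clip(pts @ verts.T, -1.0, 1.0))  # (N, M) to each vertex
    d_ends = np.minimum(d_verts, np.roll(d_verts, -1, axis=1))
    return np.where(on_arc, d_circle, d_ends).min(axis=1)


# A hull with corners at +/-10 degrees lat/lon, one point outside and one inside.
verts = latlon_to_xyz(
    np.array([-10.0, -10.0, 10.0, 10.0]), np.array([-10.0, 10.0, 10.0, -10.0])
)
pts = latlon_to_xyz(np.array([0.0, 0.0]), np.array([12.0, 0.0]))
min_dist_deg = np.rad2deg(min_distance_to_hull_edges(pts, verts))
```

The trade-off is memory for speed: the loop over arcs keeps only one (N, 3) projection in memory at a time, while the broadcast version holds all M of them.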
observingClouds approved these changes and left a comment:
Alright, I have now checked my version of the cropping method (observingClouds@4110e31), which I believe is more performant as it uses shapely instead of https://github.com/spacetelescope/spherical_geometry and vectorises the "does hull contain point X" question.
I believe the current implementation leads to correct results though. So from that perspective I give a 🟢
|
Are you sure about the workings of shapely in 3D? I read that "The return value is a strictly two-dimensional geometry. All Z coordinates of the original geometry will be ignored."
/Tomas
|
Good catch @TomasLandelius. Shapely indeed seems to be able to create 3D polygons but functions like |
shapely does not account for z coordinate
@joeloskarsson I can confirm that both of your points are valid. I encountered quite a slow-down with over 1000 arcs and was able to vectorize the function and drop compute time significantly. Based on the last dev meeting we will do such optimizations in a separate PR once this is merged, right? |
|
Agreed, and thanks for opening #83 issue to track that optimization! If every issue from above was resolved this should be good to go. There is only #45 (comment) that is not resolved, but I think maybe that is already fixed? |
|
After some discussion with GPT-5 I may have come up with a possible solution:
Since the hull is convex, we can implement a vectorized “all dot products ≥ 0” test against the hull’s great‑circle half‑spaces. That avoids Python loops entirely:
- Each hull edge defines a great‑circle plane (normal vector).
- A point is inside iff it lies on the “inside” side of all those planes.
- That reduces to a set of dot‑product checks, which NumPy can do in bulk.
Each hull edge defines a great‑circle plane. The “inside” of the polygon is the intersection of all those half‑spaces. By orienting the normals consistently (using a reference interior point), you guarantee that “inside” corresponds to non‑negative dot products. The final np.all(dots >= 0, axis=1) is fully vectorized: no Python loops, just one matrix multiplication.
See the attached example script for a comparison.
Please verify...
/Tomas
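The half-space test described in the comment above can be sketched as follows. This is not the attached script (which is not reproduced here) and not code from this PR; the function names are hypothetical. It assumes the hull vertices are unit vectors ordered around the hull, that the hull is smaller than a hemisphere, and that the mean of the vertices can serve as the reference interior point.

```python
import numpy as np


def latlon_to_xyz(lat_deg, lon_deg):
    """Convert lat/lon in degrees to unit vectors, shape (..., 3)."""
    lat, lon = np.deg2rad(lat_deg), np.deg2rad(lon_deg)
    return np.stack(
        [np.cos(lat) * np.cos(lon), np.cos(lat) * np.sin(lon), np.sin(lat)], axis=-1
    )


def points_in_convex_hull(pts_xyz, hull_xyz):
    """Boolean mask (N,) of points inside a convex spherical polygon (M, 3)."""
    # Each edge (v_i, v_{i+1}) defines a great-circle plane with normal v_i x v_{i+1}.
    normals = np.cross(hull_xyz, np.roll(hull_xyz, -1, axis=0))
    # Orient every normal so that the reference interior point is on its
    # positive side (assumed here: the mean of the hull vertices).
    interior = hull_xyz.mean(axis=0)
    normals = normals * np.sign(normals @ interior)[:, None]
    # One matrix product gives all point/plane dot products, shape (N, M).
    dots = pts_xyz @ normals.T
    return np.all(dots >= 0, axis=1)


hull = latlon_to_xyz(
    np.array([-10.0, -10.0, 10.0, 10.0]), np.array([-10.0, 10.0, 10.0, -10.0])
)
pts = latlon_to_xyz(np.array([0.0, 0.0, 20.0, 5.0]), np.array([0.0, 20.0, 0.0, 5.0]))
inside = points_in_convex_hull(pts, hull)
```

Note that this only answers the inside/outside question; selecting points within a margin outside the hull still needs a distance to the hull edges.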
|
|
Are we in agreement that this should be merged in as-is @observingClouds and @joeloskarsson? If so, I think all that is left is 1) add a changelog and 2) I will add a README.md in |
|
LGTM from my side. |
|
#45 (comment) still looks unresolved to me, please have a look at that thread. Otherwise I agree, after changelog + a docs readme this should be good to go 😄 |
You're right, sorry! I lose track of things in these comments on GitHub sometimes. So the full TODO is:
|
|
Ok, apart from the discussion on #45 (comment) this is ready to go now :) Just need to see if what I've done to resolve that discussion makes sense to you. |
|
#45 (comment) is fixed now yes. I just realized that there might be a problem with the package requirements when testing that (see above). But after sorting that out I agree that this is ready to be merged. |

Describe your changes
The aim of this PR is to add support for cropping one dataset using the convex hull of lat/lon coordinates of grid-points in another dataset.
This is motivated by the need, when doing LAM (limited area) based forecasting, to be able to define forcing input in a padding region around the limited area. For example, when training on DANRA reanalysis one might want to force on the boundary with ERA5 reanalysis. This work builds on @TomasLandelius's proof-of-concept implementation (mllam/weather-model-graphs#30), which takes the convex hull of the lat/lon coordinates of grid-points in the limited area (e.g. DANRA) domain, splits the points in the boundary domain by whether they are interior or exterior to this convex hull (again using the lat/lon coordinates), and defines points in the boundary region as those that are within a given distance (an angle in radians, as seen from Earth's center) of the closest point within the convex hull.
So far I have taken the implementation from @TomasLandelius and structured it into a notebook (and used DANRA and ERA5 zarr datasets in cloud storage).
Below I have made an example where ERA5 sample points are selected that are within 0.2 radians of the convex hull of grid-points in DANRA:
For the lat/lon convex hull calculations the spherical-geometry package is required. For plotting, of course, matplotlib and cartopy are needed. I am thinking of putting the visualisation-related packages in an optional [visualisation] extra requirements group, but I am not sure if it makes sense to always require the spherical-geometry package.
Issue Link
Implements mllam/weather-model-graphs#30
Type of change
Checklist before requesting a review
pull with --rebase option if possible).
Checklist for reviewers
Each PR comes with its own improvements and flaws. The reviewer should check the following:
Author checklist after completed review
reflecting type of change (add section where missing):
Checklist for assignee