Description
This is an issue for an upcoming NeSI consultancy with @jennan. The focus will be on improving the performance and stability of GeoFabrics for larger scale problems.
The focus is on making better use of Dask throughout the GeoFabrics stages. Two areas identified so far are:
- RasterArray.interpolate_na for 2D interpolation to remove NaN values
- RasterArray.clip called in series to identify the lowest value along a waterway
RasterArray.interpolate_na
Profiling has shown that the pinch points for larger scale problems are quite different from those for smaller problems. Take the two profiles below.
- A 6 min problem with all GeoFabrics stages (small_2m_res.html).
- A 4 hr problem with all GeoFabrics stages. The only difference from the first is that it is at 1 m instead of 2 m resolution (small_1m_res.html).

RasterArray.clip
Another area of focus is the series of RasterArray.clip calls used to identify the lowest value along a waterway. Although it hasn't shown up as an issue in the 1m profiling, it is visible in the 2m profiling.

This could make use of either pandas or dask-geopandas. A rough attempt that I didn't get off the ground can be seen as a comment in processor.py

It is also worth noting that we may be able to do this more directly with a rolling(...).min() call on the xarray DataArray, using an appropriately sized window.
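
To make the rolling-minimum idea concrete, here is a minimal sketch rather than a drop-in replacement for the current clip loop. The 5x5 grid, the 3x3 window and the sample points are all made up for illustration; in GeoFabrics the window would need to be sized from the waterway width and the DEM resolution.

```python
import numpy as np
import xarray as xr

# Hypothetical 5x5 elevation grid standing in for a DEM tile (value = 5*row + col).
dem = xr.DataArray(
    np.arange(25, dtype=float).reshape(5, 5),
    dims=("y", "x"),
    coords={"y": np.arange(5), "x": np.arange(5)},
    name="z",
)
dem[2, 2] = np.nan  # a missing cell, to show that NaNs are skipped

# Minimum over a centred 3x3 window around every cell.
# min_periods=1 keeps a value at the edges and next to NaNs.
local_min = dem.rolling(y=3, x=3, center=True, min_periods=1).min()

# Hypothetical waterway sample points, given as (row, col) index pairs.
ys = xr.DataArray([0, 2, 4], dims="point")
xs = xr.DataArray([1, 2, 3], dims="point")

# Pointwise lookup: one windowed minimum per sample point.
lowest = local_min.isel(y=ys, x=xs)
print(lowest.values)
```

This computes the windowed minimum once for the whole array and then looks up the sample points, instead of clipping the raster once per point. A square window is only an approximation of a clip to an arbitrary polygon, so the results may differ from the current approach near bends or where the waterway is wide. If the DataArray is Dask-backed, the rolling call should stay lazy, but that has not been checked here.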