Dask improvements #208

@rosepearson

This is an issue for an upcoming NeSI consultancy with @jennan. The focus will be on improving the performance and stability of GeoFabrics for larger-scale problems.

The aim is to make better use of Dask throughout the GeoFabrics stages. Two areas identified so far are:

  • RasterArray.interpolate_na for 2D interpolation to remove NaN values
  • RasterArray.clip in series for identifying the lowest value along a waterway

RasterArray.interpolate_na
Profiling has shown that the pinch points for larger-scale problems are quite different from those for smaller problems. Compare the two profiles below.

  1. A 6 min problem running all GeoFabrics stages (small_2m_res.html).
    image

  2. A 4 hr problem running all GeoFabrics stages. The only difference from 1 is that it is at 1 m instead of 2 m resolution (small_1m_res.html).
    image
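
As a starting point for discussion, here is a minimal sketch of one way the NaN filling could be spread across Dask chunks. It is not the GeoFabrics or rioxarray implementation: it swaps in `scipy.interpolate.griddata` applied chunk by chunk through `Array.map_overlap`, and the helper name `fill_nan_block`, the toy grid, the chunk size and the overlap depth are all assumptions made for illustration.

```python
import dask.array as da
import numpy as np
from scipy.interpolate import griddata


def fill_nan_block(block: np.ndarray) -> np.ndarray:
    """Fill NaNs in one (padded) chunk by linear interpolation from its valid cells."""
    mask = np.isnan(block)
    if not mask.any() or mask.all():
        return block  # nothing to fill, or nothing to interpolate from
    rows, cols = np.indices(block.shape)
    filled = block.copy()
    filled[mask] = griddata(
        points=(rows[~mask], cols[~mask]),
        values=block[~mask],
        xi=(rows[mask], cols[mask]),
        method="linear",
    )
    return filled


# Toy planar surface standing in for a DEM (values are made up).
expected = np.add.outer(np.arange(40.0), np.arange(40.0))
dem = expected.copy()
dem[18:22, 18:22] = np.nan  # a gap that straddles the chunk boundary at index 20

chunked = da.from_array(dem, chunks=(20, 20))

# depth=5 gives each chunk a 5-cell halo from its neighbours, so the gap is
# fully surrounded by valid cells inside every padded chunk that touches it.
filled = chunked.map_overlap(
    fill_nan_block, depth=5, boundary="none", dtype=dem.dtype
).compute()
```

The main caveat with this approach is that the overlap depth has to exceed the widest gap being filled. Gaps wider than the halo would be interpolated from incomplete neighbourhoods or left as NaN, so the depth would need tuning against real data.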

RasterArray.clip
Another area of focus is shown below. It hasn't shown up as an issue in the 1m profiling, but it is visible in the 2m profile.
image
This could make use of either pandas or dask-geopandas. An early attempt that I didn't get off the ground can be seen as a comment in processor.py:
image
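
To illustrate the general idea of replacing a serial loop of clips with independent tasks, here is a rough sketch using `dask.delayed`. Note that this swaps in plain `dask.delayed` rather than pandas or dask-geopandas, and it uses rectangular array windows as hypothetical stand-ins for the waterway polygons. The names `segments` and `segment_min` and the toy grid are assumptions, not anything from processor.py.

```python
import dask
import numpy as np

# Toy elevation grid standing in for the DEM (values are made up).
dem = np.arange(36, dtype=float).reshape(6, 6)

# Hypothetical stand-ins for waterway polygons: one (rows, cols) window per segment.
segments = [
    (slice(0, 2), slice(0, 3)),
    (slice(2, 4), slice(1, 4)),
    (slice(4, 6), slice(3, 6)),
]


@dask.delayed
def segment_min(grid, window):
    """Return the lowest non-NaN value inside one segment's window."""
    return float(np.nanmin(grid[window]))


# Wrap the grid once so it is not re-hashed for every task.
dem_delayed = dask.delayed(dem)

# Build one lazy task per segment, then evaluate them together.
tasks = [segment_min(dem_delayed, window) for window in segments]
minima = dask.compute(*tasks)
```

Each segment becomes its own task, so the scheduler is free to run them in parallel. Whether this beats the serial clip in practice would depend on how expensive each real polygon clip is relative to the task overhead.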

It is also worth noting that we may be able to do this more directly with a rolling(...).min() call on the xarray object, using an appropriately sized window.
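
A minimal sketch of what that rolling minimum could look like on an `xarray.DataArray` follows. The toy values, the dimension names `y` and `x`, and the window size of 3 are assumptions for illustration. The right window for a waterway would depend on its width and the grid resolution.

```python
import numpy as np
import xarray as xr

# Toy elevation grid (values are made up).
dem = xr.DataArray(
    np.array(
        [
            [5.0, 4.0, 6.0, 7.0],
            [3.0, 8.0, 2.0, 9.0],
            [6.0, 1.0, 7.0, 5.0],
        ]
    ),
    dims=("y", "x"),
)

window = 3  # hypothetical window size, in cells

# Centred 2D rolling window. min_periods=1 keeps edge cells, where the
# window is only partly inside the grid, instead of returning NaN there.
local_min = dem.rolling(y=window, x=window, center=True, min_periods=1).min()
```

The result has the same shape as the input, with each cell holding the lowest value in its neighbourhood. This is a neighbourhood minimum rather than a true per-polygon minimum, so it would only approximate the clip-based result.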
