Add Xarray accessor mirroring Raster class
#446
Raster APIRaster class
|
@adehecq @atedstone @erikmannerfelt This PR is also ready for your first review! It is not finalized, but at a good stage to hear your feedback, questions, recommendations, and then move forward to finalize it. Once you are done with this one, you should look at the one for the |
|
This is a giant step forward!! 🎉 I don't have many comments, as I'm not so familiar with the xarray accessor, so I just have to trust you that it works. I'm looking forward to seeing it explained in the documentation and testing it myself, but at the moment it is still abstract 😉 I can also only vaguely imagine the limitations you mention in the description about not accessing the hidden functions. I guess we'll figure out the issues by using and practicing. But the first step is to get it out! |
|
@belletva @remi-braun @guillaumeeb @adehecq @atedstone This PR is getting close to finalized! 😄 Note for clarity: In the following, "Implicit loading" and "Laziness" are not the same thing. As a summary, I have fully reconciled the
The three above aspects are all tested pretty thoroughly in Apart from this, I had to slightly adjust or deprecate some We still need to write up the documentation changes. As functionalities mirror almost exactly the Also, I bumped into something counter-intuitive when implementing some functionalities: Happy for your comments if you have the time! Otherwise could be at later stage (documentation?). |
|
P.S.: I have not implemented an accessor for an |
|
@rhugonnet this looks immense, I am looking forward to trying out this big new functionality properly. Given the decent test suite coverage and my lack of deep technical expertise on accessors, I would be content to see this merged into the main dev branch and then to get a bit more involved once we have the skeleton of documentation available as you suggest. |
|
Thanks @atedstone! @belletva and @adehecq share the same opinion after a meeting yesterday. |
|
I've now finalized the tests that check that Dask out-of-memory reprojection gives 100% the same result as in-memory (only possible since we added the |
|
Amazing! I'll test this accessor asap! |
This PR adds the `rst` Xarray accessor mirroring the `Raster` class.
The accessor gives access to all attributes and methods already implemented for rasters in GeoUtils from an `xarray.DataArray` object (e.g., `ds.rst.reproject()`), while keeping the rest of the Xarray functionalities available through `ds` (such as plotting), along with other low-level behaviour (such as implicit loading, cached loading, chunked loading).
It also opens up the opportunity to easily add Dask support to run our functionalities out-of-memory. This requires our functionalities to support a `da.Array` as input. This is not mandatory (if a functionality only runs with a NumPy array, it will simply load the `da.Array` in memory immediately), but is very practical when it is supported.
We already did a lot of work to implement the most complex Dask functionalities in #537 (namely `reproject()`, `subsample()` and `interp_points()`). Most other functionalities are much easier to support (using `dask.map_blocks` or equivalent).
Resolves #383
Resolves #567
Resolves #791
Resolves #834
Resolves #825
Resolves #823
Resolves #836
Resolves #840
Resolves #839
Resolves #841
Resolves #842
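For readers unfamiliar with accessors, here is a minimal sketch of how an Xarray accessor namespace is registered. The accessor name `demo_rst`, the class `DemoRasterAccessor` and both of its members are hypothetical and chosen for illustration; this is not the code of the `rst` accessor added in this PR.

```python
import numpy as np
import xarray as xr


# Hypothetical name "demo_rst", chosen so it cannot clash with the real "rst" accessor.
@xr.register_dataarray_accessor("demo_rst")
class DemoRasterAccessor:
    def __init__(self, xarray_obj: xr.DataArray) -> None:
        # Xarray passes the DataArray itself to the accessor constructor.
        self._obj = xarray_obj

    @property
    def shape2d(self) -> tuple:
        """Size of the last two (y, x) dimensions."""
        return self._obj.shape[-2], self._obj.shape[-1]

    def valid_fraction(self) -> float:
        """Fraction of cells that are not NaN."""
        return float(self._obj.notnull().mean())


ds = xr.DataArray(
    np.array([[1.0, np.nan], [3.0, 4.0]], dtype="float32"),
    dims=("y", "x"),
)
print(ds.demo_rst.shape2d, ds.demo_rst.valid_fraction())
```

Everything else on `ds` (plotting, lazy loading, chunking) stays available, since the accessor only adds a namespace on top of the `DataArray`.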
Facilitated by recent code re-structuring
Recent code re-structuring moved methods out of the `Raster` or `Vector` class into separate modules. In some cases, this changed the arguments of those (non-public) wrapper methods to accept base inputs (array, transform, crs) instead of the class object itself. Among other things, this facilitates the transition to using our functions with an accessor that has a slightly different object type (classic NumPy array instead of a NumPy masked-array).
See #624.
Summary of changes
Most content of the `Raster` class was moved into a non-public `RasterBase` parent class, containing all attributes and methods shared by the `Raster` and `rst` accessor classes.
The `Raster` and `rst` classes are subclasses of `RasterBase`, and only implement methods specific to their object type (such as `set_mask()` for `Raster`, which uses a NumPy masked-array, or its `__array_interface__` specific to masked arrays).
Remaining in `Raster` are only functionalities specific to the `Raster` object itself: `__init__`, `load()`, `is_loaded`, `from_array()`, `copy()`, `set_mask()`, etc, `__array_interface__`, `__add__`, etc.
Added in the `rst` accessor are only functionalities specific to the accessor: `__init__`, `from_array()`, `copy()`.
A new `_is_xr` boolean attribute identifies whether `RasterBase.data` is an Xarray object or not.
This allows making choices where necessary, which is only needed to return the main attributes stored in the object itself: `data`, `crs`, `transform` and `nodata`.
All methods returning a raster object (like `reproject()` or `crop()`) now use `from_array()`, which is overridden in `Raster` and `rst` to ensure they return the same type as the input: a `Raster` input returns a `Raster`, and an `xarray.DataArray` input returns an `xarray.DataArray`.
All other attributes and methods return exactly the same non-raster output.
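The type-preserving role of `from_array()` can be sketched as follows. The classes `BaseSketch`, `MaskedSketch` and `NanSketch` and the method `crop_rows()` are hypothetical stand-ins written for this example; they only illustrate the pattern and do not reproduce `RasterBase`, `Raster` or the `rst` accessor.

```python
import numpy as np


class BaseSketch:
    """Hypothetical stand-in for a shared parent class."""

    def __init__(self, data, transform):
        self.data = data
        self.transform = transform  # (a, b, c, d, e, f) affine coefficients

    @classmethod
    def from_array(cls, data, transform):
        raise NotImplementedError("Each subclass decides how to wrap the array.")

    def crop_rows(self, start, stop):
        """Shared logic: keep rows [start, stop) and shift the upper-left y origin."""
        a, b, c, d, e, f = self.transform
        new_transform = (a, b, c, d, e, f + start * e)
        # Rebuilding through from_array() keeps the caller's type.
        return self.from_array(self.data[start:stop], new_transform)


class MaskedSketch(BaseSketch):
    """Stores a NumPy masked array (invalid values are masked)."""

    @classmethod
    def from_array(cls, data, transform):
        return cls(np.ma.masked_invalid(data), transform)


class NanSketch(BaseSketch):
    """Stores a plain float array (invalid values are NaN)."""

    @classmethod
    def from_array(cls, data, transform):
        return cls(np.asarray(data, dtype="float32"), transform)
```

The shared method is written once in the parent, and the only type-specific code is the constructor hook each subclass overrides.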
A new function `geoutils.open_raster` is added to open a raster as an `xr.DataArray` (built on top of `rioxarray.open_rasterio()`). The difference is that our `open_raster` forces the data type to be `float32` at minimum, and replaces nodata values with NaNs to natively support most NumPy array operations with nodata propagation.
This seemed required because Rioxarray does not mask nodata values while also preserving the nodata value in its metadata, which is incompatible with the behaviour we need. (To give a clear example: with Rioxarray, either you load the array with -9999 in it and `ds.rio.nodata` is -9999, or you load the array with NaNs in it and `ds.rio.nodata` is `NaN`.)
I did not find another way to do this here...
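As an illustration of the "`float32` at minimum, nodata as NaN" behaviour, here is a small NumPy-only sketch. The helper name `mask_nodata_as_nan` is hypothetical, and using `np.promote_types` to pick the output type is an assumption of this example, not necessarily how `open_raster` does it.

```python
import numpy as np


def mask_nodata_as_nan(array, nodata):
    """Return a float copy of `array` where cells equal to `nodata` become NaN."""
    # Promote to float32 at minimum: int16 -> float32, int32 -> float64,
    # float64 stays float64.
    out_dtype = np.promote_types(array.dtype, np.float32)
    out = array.astype(out_dtype)  # astype() copies, so the input is untouched
    if nodata is not None:
        out[array == nodata] = np.nan
    return out


dem = np.array([[10, -9999], [30, 40]], dtype="int16")
dem_nan = mask_nodata_as_nan(dem, nodata=-9999)
```

The original nodata value (-9999 here) would still have to be stored separately in the metadata, which is the part described above as not possible with Rioxarray alone.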
New tests
Adding new tests is simple: we only need to check that all functions give the exact same result for a raster opened as a `Raster` or as an `xr.DataArray`.
For this, the new tests introduce a function to check the equality of a `Raster` and an `xarray.DataArray`.
Then, they check that all common attributes and methods of `RasterBase` run and return exactly the same output (or equal to the other object type when the output is a `Raster`/`xarray.DataArray`).
Discussion of core differences
The problem with the `rst` accessor object is that, if I'm not mistaken, we won't have access to functionalities that are not explicit such as `__array_interface__`. Thus, we likely cannot mirror the entire behaviour of the `Raster` class (for instance, no `overloading_check` to verify that the georeferencing is the same during an array or arithmetic operation). We can look more into it to be sure, but I don't think it is possible...
Thankfully, Xarray generally has similar behaviour to our `Raster` class, from the implicit loading to array-interfacing. We might want to adjust our functionalities to ensure we mirror that behaviour when possible, so that the code is written the same.
The main difference is that Xarray won't natively support nodata in its operations for integer arrays (no masked-array support in Xarray), and thus those need to be converted to NaN-arrays to do so, which increases RAM usage significantly for datasets of integer type. Here again, thankfully, chunked Dask-support can compensate for this, and run any NaN-array size.
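To put a rough number on the RAM increase for integer datasets, the sketch below compares the bytes per cell before and after conversion to a NaN-capable float type. The helper `nan_conversion_ratio` is hypothetical, and it assumes the "`float32` at minimum" promotion rule; it also compares against the raw integer array only, ignoring the boolean mask that a masked array carries.

```python
import numpy as np


def nan_conversion_ratio(dtype) -> float:
    """Memory ratio of the float (NaN-capable) array over the original array."""
    src = np.dtype(dtype)
    # Assumed rule: promote to float32 at minimum.
    dst = np.promote_types(src, np.float32)
    return dst.itemsize / src.itemsize


for name in ("uint8", "int16", "int32", "float32"):
    print(name, nan_conversion_ratio(name))
```

Under these assumptions a `uint8` raster takes four times the memory once converted, while a `float32` raster is unchanged.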
So there are pros and cons to using the `Raster` or the `rst` accessor. We can try to reconcile differences where possible, and for those that are structural to the data objects, we should simply explain them clearly on a documentation page and leave the choice to users! 😄
TO-DO
Code
- `Raster.from_array()` to `RasterAccessor` class (or `RasterBase` class?) and individual setting operations (for `transform`, `crs`, `nodata`, and `area_or_point`) to make all methods (reproject, etc) naturally work on both `Raster` and `ds.rst`,
- `_reproject`, etc),
- `delayed` function (for now just `_reproject` until Re-organize multiprocessing and chunked modules #830 is done) by detecting automatically if input array is a Dask array.
- `RasterBase`, `RasterAccessor` functionalities (comparing to `ds.rio`),
Documentation (later PR)
- `rst` accessor to "The georeferenced raster" page,
- `Raster.reproject()` or `ds.rst.reproject()`,
- `rst` accessor.
Other Dask support to add (later PRs)
- `reduce_points` function can copy the same logic as `interp_points`,
- `crop` function using `isel` of Rioxarray,
- `rasterize`/`polygonize` functions should be straightforward,
- `proximity` function would be a bit of work...