ENH: Changed type casting in regrid. #1478
Conversation
I'm not sure I agree with this statement. I can imagine wanting to regrid to very precise locations. I'd be surprised if the dtype of the resulting cube coordinates did not match (or exceed?) my grid_cube.
Thanks for your comment @esc24. To clarify, you think you should be able to regrid, for example, some data defined on a 32-bit source onto a 64-bit grid (with the 64-bit grid being your more accurate/precise locations). I disagree with the concept of regridding to "more precise locations", as the dtype of the source grid coordinates determines the level of uncertainty of these 'locations' (ignoring how the data was derived). To regrid to 'more precise locations' is to infer that your original data is defined by a higher precision than it really is. The user should have to explicitly re-cast the result to force the behaviour you're describing, as this seems wrong to me.
I understand your position @cpelley, and it makes sense. I just happen to disagree with it from a practical perspective. I think the resulting precision (not accuracy or level of uncertainty) of my regridded data should be defined by my target grid.
This makes a lot of sense. If I've asked Iris to re-grid my data onto a particular grid I would expect the result to match that grid exactly. If we change Iris to use the target grid's dtype does that fix your problem?
It's not as simple as that. I can use float32 to record some common grids with perfect precision (e.g. 0 - 360 in steps of 0.5). But the area weighting and my target grid may not be so neat, so I'm perfectly justified in using float64 for all calculations.
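As an illustration of this point (a sketch added here, not from the thread): a regular 0.5-degree grid is exactly representable in float32, while intermediate quantities such as area weights generally are not.

```python
import numpy as np

# A 0 - 360 grid in 0.5-degree steps: every point is an exact float32 value,
# so these coordinates lose nothing when stored at the lower precision.
pts32 = np.arange(0.0, 360.0, 0.5, dtype=np.float32)
pts64 = np.arange(0.0, 360.0, 0.5, dtype=np.float64)
assert np.array_equal(pts32.astype(np.float64), pts64)

# Intermediate quantities (e.g. area weights) are usually not so neat, so
# performing the regridding arithmetic in float64 is still justified.
w32 = np.cos(np.deg2rad(pts32))
w64 = np.cos(np.deg2rad(pts64))
print(np.abs(w32.astype(np.float64) - w64).max())  # small, but non-zero
```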
I believe that this change would give the user the ability to control the behaviour in an intuitive way. |
@rhattersley I have to say that I had not considered grids defined with 'perfect precision', good point. It would be great if there could be identifying metadata in the case of the NetCDF fileformat to describe precision, but that's another matter! Thanks @esc24 and @rhattersley, as always it's great to get different points of view to converge to a neater and more appropriate solution :) Basing the dtype on the target grid does sound like the way to go, satisfying all concerned. Happy to implement the changes, though I would love to hear what @jkettleb thinks of the proposed changes first.
One issue to bear in mind in this context is the fact that, because Iris does not currently support the notion of loading grids from 'cubeless' source files (e.g. netCDF files containing lat + long variables, and often bounds variables, but no data variables), we are in the habit of manufacturing our own target grid cubes from such sources. In adopting this tactic one usually wants to use the smallest data type possible (typically np.int8) for the data payload in order to minimise the memory footprint of the cube, the reason being that for high-resolution global grids the lat-long coordinates and bounds together can total several GB of memory. Of course, if there was a way of manufacturing a grid cube without having to reify the data array then this problem would, I think, disappear (if I'm understanding the issue correctly - I may not be :-)

Therefore an alternative approach to those mentioned above, and one which I currently use, is to add a

The above suggestion relates to the cube's data array. One could adopt a similar tactic for the type of the grid coordinates and bounds.
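For concreteness, here is a rough sketch of the kind of hand-rolled target grid cube described above, with a throwaway np.int8 data payload (the grid resolution and names are illustrative only, not taken from the thread):

```python
import numpy as np
from iris.coords import DimCoord
from iris.cube import Cube

# Illustrative high-resolution global grid: only the coordinates and bounds
# matter here, so the data payload is a dummy array of the smallest dtype.
lat = DimCoord(np.linspace(-90.0, 90.0, 2161),
               standard_name='latitude', units='degrees')
lon = DimCoord(np.arange(0.0, 360.0, 360.0 / 4320),
               standard_name='longitude', units='degrees')
lat.guess_bounds()
lon.guess_bounds()

grid_cube = Cube(np.zeros((lat.points.size, lon.points.size), dtype=np.int8),
                 dim_coords_and_dims=[(lat, 0), (lon, 1)])
```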
To save on memory you can use biggus.
Thanks for this tip @rhattersley, I hadn't thought to use biggus. |
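A minimal sketch of that tip, assuming biggus.ConstantArray is used as the lazy payload (the details depend on the Iris/biggus versions in use, so this is illustrative rather than definitive):

```python
import numpy as np
import biggus

# A lazy constant-valued array: no memory is allocated for the payload
# unless the values are actually realised (e.g. via .ndarray()).
lazy_data = biggus.ConstantArray((2161, 4320), dtype=np.int8)
print(lazy_data.shape, lazy_data.dtype)
```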
@rockdoc, apologies, my comment for this PR may be rather cryptic. The issue is: what behaviour would one expect when dealing with a source and target of different datatypes (and the same with the coordinates)? The issue is not directly one of storage, as the end result still needs to be a 'real' array. This PR hopes to give control over, and open up the question of, the precision used to calculate the returned result and the precision of that end result.
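To make that question concrete, a small NumPy-only illustration (not Iris code, added here as an example) of what happens by default when source data and target-grid quantities differ in dtype:

```python
import numpy as np

src = np.ones((3, 3), dtype=np.float32)      # source data, float32
weights = np.ones((3, 3), dtype=np.float64)  # e.g. weights derived from a float64 target grid

# NumPy's promotion rules mean both the calculation and the result are
# float64, whatever the user might have expected from a float32 source.
result = src * weights
print(np.result_type(src, weights))  # float64
print(result.dtype)                  # float64

# Keeping the source precision has to be an explicit, separate step.
result_src_dtype = result.astype(src.dtype)
```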
I'm going to close this for now. If people think it's worth me creating a Google Groups discussion thread on this then please let me know.
Hi, Just to add from a user perspective that this problem confused the hell out of me. I am regridding a float from one global grid to another, and the resulting cube (and hence netCDF file) had dtype=double. I didn't notice until after the files were used by an official configuration (and so can't be removed), so the output files are using GBs more space than they should. I guess I should check more carefully next time.

Personally my expectation would be that the output data had the same dtype as the input data, as described in point one of the original post above. I have worked around this in my own script using `ocube.data = ocube.data.astype(np.float32, copy=False)` but the memory footprint of the code is larger than it should be. So I'd be grateful if this change could be implemented in Iris. Thanks,
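For reference, a fuller sketch of that workaround (the file names and choice of regridding scheme are illustrative, not taken from the thread): cast the regridded result back to the source dtype before saving.

```python
import iris
import iris.analysis

cube = iris.load_cube('source.nc')        # float32 source data (illustrative)
grid = iris.load_cube('target_grid.nc')   # target grid definition (illustrative)

ocube = cube.regrid(grid, iris.analysis.AreaWeighted())

# The regridded data comes back as float64; cast it back to the source dtype
# before saving so the file is not twice the expected size. Note the cast
# still creates a temporary copy, so the in-memory footprint is not reduced.
ocube.data = ocube.data.astype(cube.dtype, copy=False)
iris.save(ocube, 'regridded.nc')
```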
NOT FOR MERGING
This PR is currently up for discussion and is not yet intended for merging (test results have not been updated).
This PR: