Skip to content

Regrid out of core

Huite Bootsma requested to merge regrid-out-of-core into master

This implements chunked regridding, where ideally a single chunk is regridded at a time, making the process embarrassingly parallel. Of course, this can only occur when there are no shared cells between the chunks of source and destination (this case should generally be succesfully recognized). The functions automatically decide how many chunks are required based on the overlap required.

The chunking is based on the chunks of the source. The partioning may not always be optimal, but can be easily changed by calling .chunk on the DataArray in question.

I've taken the opportunity to somewhat refactor the code as well, primarily getting rid of checking whether coordinates are increasing or not in the lower level functions. Instead, I simply use xarray to ensure everything is increasing, compute the results, and use xarray again to flip back if necessary. This should also remove some pesty bugs which were caused by the ambiguity of not being able to figure out what the direction of a single cell is (now that everything is increasing, that's quickly answered).

Unfortunately, there's no way around the fact that this remains by far the most complex code in the repository. First there's the complexity due to having a variable number of dimensions, combined with avoiding function call overhead in numba (using the closures to hopefully inline functions). Dealing with coordinates (equidistant or not) combined with chunks (overlapping or not). Without tools like Julia's CartesianIndex and metaprogramming, there doesn't really seem to be an easy way to greatly simplify things: generalizations generally come at the cost of massive memory allocation (a la scipy's vectorized interpolation routines); and string based code generation like https://github.com/EconForge/interpolation.py Fortunately, there are only three spatial dimensions, and I don't think it ever makes much sense to sample in both and time simultaneously. Anyway, it's probably worth it to extract it into a separate RectilinearRegridding package or something at some point.

A final thing that would be nice to have, is an optimized nearest neighbor regridder. It currently falls back to xarray to do nearest neighbor resampling via reindex_like(..., method="nearest"), and I'm fairly confident that one is not as efficient in this case: it is fully general (for e.g. curvilinear grids), so it has to do a lot more work than strictly necessary for rectilinear grids.

Edited by Huite Bootsma

Merge request reports