Overview of regridding/resampling
@visr @DirkEilander This is a short overview of resampling methods, related to projected and unprojected geospatial grids and meshes.
GDAL (and GDAL via rasterio) are probably the most commonly known. Supports a variety of methods, and is suitable for reprojecting from one coordinate system to another as well. Limitations are primarily:
- Exclusively 2D, no convenience for layers, time. Reprojecting becomes rather costly, since it recomputes weights for every slice.
- Resampling methods are purely based on cell centers, no true evaluation of areal overlap
- Alll C++ in a massive binary, so there’s no easy way to add different resampling methods.
Because I wanted a number of different methods for resampling (e.g. harmonic mean, conservative methods for conductance), I started looking around some time ago. First usable thing I stumbled upon is this library: https://github.com/esa-esdl/gridtools
I added some methods, but it doesn’t take long to realize that almost many methods are a reduction over a collection of weights and values. This can be abstracted, which is what I did for the imod Regridder: https://imod.xyz/api/prepare.html#imod.prepare.Regridder
As well as adding 1D and 3D methods. Since Numba supports JIT, it’s easy to add custom functions (provided they are a reduction over weights and values, linear interpolation doesn’t work this way, and requires a different implementation).
In terms of other numba based interpolation/resampling: https://github.com/EconForge/interpolation.py
(relies on string based code generation -> JIT, which is quirky, but no macros in Python...)
Just found this also while googling: https://github.com/dbstein/fast_interp
Pyresample is 2D, supports a number of methods, mostly focused on satellite imagery, so I think it can do reprojection: https://pyresample.readthedocs.io/en/latest/
Xemsf uses the python bindings to EMSF to do resampling (via regridding weighst netcdf): https://xesmf.readthedocs.io/en/latest/why.html Currently, it only supports rectilinear and curvilinear grids (since that’s what xarray supports). EMSF can do a lot more, but is a very complicated fortran binary (1 million LOC apparently), which isn’t available on Windows.
A bunch of other utilities:
- https://github.com/pysal/tobler: Tobler is a python package for areal interpolation, dasymetric mapping, and change of support.
- https://github.com/corteva/geocube: Tool to convert geopandas vector data into rasterized xarray data.
- https://github.com/corteva/rioxarray: “You can learn how to clip, merge, and reproject rasters in the Usage Examples section of the documentation. Need to export to a raster (GeoTiff)? There is an example for that as well.”
- https://github.com/makepath/xarray-spatial: Xarray-Spatial implements common raster analysis functions using Numba and provides an easy-to-install, easy-to-extend codebase for raster analysis.
- https://github.com/CNES/pangeo-pyinterp: Python library for optimized geo-referenced interpolation.
I’ve made a start with what a Regridder for unstructured 2D meshes looks like: https://github.com/Huite/xugrid/blob/master/xugrid/regrid.py
In contrast to e.g. rectilinear regridding, computing weights is a lot more expensive, so taking the EMSF route and storing the weights seems like a good idea.
I think I’d like to refactor the regridding stuff into something separate, easy-installable, which deals with both structured and unstructured data (since one often wants to convert one to the other). For performance, you’d be able to specialize on structured input; for unstructured meshes, you need some kind of spatial indexing scheme. See: https://github.com/ESM-VFC/xoak/issues/12 (I left a comment there as well).
In terms of unstructured data (2D unstructured, layer for z/depth), the UGRID convention is good and broadly applicable (it supports data on faces, edges, nodes). All mesh definitions I've encountered are very similar: coordinate arrays for the nodes, and index arrays mapping to those nodes.
The closest thing to an "in-memory" UGRID netCDF (like xarray is an "in-memory" netCDF) is the Gridded Dataset: https://github.com/NOAA-ORR-ERD/gridded
I'd like to integrate xarray with it (haven't started yet), but there's also some work coming where xarray will support more flexible indexing for unstructured meshes (see the discussion on the gridded issues).