Coordinates location and realization to allow string labels after all
Problem: Integer-only coordinates with attrs['mapping'] are too rigid.
xsnow currently encodes location and realization as integer coordinates, while storing their corresponding names in an attribute mapping such as:
ds.location.attrs['mapping'] = {0: 'VIR1A', 1: 'VIR2A', 2: 'VIR3A'}
This approach was originally chosen for efficiency and NetCDF compatibility. However, it brings the following problem/risk: The mapping in .attrs is non-standard and ignored by xarray operations like merge or concat. When datasets are concatenated or merged, the mapping could become out of sync with the remaining indices. In addition, UX would be much nicer with meaningful labels.
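The concatenation risk can be illustrated with a minimal sketch on toy data. The variable name `hs`, the helper functions, and the station values are hypothetical and not taken from xsnow; the sketch only assumes standard `xarray` and `numpy` behavior.

```python
import numpy as np
import xarray as xr


def make_int_ds(codes, names, values):
    """Toy dataset (hypothetical): integer location coords plus attrs['mapping']."""
    mapping = dict(zip(codes, names))
    return xr.Dataset(
        {"hs": ("location", np.asarray(values, dtype=float))},
        coords={"location": ("location", np.asarray(codes), {"mapping": mapping})},
    )


def make_str_ds(names, values):
    """Toy dataset (hypothetical): string labels used directly as coordinates."""
    return xr.Dataset(
        {"hs": ("location", np.asarray(values, dtype=float))},
        coords={"location": ("location", names)},
    )


# Integer coords: both files number their stations 0 and 1.
a = make_int_ds([0, 1], ["VIR1A", "VIR2A"], [1.0, 1.2])
b = make_int_ds([0, 1], ["VIR3A", "VIR4A"], [0.8, 0.5])
combined_int = xr.concat([a, b], dim="location")
int_codes = combined_int["location"].values.tolist()
print(int_codes)  # the codes 0 and 1 now each appear twice

# String coords: the labels travel with the data.
a_s = make_str_ds(["VIR1A", "VIR2A"], [1.0, 1.2])
b_s = make_str_ds(["VIR3A", "VIR4A"], [0.8, 0.5])
combined_str = xr.concat([a_s, b_s], dim="location")
str_labels = combined_str["location"].values.tolist()
hs_vir3a = float(combined_str["hs"].sel(location="VIR3A"))
print(str_labels, hs_vir3a)
```

In the integer case the combined coordinate contains duplicate codes, and a two-entry mapping cannot describe four stations no matter which attributes survive the concatenation. In the string case, label-based selection keeps working without any extra bookkeeping.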
Luckily, integer coords offer little performance or memory advantage (<0.5 MB even for tens of thousands of stations with somewhat short names). Modern NetCDF + compression handles short string arrays efficiently. So there is no downside to reverting this fundamental design choice, apart from the working time needed to do it.
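As a rough sanity check of the size argument, here is a back-of-the-envelope estimate. The station count and label length are illustrative assumptions, not measurements from xsnow; the per-element sizes are the standard ones for 64-bit integers, fixed-width byte strings, and NumPy fixed-width unicode.

```python
# Back-of-the-envelope size estimate; the two inputs below are assumptions.
n_stations = 20_000   # assumed "tens of thousands" of locations
name_length = 5       # assumed label length, e.g. "VIR1A"

int64_bytes = n_stations * 8                   # one 8-byte integer per location
ascii_bytes = n_stations * name_length * 1     # fixed-width bytes: 1 byte per character
unicode_bytes = n_stations * name_length * 4   # NumPy fixed-width unicode: 4 bytes per character

for label, size in [
    ("int64", int64_bytes),
    ("bytes (S5)", ascii_bytes),
    ("unicode (U5)", unicode_bytes),
]:
    print(f"{label:>13}: {size / 1e6:.2f} MB")
```

Under these assumptions every representation stays below 0.5 MB, so the choice of coordinate dtype is unlikely to matter next to the data variables themselves.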
Next steps
- Update data model / parser (switch to str coords for location)
- Remove attrs['mapping'] for location
- Adapt affected tests (& update tutorials)
- Ensure S1 encoding before writing to nc in to_netcdf():
# in to_netcdf():
for name in ("location", "realization"):
    if name in ds.coords and ds[name].dtype.kind in ("U", "S", "O"):
        ds[name].encoding = {"dtype": "S1"}