Unexpected data types in time series
Bug Report
🐛 Summary
In previous versions of shepard (< 4.0.0), InfluxDB was used as the storage engine for time series data. With InfluxDB, the database supported four data types:
- Strings
- Booleans
- Integers
- Floating point numbers
The first value stored in a time series in InfluxDB defines the data type of the time series and cannot be changed later.
As the communication between an application and the shepard backend is done using a REST API, all data needs to be serialized into a JSON string that can be transmitted.
Upon reception, the shepard backend tries to determine the data type and checks whether it matches the type of any data that already exists in the time series.
For String and Boolean, this worked as intended.
For integers and floating point numbers, however, this detection was faulty. If a time series did not yet exist and the first transmitted value was numeric, the type was always set to floating point, even if the value was actually an integer.
Therefore, all time series with numeric values created with shepard < 4.0.0 have only floating point values. No integer time series exist.
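The faulty detection described above can be illustrated with a minimal sketch. This is not the shepard backend code; the function names `detect_type_faulty` and `detect_type_intended` and the type labels are hypothetical and only mirror the behavior described in this report.

```python
import json


def detect_type_faulty(first_value):
    """Mimics the described pre-4.0.0 behavior (hypothetical sketch):
    every numeric first value is classified as floating point."""
    # bool must be checked first, because bool is a subclass of int in Python
    if isinstance(first_value, bool):
        return "boolean"
    if isinstance(first_value, str):
        return "string"
    if isinstance(first_value, (int, float)):
        return "float"  # integers are never recognized as such
    raise TypeError(f"unsupported value: {first_value!r}")


def detect_type_intended(first_value):
    """What the detection should have done: keep integers and floats apart."""
    if isinstance(first_value, bool):
        return "boolean"
    if isinstance(first_value, str):
        return "string"
    if isinstance(first_value, int):
        return "integer"
    if isinstance(first_value, float):
        return "float"
    raise TypeError(f"unsupported value: {first_value!r}")


# A client sends only integers, serialized as JSON.
values = json.loads("[1, 2, 3]")
print(detect_type_faulty(values[0]))    # the new time series becomes floating point
print(detect_type_intended(values[0]))  # the type the client actually sent
```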
Furthermore, the shepard backend silently converted numeric types into each other when necessary, so no error occurred when integer values were written to a floating point time series. Floating point values would also have been silently converted to integers if an integer time series had ever existed.
Since shepard 4.0.0 and the migration to TimescaleDB, data types are no longer converted implicitly. If a client now tries to store integer values in a time series that was migrated from shepard < 4.0.0 (and thus has type floating point), an exception is returned to the client because the types do not match. This happens even if the client has never sent a single floating point value.
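The strict behavior since 4.0.0 can be sketched as follows. This is an illustration under assumptions, not the actual backend implementation: `TypeMismatchError`, `value_type`, and `store_strict` are hypothetical names, and the real error returned by the REST API may look different.

```python
class TypeMismatchError(Exception):
    """Hypothetical error raised when a value does not match the series type."""


def value_type(value):
    """Classify a deserialized JSON value into one of the four supported types."""
    if isinstance(value, bool):  # check bool before int (bool subclasses int)
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "string"
    raise TypeError(f"unsupported value: {value!r}")


def store_strict(series_type, values):
    """Accept the values only if every one matches the existing series type.
    No implicit conversion takes place."""
    for value in values:
        actual = value_type(value)
        if actual != series_type:
            raise TypeMismatchError(
                f"time series has type {series_type}, got {actual}: {value!r}"
            )
    return list(values)


# A migrated time series has type "float"; an integer-only client is rejected.
try:
    store_strict("float", [1, 2, 3])
except TypeMismatchError as err:
    print("rejected:", err)
```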
💣 What is the current bug behavior?
At the moment, every client request that transmits integer values for a time series migrated from InfluxDB fails, and no data is stored.
🥅 What is the expected correct behavior?
The type of already existing time series cannot be changed later. Therefore, it is not possible to convert the already existing data from float to integer (if there was any loss of precision due to the wrong data type, this has already occurred).
A new TimeseriesContainer can be created, where the time series are newly created with the right data type. Old data however will remain in the old TimeseriesContainer.
We plan to introduce a feature that allows the backend to implicitly convert integers to floats in order to maintain API compatibility with previous versions. This is expected in version 5.1.0.
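One possible shape of such a conversion is sketched below, assuming that only the integer-to-float direction is allowed and that every other mismatch is still rejected. The function `coerce_widening` is hypothetical and does not describe the planned implementation.

```python
def coerce_widening(series_type, value):
    """Hypothetical sketch: convert integers to floats for float series,
    reject every other type mismatch."""
    # bool subclasses int in Python, so it is excluded explicitly
    is_integer = isinstance(value, int) and not isinstance(value, bool)
    if series_type == "float":
        if is_integer:
            return float(value)  # implicit integer -> float conversion
        if isinstance(value, float):
            return value
    elif series_type == "integer" and is_integer:
        return value
    elif series_type == "boolean" and isinstance(value, bool):
        return value
    elif series_type == "string" and isinstance(value, str):
        return value
    raise ValueError(f"cannot store {value!r} in a {series_type} time series")


# Integer values sent to a migrated float series are accepted again.
print([coerce_widening("float", v) for v in [1, 2, 3]])
```

Note that converting very large integers (beyond 2^53 in magnitude) to 64-bit floats can lose precision, so this direction is not lossless for all inputs either.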
🤔 Risks & Open Questions
Should there be any way of changing the data type of existing data without creating a new container?