Choosing one or several Databases for storing raw and learnt data
Current scenario
- We are using JSON files to store the normalized data and the graphs.
- We pull all data from SatNOGS every time we want to test something out.
- We want to add other sources of data as well.
Possible DBMS packages
- I have looked at MongoDB, MariaDB and InfluxDB (as suggested by @saintaardvark), and still have to look at PostgreSQL, as suggested by @acinonyx.
- Advantages of each are as follows:
- MariaDB: A fork of MySQL and largely compatible with it, so it has good community support.
- InfluxDB: Built for time series data. Supports SQL-like queries.
- MongoDB: The aggregation pipeline gives a lot of flexibility. Data is stored as JSON-like documents, which allows us to omit certain fields for some records.
- PostgreSQL: Has an SQL structure with MongoDB-like flexibility (for example through its JSON column types). Reported to be very fast.
- ArangoDB: Provides node and edge collections with graph functionality. It is also a multi-model (NoSQL) database.
How do we choose?
As pointed out by @saintaardvark in the main chatroom, we need to make a list of the features/properties we expect the DBMS to have before making the choice. Possible features are:
- Handling of empty fields
- Fast storage/retrieval (need to elaborate on this)
- Ability to filter
- Merging data from two or more tables
- Linking two or more tables
(Any other you can think of)
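To make the list above more concrete, here is a minimal sketch of what empty fields, filtering and linking two tables could look like in practice. It uses SQLite only because it ships with Python, not because it is one of the candidates. The table names, column names and sample values are all made up for illustration and are not our actual schema.

```python
import sqlite3

# SQLite is only a stand-in here; it is not one of the candidate DBMSs.
# All table/column names and values below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript("""
    CREATE TABLE satellites (
        norad_id INTEGER PRIMARY KEY,
        name     TEXT NOT NULL
    );
    CREATE TABLE frames (
        id        INTEGER PRIMARY KEY,
        norad_id  INTEGER NOT NULL REFERENCES satellites(norad_id),
        timestamp TEXT NOT NULL,
        source    TEXT NOT NULL,
        battery_v REAL,   -- nullable: not every frame carries this field
        temp_c    REAL    -- nullable
    );
""")

conn.executemany(
    "INSERT INTO satellites VALUES (?, ?)",
    [(1, "SAT-A"), (2, "SAT-B")],
)
conn.executemany(
    "INSERT INTO frames (norad_id, timestamp, source, battery_v, temp_c) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2019-06-01T00:00:00Z", "satnogs", 7.9, 21.5),
        (1, "2019-06-01T00:01:00Z", "satnogs", None, 22.0),  # empty battery_v
        (2, "2019-06-01T00:00:30Z", "other", 8.1, None),     # empty temp_c
    ],
)

# Handling of empty fields: count frames with no temperature reading.
missing = conn.execute(
    "SELECT COUNT(*) FROM frames WHERE temp_c IS NULL"
).fetchone()[0]

# Ability to filter: frames from one source that do have a battery value.
filtered = conn.execute(
    "SELECT id FROM frames "
    "WHERE source = ? AND battery_v IS NOT NULL ORDER BY id",
    ("satnogs",),
).fetchall()

# Linking two tables: attach the satellite name to each frame.
joined = conn.execute("""
    SELECT s.name, f.timestamp, f.battery_v
    FROM frames AS f
    JOIN satellites AS s ON s.norad_id = f.norad_id
    ORDER BY f.timestamp
""").fetchall()

for row in joined:
    print(row["name"], row["timestamp"], row["battery_v"])
```

Running the same three operations against each candidate would give us a like-for-like way to compare them.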
One other constraint is that it should not be a replica of SatNOGS, as that would be a waste of resources (duplication).
We will use this issue to finalize the requirements and then proceed to choose the DBMS to implement.