Proposal for TangoDatabase: Refactor Database.cpp, add a DB abstraction layer, add support for PostgreSQL and SQLite
Problem
The TangoDB currently uses direct MariaDB/MySQL queries to interact with its database. This leads to several key issues:
- Directly using SQL-specific code strongly ties TangoDB to a specific database technology, but:
- sqlite is much more efficient for small instances (f.e. tests),
- PostgreSQL allows more functionality, and more performance for larger instances/higher loads,
- Manually constructed queries directly in code
- mixes concerns (I/O with database and program logic), which makes tests harder and bugs likelier,
- duplicates a lot of code, leading to higher maintenance costs and lower robustness,
- leaves the queries vulnerable to SQL injections and
- makes the queries harder to change, and thus the code harder to update.
These problems are jointly addressed by
- separating the database-specific code from the program logic and
- add support for other database implementation(s).
The authors are of the opinion that most of this work can be outsourced following the approach described below.
Approach
Approach is as follows:
- Create an abstraction layer in Database.cpp that isolates the SQL calls.
- This allows to develop on top of the abstraction layer a set of implementations which replace then the generic SQL calls with specialised SQLite/PostgreSQL calls later on.
- Load the .sql files in PostgreSQL and
- port the stored procedures using the queries which are used in Database.cpp.
- Implement a switch in Database.cpp (factory could be a design pattern) and Shell scripts to select MariaDB or PostgreSQL implementations. SQLite support is optional but should later be possible. An OS independent solution to select a DB could be an environment variable.
Considering:
- Supporting PostgreSQL will be easier than supporting sqlite, as the latter will also require added functionality to cover features used in the abstraction layer which SQLite does not support out of the box. This additional functionality must be part of a specialised implementation which gets called by the abstraction layer.
Abstraction layer in Database.cpp
- Create an ordered list of functions & generalise: Group the list of functions per subject (f.e. device_attribute). The behaviour will be highly similar or equal between the groups (apart from the “remaining functions” group). This allows us to focus on one group and translate any result to all others.
- Write tests: Write tests for each group to cover key use cases. Copy/paste them to the groups that are similar to the ones with tests. This provides basic guarantees about behaviour, which is critical when refactoring code.
- Separate: Abstract MariaDB-specific code into separate functions, and later into a separate file (see above). This will allow refactoring to generalise this code. Some of this refactoring might come naturally already in this step.
- Refactor the MariaDB functions to generalise and shorten them. This reduces duplication, as well as work later on.
- Introduce an abstraction layer that formally allows the MariaDB code to be swapped by another implementation (future PostgreSQL/SQLite code).
Porting SQL files to PostgreSQL
In parallel, we can port the SQL code to PostgreSQL. We try to get the .sql files working in PostgreSQL, and throw the queries at it which Database.cpp would throw at it:
- Setup a PostgreSQL instance and interactive console/GUI.
- Load the .sql files and fix any syntax problems.
- Throw the queries from Database.cpp at it and fix any problems.
This work will mainly involve getting the stored procedures to work. If the abstraction layer and tests would be in place, we could also use the TangoDB to verify the stored procedures. But the above interactive approach is needed either way.
Add PostgreSQL support to TangoDB
The above work puts us in a good position for adding PostgreSQL support in addition to MariaDB/MySQL support:
- Add a PostgreSQL implementation to mirror the MariaDB implementation, but using a PostgreSQL client library. Use what is learned when porting the SQL files to PostgreSQL.
- Add a switch for the .sh/.sql files to use the relevant ones when initialising the database.
Optional work afterwards
There is likely to be a lot of common code (queries, setting up the tables, etc), so there is an opportunity here to use a common base for the SQL implementations. Consider using an ORM, i.e. a library that abstracts on top of SQL implementations (MariaDB/PostgreSQL), to allow most of the basic queries to be based on the same code in Database.cpp
. Nevertheless, we cannot abstract on such an ORM directly, as it will not implement missing functionality for us (f.e. in SQLite, or later on, maybe in MariaDB if we leverage advanced PostgreSQL functionality).
Proposal authored by: Jan David Mol (ASTRON) Thomas Braun (byte physics) Thomas Juerges (SKAO)