Evaluate solutions to obfuscate/mask columns in the GitLab.com data model
DRI: @alexander-sosna
Backup: @rhenchen.gitlab
What
We'd like to evaluate solutions to obfuscate/mask data at column level in PostgreSQL, so that SDEs and SREs (through Teleport) and T&S users (through Omamori) can no longer read columns labeled as red data within our database model (as discussed at https://gitlab.com/gitlab-com/gl-security/security-change-management/-/issues/9#note_1495482542).
In this issue we would like to discuss, iterate on, and evaluate any extensions that provide the ability to obfuscate/mask data at column level.
The ideal solution should:
- be as transparent to the application as possible (little or no need to change the application code)
- have no performance impact
- impose a small management burden
The initial extensions I found through quick research in the scope of the Omamori project were:
- The anon extension (PostgreSQL Anonymizer): an active project by Dalibo hosted on GitLab. It seems reasonably simple to install, and dynamic masking for specific roles seems easy to implement with minimal or no application impact;
- The sepgsql extension, a native extension, which apparently requires our application to be able to handle access to masked data;
- The pg_datamask extension: there is not much documentation about it, as it seems to be a proprietary solution made by Cybertec;
- https://habr.com/en/companies/yandex/articles/485096/ (recommended by Kras during the DB group weekly meeting; we need to check whether it applies to our problem);
- https://github.com/smithoss/gonymizer (recommended by Kras during the DB group weekly meeting; we need to check whether it applies to our problem).
Please feel free to propose any other extensions that are not yet listed.
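To make the anon option more concrete, here is a minimal sketch of how masking rules might be generated from an inventory of red-data columns. It assumes the SECURITY LABEL syntax described in the PostgreSQL Anonymizer documentation ('MASKED WITH VALUE NULL' for columns, 'MASKED' for roles); the table, column, and role names are hypothetical and not taken from our schema, and the exact activation steps depend on the extension version.

```python
# Hypothetical inventory of red-data columns; these names are illustrative
# assumptions and do not come from the GitLab.com schema.
RED_COLUMNS = {
    "users": ["email", "last_sign_in_ip"],
}

# Hypothetical read-only roles that should only ever see masked values.
MASKED_ROLES = ["teleport_readonly"]


def quote_ident(name: str) -> str:
    """Double-quote an SQL identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def masking_statements(red_columns, masked_roles):
    """Build SECURITY LABEL statements declaring anon masking rules.

    Assumption: the anon extension accepts 'MASKED WITH VALUE NULL' as a
    column rule and 'MASKED' as a role label, as its documentation describes.
    """
    statements = []
    for table, columns in sorted(red_columns.items()):
        for column in columns:
            target = f"{quote_ident(table)}.{quote_ident(column)}"
            statements.append(
                f"SECURITY LABEL FOR anon ON COLUMN {target} "
                "IS 'MASKED WITH VALUE NULL';"
            )
    for role in masked_roles:
        statements.append(
            f"SECURITY LABEL FOR anon ON ROLE {quote_ident(role)} IS 'MASKED';"
        )
    return statements


if __name__ == "__main__":
    for statement in masking_statements(RED_COLUMNS, MASKED_ROLES):
        print(statement)
```

Generating the rules from a single inventory would keep the masking definitions reviewable in one place, which speaks to the management burden criterion; whether this holds up in practice is part of what the evaluation needs to confirm.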
Assessment
We need to assess the following items for any proposed solution/extension:
- Transparency to the application (little or no need to change application code);
- Performance impact/overhead;
- Management/administration burden;
- Complexity to install and implement;
Why
Currently there are security concerns, raised by our Security Architect @plafoucriere, regarding internal GitLab users having read access to customer red data in the scope of the Omamori project. These same concerns extend to any Teleport users.
Therefore we need to ensure least-privilege access for internal users to any columns within our database model that are considered red data following our SAFE Framework.
The T&S team and SDEs require access to our production data for agile development, to quickly troubleshoot issues, and to quickly mitigate future/emerging threats. However, in general they don't require access to red data to perform their daily duties.
Obfuscating/masking data is an alternative to restricting/blocking access to the columns labeled as red data. As discussed in the Omamori project, implementing column-level restrictions would require GitLab to manage GRANTs, with an administration burden similar to that of managing access through views. It would require implementing specific data migration processes to control access to our schemas, and it would impose a level of bureaucracy that might block the T&S team and SDEs in their daily duties.
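For comparison, the column-level GRANT alternative could look roughly like the sketch below. It relies only on standard PostgreSQL column privileges (GRANT SELECT (columns) ON table TO role); the table, column, and role names are hypothetical. The point it illustrates is the burden mentioned above: the allowed column list is explicit, so every schema migration that adds a column would need a matching GRANT update.

```python
def quote_ident(name: str) -> str:
    """Double-quote an SQL identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def column_grant(table, all_columns, red_columns, role):
    """Build a column-level GRANT covering every column except red data.

    All names passed in are illustrative; nothing here reflects the real
    GitLab.com schema or roles.
    """
    red = set(red_columns)
    allowed = [column for column in all_columns if column not in red]
    if not allowed:
        raise ValueError(f"no readable columns left on table {table!r}")
    column_list = ", ".join(quote_ident(column) for column in allowed)
    return (
        f"GRANT SELECT ({column_list}) "
        f"ON {quote_ident(table)} TO {quote_ident(role)};"
    )


if __name__ == "__main__":
    # Hypothetical table layout: a new column added by a migration stays
    # unreadable for the role until this statement is regenerated and run.
    print(
        column_grant(
            table="users",
            all_columns=["id", "username", "email", "created_at"],
            red_columns=["email"],
            role="teleport_readonly",
        )
    )
```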