The solution is nominally to lengthen the key, but since it's used to encrypt a wide range of columns, that isn't really feasible in isolation - and there's no way to gracefully rotate db_key_base.
Further details
More generally, we may wish to rotate db_key_base in a range of scenarios, including key compromise or the discovery of hitherto-unknown weaknesses in the existing scheme. Being unable to do so is a bit of a security concern.
We have a key rotation scheme for otp_key_base that works by calculating a new value for every row of the column that otp_key_base uses. This scheme isn't really usable for db_key_base because it's used to encrypt a large number of columns.
Proposal
Allow old values for db_key_base to be specified in the configuration. When we encrypt a column, always use db_key_base. However, when we come to decrypt a column, use that, and the set of old values, trying each in order until a working key is found.
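As an illustration of the decryption fallback (not an existing helper; the function name, cipher choice, and key handling below are assumptions):

```ruby
require 'openssl'

# Hypothetical sketch only: try the current db_key_base first, then each configured
# old key, until one decrypts the value. The real cipher/mode varies per column.
def decrypt_with_fallback(ciphertext, iv, candidate_keys)
  candidate_keys.each do |key|
    begin
      cipher = OpenSSL::Cipher.new('aes-256-cbc').decrypt
      cipher.key = key                         # must be exactly 32 bytes for aes-256-cbc
      cipher.iv  = iv
      return cipher.update(ciphertext) + cipher.final
    rescue OpenSSL::Cipher::CipherError
      next                                     # wrong key: try the next candidate
    end
  end
  raise 'none of the configured keys could decrypt this value'
end

# Usage: current key first, then any old keys from configuration.
# plaintext = decrypt_with_fallback(data, iv, [db_key_base, *old_db_key_bases])
```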
This allows us to rotate the secret without re-encrypting every column in the database that relies on it. We can then migrate existing data by iterating over every row, re-assigning the values in the encrypted columns, and saving the resulting output.
Additionally, we can provide a rake task to trigger full re-encryption of the database for self-managed users whose database is small enough that running it over a weekend, when most employees are offline, could make sense.
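A sketch of what such a task could look like; the task name, the model list, and the attribute names are assumptions rather than an existing GitLab task, and re-reading old rows relies on the decryption fallback above:

```ruby
# lib/tasks/gitlab/reencrypt.rake -- hypothetical sketch, not an existing task.
namespace :gitlab do
  desc 'Re-encrypt attr_encrypted columns with the current db_key_base'
  task reencrypt_columns: :environment do
    # Model => its encrypted (virtual) attributes. A real task would need to cover
    # every encrypted column in the schema.
    { ProjectImportData => [:credentials], RemoteMirror => [:credentials] }.each do |model, attributes|
      model.find_each do |record|
        # Re-assigning the decrypted value forces attr_encrypted to encrypt it again
        # with the current key (and a fresh IV) before saving.
        attributes.each { |name| record.public_send("#{name}=", record.public_send(name)) }
        record.save!(validate: false)
      end
    end
  end
end
```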
What does success look like, and how can we measure that?
Able to change the db_key_base without losing access to existing data
Links / references
@jramsay I've marked this one as ~Create but it's one of those that doesn't fit into any devops cycle, really. I think everyone, from Plan to Monitor, has a feature that relies on encrypted database columns. This is coming out of a follow-up to an issue ~Create worked on though.
@nick.thomas may I argue that having multiple keys (the old one and the current one at least, or even multiple generations) makes this whole thing awfully complex and it's hard to know "when you are done".
Especially when you consider a break-glass incident and have to move away from a compromised key with no real migration period.
The otp_key_base rotation does only affect one column, yes. But what is the difference when applying this re-encoding method to multiple columns, one after the other, in a single transaction?
A future improvement could be to use individual keys for each type of data / column so changes of keys would only ever require re-encoding of one column at a time.
@frittentheke right, that's certainly a downside of the proposal above. However, for large-scale instances, it's very difficult to migrate every row in a large table in a short timescale, which is what motivated my suggestion above. Being able to start using the new key immediately, then migrate old rows over in a more leisurely fashion, is appealing to me.
Using a key per column would reduce the impact of any individual key leak. I'm not strongly against it, but I'm not certain if it's desirable given the number of encrypted columns we have. We also still need a way to rotate any individual key if it is leaked, as well as every key in case the single file they're all stored in gets leaked.
This isn't something attr_encrypted natively supports, so we'd need to add it somehow. A monkey-patch would be quite scary, so we'd probably want to contribute it back upstream.
Lots of different columns use db_key_base as the signing key, and there's a range of encryption types. I think we even have some columns where we encrypt with a fixed IV to allow for searches. Each of these permutations will need support for multiple keys to be added.
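To make the shape of the problem concrete, here is an illustration-only attr_encrypted declaration (not a specific GitLab model; the key setting name is an assumption). Every such declaration binds a column to a single key derived from db_key_base, so each permutation of mode, IV handling, and algorithm would need multiple-key support threaded through it:

```ruby
# Illustration only: a typical attr_encrypted column definition.
class ExampleModel < ApplicationRecord
  attr_encrypted :secret_value,                          # hypothetical column
    key:       Settings.attr_encrypted_db_key_base_32,   # assumption: a 32-byte key derived from db_key_base
    mode:      :per_attribute_iv,
    algorithm: 'aes-256-gcm',
    encode:    false
end
```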
Perhaps @abrandl or @yorickpeterse can provide valuable input on the best approach to take here? How do we best change db_key_base for a large instance like GitLab.com without downtime, and without data loss?
@nick.thomas This would most likely have to be an incremental approach. First you'd have to change the code to support a current and new column, using the new column only if it is not empty. Then, you'd migrate the data into the appropriate columns. Once migrated, you'd have to change the code so it stops using the old column, after which you can drop those columns.
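A sketch of what that incremental scheme could look like for a single encrypted attribute; the column names, key constants, and reader method are hypothetical:

```ruby
# Hypothetical sketch of the old-column/new-column approach described above.
class ExampleModel < ApplicationRecord
  attr_encrypted :secret_value,    key: OLD_DB_KEY_BASE  # backed by encrypted_secret_value
  attr_encrypted :secret_value_v2, key: NEW_DB_KEY_BASE  # backed by encrypted_secret_value_v2

  # During the migration window, readers prefer the re-encrypted column and fall
  # back to the old one; once the backfill is done, the old column can be dropped.
  def current_secret_value
    encrypted_secret_value_v2.present? ? secret_value_v2 : secret_value
  end
end
```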
Thanks for the suggestion @yorickpeterse, that's not an idea I'd considered. It does solve the problem of interfacing with attr_encrypted - each column would have exactly one key associated with it, as it expects - but the scale of it, and implementing the schema + data migration outside of a release cycle (say we decide to rotate db_key_base next Tuesday, for instance), worries me.
We've successfully done this kind of migration for individual columns, to go from unencrypted to encrypted, as part of a normal release cycle, so it's definitely possible in principle.
@nick.thomas Incremental migrations/upgrades of our data indeed take more time and effort, but they are usually the only approach to making data changes without requiring downtime. In a continuous deployment model this would be less of an issue, but we're not there yet sadly.
@jmeshell I'd say it's mostly for self-managed users - but if the db_key_base is leaked or lost on GitLab.com, we'll need to be able to handle that. Whatever we build should accommodate GitLab.com scale too.
In response to strong customer demand and blocked opportunities, ~"group::distribution" is implementing encryption for passwords stored in GitLab configuration files. The first iteration involves storing the Rails passwords, starting with the LDAP password, in an encrypted file that is accessed using an encryption key. The encryption key will either be db_key_base, or another key that we add. This will be another dependency on encryption keys without having a way to rotate the keys.
@jeremy I don't know which team is responsible for key rotation. Would it come under Manage?
It's a big hack, yes, but it got the job done. I generated the new key using SecureRandom.hex(64).
To run it on Omnibus, use gitlab-rails c and just copy/paste the script once you've configured your keys at the top. Then amend /etc/gitlab/gitlab-secrets.json, run gitlab-ctl reconfigure, and it should all just work. I ended up doing the reconfigure first and then running the script, and that was fine too.
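For anyone repeating this: only SecureRandom.hex(64) comes from the comment above; the rest is a sketch of generating the new key from a Rails console before wiring it into the script and gitlab-secrets.json.

```ruby
# Run inside the Rails console (`gitlab-rails console` on Omnibus) to generate a
# new random key (128 hex characters), then use it for the script's NEW_* values
# and the updated /etc/gitlab/gitlab-secrets.json entry.
require 'securerandom'

new_db_key_base = SecureRandom.hex(64)
puts new_db_key_base
```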
Hi @simmerz I am running up against the same issue you did. While reviewing what you did, I can't quite determine how to get OLD_TRUNCATED_BASE_KEY, nor how to get NEW_TRUNCATED_BASE_KEY, NEW_BASE_KEY_12, NEW_BASE_KEY_32. Any guidance?
After checking out this open MR and its changes I came to the conclusion that I need to run the script by @simmerz.
I was wondering about how to define the keys as well. After some code browsing I found this.
Eventually it shows that:
OLD_TRUNCATED_BASE_KEY is the first 32 bytes of your old key (or fewer, if it's shorter than 32 bytes)
OLD_BASE_KEY_32 is a version of your old key that is exactly 32 bytes long (append as many zeros as needed, or truncate)
OLD_BASE_KEY_12 is a version of your old key that is exactly 12 bytes long (append as many zeros as needed, or truncate)
Similarly for NEW_TRUNCATED_BASE_KEY, NEW_BASE_KEY_12, NEW_BASE_KEY_32.
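In case it saves the next reader some code browsing, here is a minimal sketch of that derivation; the zero-padding character and exact truncation behaviour are taken from the description above and should be double-checked against the linked settings code.

```ruby
# Hedged sketch: derive the script's key variants from a base key as described above.
def truncated_key(key)
  key[0, 32]                        # first 32 bytes, or the whole key if it's shorter
end

def exact_length_key(key, bytes)
  key[0, bytes].ljust(bytes, '0')   # truncate to `bytes`, zero-padding if too short
end

old_key = 'REPLACE_WITH_YOUR_OLD_DB_KEY_BASE'
OLD_TRUNCATED_BASE_KEY = truncated_key(old_key)
OLD_BASE_KEY_32        = exact_length_key(old_key, 32)
OLD_BASE_KEY_12        = exact_length_key(old_key, 12)
# ...and the same three derivations of the new key for the NEW_* values.
```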
I updated the script to include tables ci_pipeline_variables and jira_tracker_data which contained additional encrypted data for me (went through the entire schema looking for column names with 'encrypted' in it).
This is what it looks like:
I actually had to change db_key_base in the Rails console first - only then did the script go through (otherwise it failed with cipher exceptions).
Previously, when I got it wrong, I got 500 and bad gateway errors all over the place. In the final version everything seems to work as expected.
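For anyone repeating the schema search mentioned above, a quick, read-only way to list candidate columns from a Rails console (unofficial sketch):

```ruby
# List every table/column whose name mentions "encrypted", to spot data the
# re-encryption script may need to cover.
conn = ActiveRecord::Base.connection
conn.tables.sort.each do |table|
  encrypted_columns = conn.columns(table).map(&:name).grep(/encrypted/)
  puts "#{table}: #{encrypted_columns.join(', ')}" if encrypted_columns.any?
end
```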
For those interested we have added some tables to the process (project_import_data, remote_mirrors, ci_pipeline_schedule_variables, operations_feature_flags_client):
Hi @iroussos, would this one fall under your group? We're looking to get it prioritized to unblock a few high priority Infra and internal projects, thanks.
From the discussion above, I am not sure whether we have reached a consensus on how to address this. I can see that a lot of tables would be affected whichever solution we choose (application-level solution vs incremental update vs new column and switch), so this seems to me like a large effort that should be getting more attention if it blocks high priority initiatives and projects.
Normally we let the feature groups that own the tables drive such updates, as they have the domain expertise, with the support of the Database Group on design decisions or if there is any need for additional tooling. All proposals above seem to be covered by existing tools (helpers) and migration approaches that have been used in the past, but I may be wrong.
@abrandl what is your opinion about this one? It seems to me that adding a new column for each case would be the way to go here, possibly in conjunction with switching to active_record.encryption as Stan proposes in the next comment.
I remember working on the PoC of encryption that would support key rotation some years ago, I shared some PoC code here: #26243 (comment 325457808), but I would need to refresh my memory about how exactly this worked.
I think that we might be able to avoid adding new columns and depend on fallback mechanisms instead @iroussos @awthomas, but in order to have a bit more solid advice about how we could tackle this, some quality engineering time needs to be spent on refreshing my PoC / creating a new one.
We probably don't need to wait until the Rails 7 upgrade; it might be possible to build a custom mechanism, like my two-year-old PoC in #25332 (comment 911336887). The benefit of the Rails 7 upgrade would be storing all root keys in a single place, but we can probably still store new keys in Rails secrets in a similar location.
For people reading this now who don't know by heart which version of rails we're running, the upgrade to Rails 7 has been completed since this comment was written.
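For readers evaluating that route: Active Record Encryption in Rails 7+ supports rotation by accepting a list of primary keys; the last key encrypts new data and every key is tried on decryption. A minimal sketch, not GitLab's actual configuration (key sources and names are assumptions):

```ruby
# config/application.rb -- illustrative only.
# The last key in the list encrypts new data; every key is tried when decrypting.
config.active_record.encryption.primary_key = [
  ENV['OLD_ENCRYPTION_KEY'],   # kept so existing rows can still be decrypted
  ENV['NEW_ENCRYPTION_KEY']    # used to encrypt all new data
]
config.active_record.encryption.key_derivation_salt = ENV['ENCRYPTION_KEY_DERIVATION_SALT']

# app/models/example_model.rb
class ExampleModel < ApplicationRecord
  encrypts :secret_value   # hypothetical column
end
```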
Hi @jheimbuck_gl @dhershkovitch - I was digging around and came across #341089 which is currently assigned to ~"group::authentication and authorization", so it's possible that it could be that team. Whichever path we take here, I'm going to link #215299 (closed) to this issue as it covers the same evaluation of this work.