Geo: Use binary type for file registry sha256 values
In https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/4719#note_65762475, @yorickpeterse
asks:
Would it be possible to extend our binary SHA support to also support SHA256? That way we probably need less space compared to storing a SHA256 as a String.
It's easy enough to convert the values on the database side of things via decode
, the code would have to be updated across all the tables to use this.
I tried this:
diff --git a/ee/db/geo/migrate/20180322062741_migrate_ci_job_artifacts_to_separate_registry.rb b/ee/db/geo/migrate/20180322062741_migrate_ci_job_artifacts_to_separate_registry.rb
index 1b16b1c899d..a9ede8f3c66 100644
--- a/ee/db/geo/migrate/20180322062741_migrate_ci_job_artifacts_to_separate_registry.rb
+++ b/ee/db/geo/migrate/20180322062741_migrate_ci_job_artifacts_to_separate_registry.rb
@@ -7,7 +7,7 @@ class MigrateCiJobArtifactsToSeparateRegistry < ActiveRecord::Migration
t.integer "artifact_id", unique: true
t.integer "retry_count"
t.boolean "success"
- t.string "sha256"
+ t.binary "sha256"
end
Geo::TrackingBase.transaction do
@@ -15,7 +15,7 @@ class MigrateCiJobArtifactsToSeparateRegistry < ActiveRecord::Migration
execute <<~EOF
INSERT INTO job_artifact_registry (created_at, retry_at, artifact_id, bytes, retry_count, success, sha256)
- SELECT created_at, retry_at, file_id, bytes, retry_count, success, sha256
+ SELECT created_at, retry_at, file_id, bytes, retry_count, success, decode(sha256, 'hex')
FROM file_registry WHERE file_type = 'job_artifact'
EOF
@@ -25,10 +25,10 @@ class MigrateCiJobArtifactsToSeparateRegistry < ActiveRecord::Migration
$BODY$
BEGIN
IF (TG_OP = 'UPDATE') THEN
- UPDATE job_artifact_registry SET (retry_at, bytes, retry_count, success, sha256) = (NEW.retry_at, NEW.bytes, NEW.retry_count, NEW.success, NEW.sha256);
+ UPDATE job_artifact_registry SET (retry_at, bytes, retry_count, success, sha256) = (NEW.retry_at, NEW.bytes, NEW.retry_count, NEW.success, decode(NEW.sha256, 'hex'));
ELSEIF (TG_OP = 'INSERT') THEN
INSERT INTO job_artifact_registry (created_at, retry_at, artifact_id, bytes, retry_count, success, sha256)
- VALUES (NEW.created_at, NEW.retry_at, NEW.file_id, NEW.bytes, NEW.retry_count, NEW.success, NEW.sha256);
+ VALUES (NEW.created_at, NEW.retry_at, NEW.file_id, NEW.bytes, NEW.retry_count, NEW.success, decode(NEW.sha256, 'hex'));
END IF;
RETURN NEW;
END;
But then my specs failed because our string comparisons aren't right. For example:
(byebug) job_artifact_registry.first.sha256
"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
(byebug) '0' * 64
"0000000000000000000000000000000000000000000000000000000000000000"
As far as I can tell, most of our code isn't using the Geo-specific SHA256 yet, but I think it makes sense to switch all file_registry objects to this type when it's ready rather than having a mixed implementation.