
Geo: Skip blob download if already exists

Michael Kozono requested to merge mk/skip-blob-download-if-exists into master

What does this MR do and why?


Skip blob download if it already exists.

Resolves #352530 (closed)
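
As a rough illustration of the idea only, and not the code in this MR, a skip-if-exists check could look like the sketch below. The `sync_blob` helper, its `skip_if_exists:` argument, and the result keys are hypothetical names invented for this example. The block stands in for the HTTP request to the primary site. The real service evidently applies further conditions, because the validation steps expect a skip only after registry records are deleted.

```ruby
require "tmpdir"

# Hypothetical helper for illustration; not the implementation in this MR.
# Returns a hash describing whether the blob was downloaded or skipped.
def sync_blob(local_path, skip_if_exists:)
  # If the file is already on disk, avoid the round trip to the primary site.
  if skip_if_exists && File.exist?(local_path)
    return { skipped: true, bytes_downloaded: 0 }
  end

  # The block stands in for the download request to the primary site.
  data = yield
  File.binwrite(local_path, data)
  { skipped: false, bytes_downloaded: data.bytesize }
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "seeded_upload.txt")

  # First sync: the file is absent, so it is downloaded.
  p sync_blob(path, skip_if_exists: true) { "hello" }

  # Second sync: the file exists, so the block is never called.
  p sync_blob(path, skip_if_exists: true) { raise "unexpected download" }

  # With skipping turned off, the file is downloaded again.
  p sync_blob(path, skip_if_exists: false) { "hello" }
end
```

The `skip_if_exists:` argument plays the role of the feature flag here: when it is false, behavior is unchanged and every sync triggers a download.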

How to set up and validate locally


If you don't have a GDK + Geo:

  1. Install GDK + Geo https://gitlab.com/gitlab-org/gitlab-development-kit/-/blob/main/doc/howto/geo.md#easy-installation
  2. The installation seeds a lot of sample data by default

Confirm that nothing changes when the feature flag is disabled (which is the default).

  1. In your primary GDK/gitlab directory, switch to this branch: git checkout mk/skip-blob-download-if-exists && gdk restart rails
  2. In your secondary GDK/gitlab directory, switch to this branch: git checkout mk/skip-blob-download-if-exists && gdk restart rails
  3. In your secondary GDK/gitlab directory, tail relevant log output: tail -f log/geo.log | grep "Blob download"
  4. Open a new Terminal tab
  5. In your primary GDK/gitlab directory, tail relevant log output: gdk tail gitlab-workhorse | grep "api/v4/geo/retrieve/upload"
  6. Open a new Terminal tab
  7. In your secondary GDK/gitlab directory, delete the registry records of 3 Uploads: gdk psql-geo -c "DELETE FROM file_registry WHERE id IN (SELECT id FROM file_registry WHERE state = 2 AND verification_state = 2 LIMIT 3);".
  8. Wait a couple of minutes, or run ::Geo::Secondary::RegistryConsistencyWorker.new.perform && Geo::RegistrySyncWorker.perform_async. The secondary site will then resync these Uploads.
  9. The secondary's Geo log will output something like {"severity":"INFO","time":"2023-12-12T01:28:57.699Z","correlation_id":"1e51c2e3bc90c8cef5aad2cebaafecd9","class":"Geo::BlobDownloadService","gitlab_host":"gdk2.test","message":"Blob download","replicable_name":"upload","model_record_id":4,"mark_as_synced":true,"download_success":true,"bytes_downloaded":65,"primary_missing_file":false,"download_time_s":0.052,"reason":null}.
  10. The primary's Workhorse log will output something like 2023-12-12_01:54:49.90730 gitlab-workhorse : {"content_type":"application/octet-stream","correlation_id":"01HHDVQSCEX05TKR90V7SV58X8","duration_ms":36,"host":"gdk.test:3443","level":"info","method":"GET","msg":"access","proto":"HTTP/1.1","referrer":"","remote_addr":"172.16.123.1:64626","remote_ip":"172.16.123.1","route":"^/api/","status":200,"system":"http","time":"2023-12-11T15:54:49-10:00","ttfb_ms":36,"uri":"/api/v4/geo/retrieve/upload/25","user_agent":"http.rb/5.1.1","written_bytes":65}
  11. Browse to, or create, an issue. Add an attachment in a comment or in the description. Observe the logs.
  12. Browse to Admin Area > Geo > Sites > Replication Details > Uploads. Click Resync on a few of them. Observe the logs.
  13. In particular, notice that "skipped":true does not appear in any of the logs, and that the primary site's Workhorse receives a request for each Upload.

Now enable the feature and perform the above actions again:

  1. In your primary GDK/gitlab directory, open a Rails console and enable the feature flag: Feature.enable(:geo_skip_download_if_exists)
  2. Expect "skipped":true only in the case where the registry records were deleted. When the secondary site decides to skip a download, the primary site's Workhorse does not receive a request for it.
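
The log checks in both passes can be scripted instead of read by eye. The sketch below is a minimal example, assuming the Geo log lines are JSON objects like the one quoted in the steps above and that a skipped download carries a top-level "skipped":true key. The `skipped_download?` helper is a hypothetical name, the first sample line is a trimmed copy of the log line quoted earlier, and the second is a made-up variant with the key added.

```ruby
require "json"

# Hypothetical helper: true only for a "Blob download" entry that was skipped.
# Tolerates a non-JSON prefix such as the timestamp in `gdk tail` output.
def skipped_download?(line)
  json_start = line.index("{")
  return false if json_start.nil?

  entry = JSON.parse(line[json_start..])
  entry["message"] == "Blob download" && entry["skipped"] == true
rescue JSON::ParserError
  false
end

# Trimmed copy of the Geo log line quoted in the steps above.
downloaded = '{"severity":"INFO","class":"Geo::BlobDownloadService",' \
             '"message":"Blob download","replicable_name":"upload",' \
             '"model_record_id":4,"download_success":true,"bytes_downloaded":65}'

# Assumed shape of a skipped entry: the same line with "skipped":true added.
skipped = JSON.generate(JSON.parse(downloaded).merge("skipped" => true))

puts skipped_download?(downloaded) # false
puts skipped_download?(skipped)    # true
```

With the feature flag disabled, every "Blob download" line should return false. With it enabled, only the lines for the deleted registry records should return true.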

You can also test that verification still works:

  1. In the secondary site Rails console, run Geo::UploadRegistry.first and note the file_id. In this example, it happened to be 4.

  2. In the secondary site Rails console, run Geo::UploadRegistry.find_by(file_id: 4).replicator.carrierwave_uploader.file.path. This outputs the path to the upload file: => "/Users/mkozonogitlab/Developer/gdk2/gitlab/public/uploads/@hashed/6b/86/6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b/6d5eee4fc72fffd83d024115c6866b81/seeded_upload.txt"

  3. Corrupt the file: I opened the file in a text editor and modified it.

  4. In the secondary site Rails console, run Geo::UploadRegistry.find_by(file_id: 4).replicator.verify

  5. In the secondary site Rails console, the registry record is now "sync failed" and "verification failed":

    [36] pry(main)> Geo::UploadRegistry.find_by(file_id: 4).reload
      Geo::UploadRegistry Load (0.3ms)  SELECT "file_registry".* FROM "file_registry" WHERE "file_registry"."file_id" = $1 LIMIT $2 /*application:console, db_config_name:geo,console_hostname:MikesGitLabMBP.localdomain,console_username:mkozonogitlab,line:(pry):36:in `__pry__'*/  [["file_id", 4], ["LIMIT", 1]]
      Geo::UploadRegistry Load (0.1ms)  SELECT "file_registry".* FROM "file_registry" WHERE "file_registry"."id" = $1 LIMIT $2 /*application:console, db_config_name:geo,console_hostname:MikesGitLabMBP.localdomain,console_username:mkozonogitlab,line:(pry):36:in `__pry__'*/  [["id", 67], ["LIMIT", 1]]
    => #<Geo::UploadRegistry:0x00000001437bf2c0
    id: 67,
    file_id: 4,
    created_at: Tue, 12 Dec 2023 00:53:31.817629000 UTC +00:00,
    retry_count: 1,
    retry_at: Tue, 12 Dec 2023 01:28:36.114777000 UTC +00:00,
    missing_on_primary: false,
    state: 3,
    last_synced_at: Tue, 12 Dec 2023 00:54:07.591014000 UTC +00:00,
    last_sync_failure: "Verification failed with: Checksum does not match the primary checksum  {:checksum=>\"dcc13385700f84ab63961a0c88d20e1ff79e97493945f0673f5653f45ac93bcc\",  :primary_checksum=>\"85418cc881d37d83c7e681bc43f63731bf0849e06dc59fa8fa2dcf5448a47b8e\"}",
    verified_at: Tue, 12 Dec 2023 01:27:50.114645000 UTC +00:00,
    verification_started_at: Tue, 12 Dec 2023 01:27:50.101930000 UTC +00:00,
    verification_retry_at: Tue, 12 Dec 2023 01:28:14.114560000 UTC +00:00,
    verification_state: 3,
    verification_retry_count: 1,
    verification_checksum: "dcc13385700f84ab63961a0c88d20e1ff79e97493945f0673f5653f45ac93bcc",
    verification_checksum_mismatched: "dcc13385700f84ab63961a0c88d20e1ff79e97493945f0673f5653f45ac93bcc",
    checksum_mismatch: true,
    verification_failure: "Checksum does not match the primary checksum {:checksum=>\"dcc13385700f84ab63961a0c88d20e1ff79e97493945f0673f5653f45ac93bcc\",  :primary_checksum=>\"85418cc881d37d83c7e681bc43f63731bf0849e06dc59fa8fa2dcf5448a47b8e\"}">
  6. In the secondary site Rails console, run Geo::RegistrySyncWorker.perform_async to get the system to resync it

  7. Open the file in a text editor again. It is fixed!
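
The checksum mismatch in step 5 can be reproduced in isolation. The sketch below is a minimal example, assuming the checksum is a SHA-256 digest of the file contents, which the 64-character hexadecimal values in the output suggest but which this description does not state. The `checksum_matches?` helper is a hypothetical name, not part of GitLab.

```ruby
require "digest"
require "tempfile"

# Hypothetical helper: compare a local file's SHA-256 digest to an expected one.
def checksum_matches?(path, primary_checksum)
  Digest::SHA256.file(path).hexdigest == primary_checksum
end

Tempfile.create("seeded_upload") do |file|
  file.write("original contents")
  file.flush

  # Stands in for the checksum recorded by the primary site.
  primary_checksum = Digest::SHA256.hexdigest("original contents")
  puts checksum_matches?(file.path, primary_checksum) # true

  # Corrupt the local copy, as in step 3.
  file.write(" tampered")
  file.flush
  puts checksum_matches?(file.path, primary_checksum) # false
end
```

Any change to the file contents changes the digest, which is why editing the file in step 3 is enough to make verification fail and trigger a resync.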

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Michael Kozono
