Geo: Cloning LFS objects from secondary site downloads from the primary site even when secondary is fully synced
What does this MR do and why?
- Route batch LFS requests to the secondary rails app to proxy requests
- Currently batch requests are immediately sent to the primary
- Use Geo::LfsObjectRegistry to determine if the objects in the batch request are synced
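The routing decision described above can be sketched in plain Ruby. This is a minimal illustration, not the code in this MR: the structs, the batch_target helper, and the rule that any unknown or unsynced object sends the whole batch to the primary are assumptions. Only the synced state value of 2 is taken from the registry query in the Database section.

```ruby
require "set"

# Hypothetical stand-ins for rows of lfs_objects and lfs_object_registry.
LfsObject = Struct.new(:id, :oid)
RegistryRow = Struct.new(:lfs_object_id, :state)

SYNCED_STATE = 2 # matches the `state IN (2)` filter in the registry query

# True only when every requested OID maps to a known LFS object
# whose registry row is in the synced state.
def batch_synced?(requested_oids, lfs_objects, registry_rows)
  oids = requested_oids.to_set

  # Step 1: resolve OIDs to LFS object ids (mirrors the lfs_objects query).
  ids_by_oid = lfs_objects
    .select { |object| oids.include?(object.oid) }
    .to_h { |object| [object.oid, object.id] }
  return false unless ids_by_oid.size == oids.size

  # Step 2: collect the ids that are synced (mirrors the registry query).
  synced_ids = registry_rows
    .select { |row| row.state == SYNCED_STATE }
    .map(&:lfs_object_id)
    .to_set

  ids_by_oid.values.all? { |id| synced_ids.include?(id) }
end

# Assumed rule: serve from the secondary only when the whole batch is synced.
def batch_target(requested_oids, lfs_objects, registry_rows)
  batch_synced?(requested_oids, lfs_objects, registry_rows) ? :secondary : :primary
end
```

Treating the batch as all-or-nothing keeps the sketch simple; whether the real implementation can split a batch between sites is not stated here.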
How to set up and validate locally
- Create a project with LFS objects (https://gitlab.com/ibaum/lfstest is available), and wait for it to sync to the secondary.
- Clone the project over SSH to the secondary with GIT_CURL_VERBOSE=1 set:

      $ GIT_CURL_VERBOSE=1 git clone ssh://SECONDARY_PROJECT_SSH_URL

  Currently, the responses in the output contain pointers to the primary node:

      {
        "objects": [
          {
            "oid": "OID",
            "size": 10240000000,
            "actions": {
              "download": {
                "href": "HTTP_URL_TO_PRIMARY_PROJECT/gitlab-lfs/objects/OID",
                "header": { "Authorization": "Basic ..." }
              }
            }
          },
          ...
        ]
      }

  With the changes from this MR, the pointers refer to the secondary:

      {
        "objects": [
          {
            "oid": "OID",
            "size": 10240000000,
            "actions": {
              "download": {
                "href": "HTTP_URL_TO_SECONDARY/gitlab-lfs/objects/OID",
                "header": { "Authorization": "Basic ..." }
              }
            }
          },
          ...
        ]
      }
- Verify that the clone still routes to the primary when the secondary is not synced. In a Rails console on the secondary node, mark the objects as pending (p is assumed to be the test project):

      pry(main)> lfs_ids = p.lfs_objects.collect(&:id)
      pry(main)> Geo::LfsObjectRegistry.find_each.select { |x| lfs_ids.include?(x.lfs_object_id) }.each { |y| y.state = 0; y.save! }
- Running a git clone over SSH with GIT_CURL_VERBOSE=1 will now show the LFS objects being downloaded from the primary.
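Reading the hrefs out of verbose curl output by eye is error-prone when a batch holds many objects. As a rough aid, the following sketch lists the distinct hosts named in a batch response body. The download_hosts helper and the example host are hypothetical; the JSON shape is the one shown in the steps above. Paste in the complete response body, since the abbreviated samples with "..." are not valid JSON.

```ruby
require "json"
require "uri"

# Returns the distinct hosts found in the download hrefs of an
# LFS batch response body. Objects without a download action are skipped.
def download_hosts(batch_response_json)
  objects = JSON.parse(batch_response_json).fetch("objects", [])

  objects
    .map { |object| object.dig("actions", "download", "href") }
    .compact
    .map { |href| URI.parse(href).host }
    .uniq
end

# Example with a made-up host standing in for HTTP_URL_TO_SECONDARY.
sample = <<~JSON
  {
    "objects": [
      {
        "oid": "OID",
        "size": 1024,
        "actions": {
          "download": { "href": "https://secondary.example.com/gitlab-lfs/objects/OID" }
        }
      }
    ]
  }
JSON

puts download_hosts(sample) # prints "secondary.example.com"
```

A single host in the result that matches the secondary indicates the synced case; the primary's host is expected after the registry rows have been marked as pending.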
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.
Related to #410413 (closed)
Database
Queries
NOTE: EXPLAIN was run against a GDK instance with some test LFS data. The test project has 10,000 LFS objects attached, and the instance holds another 100,000 LFS objects.
SELECT "lfs_objects"."id", "lfs_objects"."oid" FROM "lfs_objects" WHERE "lfs_objects"."oid" IN (...)
EXPLAIN for: SELECT "lfs_objects"."id", "lfs_objects"."oid" FROM "lfs_objects" WHERE "lfs_objects"."oid" IN (...) /*application:web,db_config_name:main,line:/sql.rb:8:in `<main>'*/
Index Scan using index_lfs_objects_on_oid on lfs_objects (cost=0.42..9221.69 rows=10011 width=69)
Index Cond: ((oid)::text = ANY ())
(2 rows)
SELECT "lfs_object_registry"."lfs_object_id" FROM "lfs_object_registry" WHERE ("lfs_object_registry"."state" IN (2)) AND "lfs_object_registry"."lfs_object_id" IN ()
Index Scan using index_state_in_lfs_objects on lfs_object_registry (cost=0.43..472.13 rows=1 width=4)
Index Cond: (state = 2)
Filter: (lfs_object_id = ANY ())
(3 rows)