Maven Virtual Registry: optimize the upload confirmation endpoint
🔥 Problem
When a file is uploaded through the Maven virtual registry feature, a workhorse assisted upload occurs. We're not going to detail that logic but basically workhorse receives the object storage key to store the file on object storage and then call rails to "confirm" the upload.
The "confirm" upload endpoint is where rails will actually create the appropriate records in the database, persisting the fact that a file lives under that object storage key.
The Maven virtual registry feature runs under harsh performance constraints as it is basically a proxy that sits between a package manager client and a package registry. Thus, we've been on the lookout of of any improvement that we can do during the execution time.
While monitoring the upload confirmation endpoint, we observed the following.
A PutObject request (expected) and a HeadObject request are made to object storage:
That request is not necessary. Our assumption here is that the object storage library (carrierweave) gets a function where it requires to pull metadata information from the object storage key and that's where the HEAD request is made. However, while confirming the upload, workhorse is already passing all the metadata information (such as the file size or what kind of upload it is) that is necessary. As such, this call is not used at all. Object storage should receive a single call: the upload request that comes from workhorse and that's it.
Looking at the SQL requests now, we can see:
TRANSACTION (0.1ms) BEGIN /*application:web,correlation_id:01JA849XP38Z9C9E2JGHX6X57D,endpoint_id:POST /api/:version/virtual_registries/packages/maven/:idpath/upload,db_config_database:gitlabhq_development,db_config_name:main,line:/app/models/virtual_registries/packages/maven/cached_response.rb:68:in `block in create_or_update_by!'*/
↳ app/models/virtual_registries/packages/maven/cached_response.rb:68:in `block in create_or_update_by!'
VirtualRegistries::Packages::Maven::CachedResponse Exists? (0.2ms) SELECT 1 AS one FROM "virtual_registries_packages_maven_cached_responses" WHERE "virtual_registries_packages_maven_cached_responses"."relative_path" = '/org/springframework/spring-web/6.1.12/spring-web-6.1.12.pom' AND "virtual_registries_packages_maven_cached_responses"."upstream_id" = 26 AND "virtual_registries_packages_maven_cached_responses"."status" = 0 LIMIT 1 /*application:web,correlation_id:01JA849XP38Z9C9E2JGHX6X57D,endpoint_id:POST /api/:version/virtual_registries/packages/maven/:idpath/upload,db_config_database:gitlabhq_development,db_config_name:main,line:/app/models/virtual_registries/packages/maven/cached_response.rb:68:in `block in create_or_update_by!'*/
↳ app/models/virtual_registries/packages/maven/cached_response.rb:68:in `block in create_or_update_by!'
VirtualRegistries::Packages::Maven::CachedResponse Create (0.3ms) INSERT INTO "virtual_registries_packages_maven_cached_responses" ("group_id", "upstream_id", "upstream_checked_at", "downloaded_at", "created_at", "updated_at", "size", "relative_path", "file", "object_storage_key", "upstream_etag", "content_type", "file_final_path") VALUES (22, 26, '2024-10-15 13:01:32.184432', '2024-10-15 13:01:32.184432', '2024-10-15 13:01:32.200679', '2024-10-15 13:01:32.200679', 2398, '/org/springframework/spring-web/6.1.12/spring-web-6.1.12.pom', 'spring-web-6.1.12.pom', '5f/9c/5f9c4ab08cac7457e9111a30e4664920607ea2c115a1433d7be98e97e64244ca/virtual_registries/packages/maven/26/upstream/26/cached_response/adfca8e75999d3ad0f1fcdff0de3960320c3600e349b3617f296bb0444d3800c', '"54ce07f4124259b2ea58548e9d620004"', 'text/xml', '5f/9c/5f9c4ab08cac7457e9111a30e4664920607ea2c115a1433d7be98e97e64244ca/@final/3d/99/23e08863197c92435f70a29a150c438a9f72b6b2b66296385b808e0a1c3d') RETURNING "id" /*application:web,correlation_id:01JA849XP38Z9C9E2JGHX6X57D,endpoint_id:POST /api/:version/virtual_registries/packages/maven/:idpath/upload,db_config_database:gitlabhq_development,db_config_name:main,line:/app/models/virtual_registries/packages/maven/cached_response.rb:68:in `block in create_or_update_by!'*/
↳ app/models/virtual_registries/packages/maven/cached_response.rb:68:in `block in create_or_update_by!'
VirtualRegistries::Packages::Maven::CachedResponse Update (0.3ms) UPDATE "virtual_registries_packages_maven_cached_responses" SET "file_store" = 2 WHERE "virtual_registries_packages_maven_cached_responses"."id" = 10941 /*application:web,correlation_id:01JA849XP38Z9C9E2JGHX6X57D,endpoint_id:POST /api/:version/virtual_registries/packages/maven/:idpath/upload,db_config_database:gitlabhq_development,db_config_name:main,line:/app/models/concerns/file_store_mounter.rb:17:in `block in mount_file_store_uploader'*/
↳ app/models/concerns/file_store_mounter.rb:17:in `block in mount_file_store_uploader'
TRANSACTION (0.1ms) COMMIT /*application:web,correlation_id:01JA849XP38Z9C9E2JGHX6X57D,endpoint_id:POST /api/:version/virtual_registries/packages/maven/:idpath/upload,db_config_database:gitlabhq_development,db_config_name:main,line:/lib/gitlab/database.rb:417:in `commit'*/
So we have a transaction that contains an INSERT statement (expected), followed by an UPDATE statement. This is not great, as a general performance guideline, any given database table should be accessed once per request. Here, it's even worse, we create a record that we immediately update. That is not correct. We should have a single INSERT statement with all the required fields.
Moreover, the UPDATE statement is handling the file_store column. Here are the possible values. This is not great for two reasons:
- workhorse is already sending this information (if the file is local or remote on object storage). See this. Thus, we already know this at
INSERTtime. - somehow, we're loading the uploaded file metadata to get an information that is not on object storage. In other words, we load the object storage file metadata (that's what we assume is the
HEADrequest) but we actually don't do anything with it.
🚒 Solution
This will need a small investigation but it looks like we're loading needlessly the object storage file metadata to set the file_store column.
- Find a way to avoid that and implement for the Maven virtual registry.
- If possible, find a way to avoid this for all workhorse assisted uploads.
