
Update dependency proxy API to use cleanup worker

Steve Abrams requested to merge 348176-dp-delete-api-update into master

🌵 Context

In !70029 (merged), we added cleanup policies to the Dependency Proxy to allow users to configure a regular deletion of the files in their cache.

There is also an API endpoint to fully purge the Dependency Proxy cache.

Currently, the API endpoint kicks off a background job that simply destroys all of the files (dependency_proxy_blobs and dependency_proxy_manifests). Since the cleanup policies added a background job that regularly deletes expired blobs and manifests, we can reuse that job to make this API endpoint more efficient: the endpoint only needs to mark the records as expired, which is a simple database update.

In addition to improving ~performance, this also addresses a bug 🐛 as noted in https://gitlab.com/gitlab-org/gitlab/-/issues/348168 (private link).

🔎 What does this MR do and why?

  • Update the Dependency Proxy purge cache worker to use the newer, more optimized file deletion.
  • Remove the lease restrictions from the API, since from the database standpoint the deletion will be much faster.

I considered removing the PurgeDependencyProxyCacheWorker altogether and moving the UPDATE queries directly into the API. However, if a group has a large number of blob/manifest records to update, the update may take a few seconds to complete, which is too long to run inside the request.

For example, if we have a group with 5000 blob records, we update them in batches of 100, which gives 50 batches. If each batch update takes 100ms (see database analysis below), the overall update will take 5 seconds to complete.
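The estimate above can be reproduced with a few lines of Ruby. This is only a back-of-the-envelope sketch: the method name and its parameters are made up for illustration, and the inputs are the example figures from this description, not measurements.

```ruby
# Rough estimate of the total UPDATE time for one group.
# Hypothetical helper; the inputs are the illustrative numbers from the MR text.
def estimated_update_seconds(record_count:, batch_size:, ms_per_batch:)
  # Number of UPDATE statements needed, rounding up for a partial last batch.
  batches = (record_count.to_f / batch_size).ceil
  batches * ms_per_batch / 1000.0
end

estimate = estimated_update_seconds(record_count: 5000, batch_size: 100, ms_per_batch: 100)
puts estimate
```

The rounding up matters for groups whose record count is not a multiple of the batch size: 5001 records would need 51 batches.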

🐘 Database

Note, the examples below all use dependency_proxy_blobs. The dependency_proxy_manifests table has the same structure as dependency_proxy_blobs and will perform similarly. A group will always have more dependency_proxy_blobs than dependency_proxy_manifests, so we can expect the blobs table to have the slower performance of the two.

Queries generated by @group.dependency_proxy_blobs.each_batch(of: UPDATE_BATCH_SIZE)

SELECT id FROM dependency_proxy_blobs WHERE group_id = 9970 ORDER BY id ASC LIMIT 1;

https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/7645/commands/27154

SELECT id FROM dependency_proxy_blobs WHERE group_id = 9970 AND id >= 4461 ORDER BY id ASC LIMIT 1 OFFSET 100;

https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/7645/commands/27157

Query generated by batch.update_all(status: :expired)

UPDATE "dependency_proxy_blobs"
SET    "status" = 1
WHERE  "dependency_proxy_blobs"."group_id" = 9970
       AND "dependency_proxy_blobs"."id" >= 4461
       AND "dependency_proxy_blobs"."id" < 8831;

https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/7645/commands/27156
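To make the batching pattern behind the queries above easier to follow, here is a plain-Ruby simulation. It is a sketch under assumptions, not the worker's code: there is no Rails or database involved, the Blob struct and expire_group_blobs method are hypothetical stand-ins, and the batch size of 100 is taken from the example earlier in this description.

```ruby
# In-memory simulation of the batched expiry; no Rails or database involved.
# UPDATE_BATCH_SIZE mirrors the constant named above; 100 is assumed from the example.
UPDATE_BATCH_SIZE = 100

# Hypothetical stand-in for a dependency_proxy_blobs row.
Blob = Struct.new(:id, :group_id, :status)

# Walks one group's rows in ascending id order and expires them batch by batch,
# roughly what each_batch plus update_all(status: :expired) achieve through SQL.
# Returns the number of batches, i.e. the number of UPDATE statements issued.
def expire_group_blobs(blobs, group_id, batch_size: UPDATE_BATCH_SIZE)
  scoped = blobs.select { |b| b.group_id == group_id }.sort_by(&:id)
  batches = 0
  scoped.each_slice(batch_size) do |batch|
    batch.each { |b| b.status = :expired }
    batches += 1
  end
  batches
end

# 250 rows, with every even id belonging to group 9970 (125 rows in that group).
blobs = (1..250).map { |i| Blob.new(i, i.even? ? 9970 : 1, :default) }
batch_count = expire_group_blobs(blobs, 9970)
puts batch_count
```

Because a group's ids are interleaved with other groups' ids, each batch covers an id range wider than the batch size, which is why the UPDATE above spans ids 4461 to 8831 for a single batch.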

📸 Screenshots or screen recordings

Before:

[16] pry(main)> Group.find(181).dependency_proxy_manifests.map(&:status)
=> ["default", "default"]
[27] pry(main)> Group.find(181).dependency_proxy_blobs.map(&:status)
=> ["default", "default", "default"]

API request:

→ curl --request DELETE -H "PRIVATE-TOKEN: <token>" "http://gdk.test:3001/api/v4/groups/181/dependency_proxy/cache"
202

After:

[19] pry(main)> Group.find(181).dependency_proxy_manifests.map(&:status)
=> ["expired", "expired"]
[20] pry(main)> Group.find(181).dependency_proxy_blobs.map(&:status)
=> ["expired", "expired", "expired"]

How to set up and validate locally

  1. Create a group
  2. Log into the dependency proxy (you can use your username/password for credentials, or username/personal_access_token).
    docker login gdk.test:3001
  3. Use the Dependency Proxy to pull a number of images through the group:
    docker pull gdk.test:3001/<group_full_path>/dependency_proxy/containers/nginx:latest
    docker pull gdk.test:3001/<group_full_path>/dependency_proxy/containers/node:latest
  4. Navigate to the group's Dependency Proxy page to view the pulled images: Group -> Packages & Registries -> Dependency Proxy. You can also view the records in the rails console:
    Group.last.dependency_proxy_manifests
    Group.last.dependency_proxy_blobs
  5. Make a request to the purge API:
    curl --request DELETE -H "Private-Token: <personal_access_token>" "http://gdk.test:3001/api/v4/groups/<group_id>/dependency_proxy/cache"
  6. Check the Dependency Proxy UI to make sure the images are no longer visible. You can also check that all of the records are now expired in the rails console:
    Group.last.dependency_proxy_blobs.map(&:status)
    Group.last.dependency_proxy_manifests.map(&:status)
  7. (optional) If you'd like to check that they get deleted, you can run the background jobs that delete the blobs and manifests:
    DependencyProxy::CleanupBlobWorker.perform_at(1.second)
    DependencyProxy::CleanupManifestWorker.perform_at(1.second)
    After the jobs run, all of the records should be deleted:
    Group.last.dependency_proxy_blobs.size
    Group.last.dependency_proxy_manifests.size
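If you would rather script step 5 than use curl, the same request can be built with Ruby's standard library. This is a minimal sketch: the helper names are hypothetical, the host and group id are the GDK values from the example above, and the token is a placeholder you need to replace.

```ruby
require "net/http"
require "uri"

# Builds the DELETE request for the purge endpoint without sending it.
# Hypothetical helper; base_url, group_id and token are supplied by the caller.
def build_purge_request(base_url, group_id, token)
  uri = URI.join(base_url, "/api/v4/groups/#{group_id}/dependency_proxy/cache")
  request = Net::HTTP::Delete.new(uri)
  request["PRIVATE-TOKEN"] = token
  [uri, request]
end

# Sends the request and returns the HTTP status code as an integer.
# The example earlier in this description shows 202 for a successful purge request.
def purge_dependency_proxy_cache(base_url, group_id, token)
  uri, request = build_purge_request(base_url, group_id, token)
  response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
  response.code.to_i
end

uri, request = build_purge_request("http://gdk.test:3001", 181, "<personal_access_token>")
puts "#{request.method} #{uri}"
```

Calling purge_dependency_proxy_cache("http://gdk.test:3001", 181, token) against a running GDK performs the actual purge request.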

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #348176

Edited by Steve Abrams
