
Add downloads analytics in the virtual registry maven cache entry

🕸️ Context

In the maven virtual registry, one of the core features is to cache frequent requests to upstreams.

In short, a virtual registry receives the exact same request (for the same file) several times. Instead of always pulling the file from the related upstream (which is external to the GitLab instance), we cache the file (on object storage) and, if the cache entry exists, we serve the file from there.
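As a rough illustration of that read-through behavior, here is a minimal in-memory sketch. It is not the virtual registry's code: the `ReadThroughCache` class, its `fetch` method and the Hash standing in for object storage are all hypothetical names chosen for this example.

```ruby
# Hypothetical read-through cache; every name here is illustrative only.
class ReadThroughCache
  attr_reader :upstream_hits

  # fetcher: a block that takes a path and returns the file content
  # pulled from the (external) upstream.
  def initialize(&fetcher)
    @fetcher = fetcher
    @entries = {}        # stands in for cache entries on object storage
    @upstream_hits = 0   # how many times we had to go to the upstream
  end

  def fetch(path)
    # Cache hit: serve the stored file, no upstream request.
    return @entries[path] if @entries.key?(path)

    # Cache miss: pull from the upstream once and keep the result.
    @upstream_hits += 1
    @entries[path] = @fetcher.call(path)
  end
end

cache = ReadThroughCache.new { |path| "content of #{path}" }
3.times { cache.fetch('/org/example/demo-1.0.pom') }
puts cache.upstream_hits # the upstream was contacted once for three requests
```

Only the first request reaches the upstream; the next two are served from the cached entry, which is exactly the situation where a downloads counter becomes interesting.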

In Maven VReg: utilize CounterAttribute concern to... (#498298 - closed), we want to have some basic analytics around cache entries. Mainly:

  • How many times a cache entry is used to serve a file (downloads count).
  • When was the last time that a cache entry was used (downloaded at).

One important thing to keep in mind here is that the endpoint that serves files from the virtual registry object (GET requests) has been specifically designed to be as efficient as possible. In that regard, we don't access the primary database within that endpoint, which means that while handling GET requests, we should minimize writes to the database.

Applied to our task at hand, this means that, if possible, we don't want to update the analytics of a cache record within the GET request to a virtual registry file.

Thankfully, we have a concern that already handles delayed updates: CounterAttribute. We won't go into the nitty-gritty details of that concern, but essentially, all counter updates are stacked in Redis. This allows multiple concurrent processes to push counter updates. Then, after a given period (by default 10.minutes), a background job reads the counter updates from Redis and flushes them to the database. This way, even if counter updates are bursty, the burst is absorbed in Redis and we write to the database at a slower pace (at most once every 10.minutes). In addition, writing to Redis during a GET request does not impact the database primary/replica use. So, this concern nicely fits our use case here.
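The buffering idea can be sketched in a few lines. This is a simplified, in-memory illustration and not the CounterAttribute implementation: a Hash stands in for Redis, another Hash for the database row, and the `BufferedCounterSketch` class is a hypothetical name.

```ruby
# Simplified sketch of delayed counter updates.
# A Hash stands in for Redis and another Hash for the database row.
class BufferedCounterSketch
  def initialize(database_row)
    @database_row = database_row
    @buffer = Hash.new(0) # pending increments, keyed by attribute name
    @mutex = Mutex.new    # lets concurrent threads push updates safely
  end

  # Called in the request path: cheap, no database write.
  def increment(attribute, amount = 1)
    @mutex.synchronize { @buffer[attribute] += amount }
  end

  # Called by a background job after the delay:
  # one database write per attribute, however many increments were stacked.
  def flush
    pending = @mutex.synchronize do
      snapshot = @buffer.dup
      @buffer.clear
      snapshot
    end
    pending.each { |attribute, amount| @database_row[attribute] += amount }
    pending
  end
end

row = { downloads_count: 0 }
counter = BufferedCounterSketch.new(row)
5.times { counter.increment(:downloads_count) }
# row[:downloads_count] is still 0 here: nothing has been written yet.
counter.flush
# row[:downloads_count] is now 5, written in a single update.
```

A burst of five increments turns into one write at flush time, which is the property we rely on to keep the GET endpoint away from the database.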

🚀 Updating the concern

Now, not everything is good. The maven virtual registry cache entry model has some specific aspects that the counter attribute concern doesn't handle:

  • We don't only want to update the downloads_count metric. We also want to bump the downloaded_at column.
    • Thankfully, the concern is backed by this Rails function, which supports updating additional columns.
  • The parent object is a group and not a project (we have a group_id column and not a project_id column).
  • The primary key is a composite one.
    • The Active Record version that we use already supports composite primary keys.

This MR will need to update the concern to support these additional aspects.
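To picture what "bump the counter, touch a timestamp, and address the row by a composite primary key" could look like at flush time, here is a small sketch that builds such an UPDATE statement. It is only an illustration: `build_counter_update` is a hypothetical helper, the `cache_entries` table name is made up, and treating `upstream_id` and `relative_path` as the composite key is an assumption for the example, not something stated in this MR.

```ruby
# Hypothetical helper: builds the kind of UPDATE a counter flush could issue.
# Nothing here is GitLab's or Rails' real code.
def build_counter_update(table:, counter:, increment:, touch:, primary_key:)
  # One "column = :column" condition per primary key column.
  where = primary_key.keys.map { |column| "#{column} = :#{column}" }.join(' AND ')

  sql = "UPDATE #{table} " \
        "SET #{counter} = COALESCE(#{counter}, 0) + :increment, #{touch} = :now " \
        "WHERE #{where}"

  # Named bind values: the key columns plus the increment and the timestamp.
  binds = primary_key.merge(increment: increment, now: Time.now.utc)
  [sql, binds]
end

sql, binds = build_counter_update(
  table: 'cache_entries',   # made-up table name
  counter: 'downloads_count',
  increment: 5,
  touch: 'downloaded_at',
  primary_key: { upstream_id: 236, relative_path: '/org/example/demo-1.0.pom' } # assumed key columns
)
puts sql
```

The counter and the timestamp are updated in the same statement, and the WHERE clause needs every column of the composite key to target a single row.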

🤔 What does this MR do and why?

  • Update the CounterAttribute concern to:
    • support records that have a group_id instead of a project_id.
    • support records that are powered by an ActiveRecord model that has a composite primary key.
    • support a :touch option in the counter_attribute statement
  • Update the maven cache entry to add a downloads_count and a downloaded_at column.
  • Update the maven virtual registry to bump the downloads counter when the related operation is triggered.
  • Update the related specs.
    • A note on this part: the additional support we're adding here requires a model with specific aspects (for example, a composite primary key), so the specs need an instance of such a model. As far as we can see, only the maven cache entry has those aspects and uses the counter attribute concern, so we need a maven cache entry instance in the specs. However, the maven cache entry is an EE-only model. We have no other choice than to create EE versions of the existing specs so that we can use the cache entry model and assert the behavior of the changes in this MR.
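To make the :touch option from the list above a bit more concrete, here is a toy DSL showing how such a declaration might read on a model. This is a sketch under assumptions: `CounterAttributeSketch` and `CacheEntrySketch` are hypothetical stand-ins, and the exact signature of the real `counter_attribute` statement is not shown in this description.

```ruby
# Toy stand-in for a counter-attribute DSL; not the real CounterAttribute concern.
module CounterAttributeSketch
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Registers a counter column and, optionally, a timestamp column
    # that should be bumped whenever the counter is flushed.
    def counter_attribute(name, touch: nil)
      counter_attributes[name] = { touch: touch }
    end

    def counter_attributes
      @counter_attributes ||= {}
    end
  end
end

# Hypothetical model: plain Ruby class, no Active Record involved.
class CacheEntrySketch
  include CounterAttributeSketch

  counter_attribute :downloads_count, touch: :downloaded_at
end

p CacheEntrySketch.counter_attributes
```

Keeping the timestamp column next to the counter declaration means the flush logic can look up, per counter, which extra column to update.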

Lastly, the Maven virtual registry is currently behind a beta feature flag. Thus, we don't have a changelog entry here.

📚 References

🖥️ Screenshots or screen recordings

No UI changes.

🧑‍🔬 How to set up and validate locally

OK, we have some setup to do here. Requirements:

  • Have a GitLab instance with an EE license as the maven virtual registry is an EE only feature.
  • Have a top level group id ready (maintainer access level).
  • Have a PAT ready (scope api).

First, let's enable the feature flag: Feature.enable(:maven_virtual_registry).

Second, let's create a maven virtual registry and point it to Maven Central. We can use curl for that.

# create the registry object and note the id
$ curl -X POST -H "PRIVATE-TOKEN: <PAT>" "http://gdk.test:8000/api/v4/groups/<top level group id>/-/virtual_registries/packages/maven/registries?name=testing_counters"

# create the upstream and note the id
$ curl -H "PRIVATE-TOKEN: <PAT>" --data-urlencode 'url=https://repo1.maven.org/maven2' --data-urlencode 'name=upstream' -X POST http://gdk.test:8000/api/v4/virtual_registries/packages/maven/registries/<registry id>/upstreams

Last thing, to make our life easier, let's lower the period before the background job kicks in to flush the counter updates from redis to the database:

diff --git a/lib/gitlab/counters/buffered_counter.rb b/lib/gitlab/counters/buffered_counter.rb
index 8f9ad2631530..8d130000085e 100644
--- a/lib/gitlab/counters/buffered_counter.rb
+++ b/lib/gitlab/counters/buffered_counter.rb
@@ -5,8 +5,8 @@ module Counters
     class BufferedCounter
       include Gitlab::ExclusiveLeaseHelpers
 
-      WORKER_DELAY = 10.minutes
-      WORKER_LOCK_TTL = 10.minutes
+      WORKER_DELAY = 10.seconds
+      WORKER_LOCK_TTL = 10.seconds
 
       # Refresh keys are set to expire after a very long time,
       # so that they do not occupy Redis memory indefinitely,

Ok, we're ready.

Let's open a rails console and inspect the upstream cache entries:

VirtualRegistries::Packages::Maven::Upstream.find(<upstream id>).cache_entries
=> []

It's empty obviously, since we didn't make any request to the virtual registry. Let's do one:

$ curl --header "Private-Token: <PAT>" "http://gdk.test:8000/api/v4/virtual_registries/packages/maven/<registry id>/org/eclipse/jetty/jetty-client/12.0.19/jetty-client-12.0.19.pom"

Let's check the cache entries again (you might need to wait for the 10-second delay of the background job).

VirtualRegistries::Packages::Maven::Upstream.find(<upstream id>).cache_entries
=> [#<VirtualRegistries::Packages::Maven::Cache::Entry:0x0000000167315bd8
  group_id: 24,
  upstream_id: 236,
  upstream_checked_at: Fri, 04 Jul 2025 13:33:35.488409000 UTC +00:00,
  created_at: Fri, 04 Jul 2025 13:33:35.507984000 UTC +00:00,
  updated_at: Fri, 04 Jul 2025 13:33:50.965496000 UTC +00:00,
  file_store: 1,
  size: 5495,
  status: "default",
  relative_path: "/org/eclipse/jetty/jetty-client/12.0.19/jetty-client-12.0.19.pom",
  file: "jetty-client-12.0.19.pom",
  object_storage_key: "[FILTERED]",
  upstream_etag: "\"32f2a2872c289c105485403aae29cba3\"",
  content_type: "[FILTERED]",
  file_md5: "32f2a2872c289c105485403aae29cba3",
  file_sha1: "dd567c3708a83373db990c18916dbdc42e9e30ab",
  downloads_count: 1,
  downloaded_at: Fri, 04 Jul 2025 13:33:50.965496000 UTC +00:00>]

Notice that the downloads_count and downloaded_at column values are set.
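Since the flush happens in the background, checking by hand can mean re-running the query a few times. A small polling helper can do the waiting instead. This is an optional convenience sketch: `wait_until` is a hypothetical helper, not something provided by GitLab or Rails.

```ruby
# Hypothetical console helper: polls a block until it returns a truthy value,
# or raises once the timeout (in seconds) has elapsed.
def wait_until(timeout: 30, interval: 1)
  clock = -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }
  deadline = clock.call + timeout

  loop do
    result = yield
    return result if result
    raise "condition not met within #{timeout} seconds" if clock.call >= deadline

    sleep interval
  end
end

# Possible use in the rails console, assuming `entry` holds the cache entry record:
#   wait_until { entry.reload.downloads_count == 1 }
```

The helper returns as soon as the block is truthy, so it adapts to whatever delay the background job actually takes.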

Let's pull the same file a few times (repeat is a zsh builtin; in bash, use a for loop instead):

$ repeat 5 curl --header "Private-Token: <PAT>" "http://gdk.test:8000/api/v4/virtual_registries/packages/maven/<registry id>/org/eclipse/jetty/jetty-client/12.0.19/jetty-client-12.0.19.pom"

Let's check the cache entries one final time (you might need to wait for the 10-second delay of the background job).

VirtualRegistries::Packages::Maven::Upstream.find(<upstream id>).cache_entries
=> [#<VirtualRegistries::Packages::Maven::Cache::Entry:0x0000000105320158
  group_id: 24,
  upstream_id: 236,
  upstream_checked_at: Fri, 04 Jul 2025 13:33:35.488409000 UTC +00:00,
  created_at: Fri, 04 Jul 2025 13:33:35.507984000 UTC +00:00,
  updated_at: Fri, 04 Jul 2025 13:35:33.116751000 UTC +00:00,
  file_store: 1,
  size: 5495,
  status: "default",
  relative_path: "/org/eclipse/jetty/jetty-client/12.0.19/jetty-client-12.0.19.pom",
  file: "jetty-client-12.0.19.pom",
  object_storage_key: "[FILTERED]",
  upstream_etag: "\"32f2a2872c289c105485403aae29cba3\"",
  content_type: "[FILTERED]",
  file_md5: "32f2a2872c289c105485403aae29cba3",
  file_sha1: "dd567c3708a83373db990c18916dbdc42e9e30ab",
  downloads_count: 6,
  downloaded_at: Fri, 04 Jul 2025 13:35:33.116751000 UTC +00:00>]

The downloads_count has been bumped correctly (1 + 5) and the downloaded_at value has been updated too.

The change is behaving the way we want 🎉

🏎️ MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by David Fernandez
