Add downloads analytics in the virtual registry maven cache entry
🕸️ Context
In the maven virtual registry, one of the core features is to cache frequent requests to upstreams.
In short, a virtual registry will receive the exact same request (for the same file) several times. Instead of always pulling the file from the related upstream (which is external to the GitLab instance), we cache the file (on object storage) and, if the cache entry exists, we serve the file from there.
In Maven VReg: utilize CounterAttribute concern to... (#498298 - closed), we want to have some basic analytics around cache entries. Mainly:
- How many times a cache entry is used to serve a file (downloads count).
- When was the last time that a cache entry was used (downloaded at).
One important thing to keep in mind here is that the endpoint that serves files from the virtual registry object (GET requests) has been specifically designed to be as efficient as possible. In that regard, we don't access the primary database within that endpoint, which means that during the handling of the GET requests, we should minimize the writes to the database.
Applied to our task at hand, this means that, if possible, we don't want to update the analytics of a cache record within the GET request to a virtual registry file.
Thankfully, we have a concern that already handles delayed updates: CounterAttribute. We won't go into the nitty-gritty details of that concern but, essentially, all the counter updates are stacked in Redis. This allows multiple concurrent processes to push counter updates. Then, after a given period (by default 10.minutes), a background job reads the counter updates from Redis and flushes them to the database. This way, even if the counter updates are bursty, the burst is absorbed in Redis and we write to the database at a slower pace (at most once every 10.minutes). In addition, writing to Redis during a GET request does not impact the use of the database primary/replica. So, this concern nicely fits our use case here.
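To make the buffering idea concrete, here is a minimal plain-Ruby sketch. It is not the CounterAttribute implementation: an in-memory Hash stands in for Redis, another Hash stands in for the database, and every class and method name below is hypothetical.

```ruby
# Minimal in-memory model of buffered counter updates.
# @buffer stands in for Redis and @database for the primary database.
# None of these names come from the actual concern.
class BufferedCounterSketch
  attr_reader :database, :database_writes

  def initialize
    @buffer = Hash.new(0)    # pending increments, keyed by [record_key, attribute]
    @database = Hash.new(0)  # persisted counter values
    @database_writes = 0     # how many times we "wrote" to the database
    @mutex = Mutex.new       # lets concurrent threads stack updates safely
  end

  # Called on the hot path (the GET request): no database access here.
  def increment(record_key, attribute, amount = 1)
    @mutex.synchronize { @buffer[[record_key, attribute]] += amount }
  end

  # Called later by a background job: one write per counter,
  # no matter how many increments were stacked in the meantime.
  def flush
    pending = @mutex.synchronize do
      snapshot = @buffer.dup
      @buffer.clear
      snapshot
    end

    pending.each do |key, amount|
      @database[key] += amount
      @database_writes += 1
    end
  end
end
```

With this model, six calls to increment followed by a single flush produce one database write that adds 6 to the counter, which is the burst-absorbing behavior described above.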
🚀 Updating the concern
Now, not everything is good. The maven virtual registry cache entry model has some specific aspects that the counter attribute concern doesn't handle:
- We don't only want to update the downloads_count metric. We also want to bump the downloaded_at column.
  - Thankfully, the concern is backed by this rails function, which supports updating additional columns.
- The parent object is a group and not a project (we have a group_id column and not a project_id column).
- The primary key is a composite one.
  - The Active Record version that we use already supports them.
This MR will need to update the concern to support these additional aspects.
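As an illustration of the first and third aspects together, the sketch below models an "increment and touch" update on a row identified by a composite key. It is plain Ruby, not Active Record: the FakeTable class and the key columns (upstream_id, relative_path) are hypothetical. Rails' update_counters does accept a :touch option, but whether that is the function the concern relies on is an assumption here.

```ruby
# Plain-Ruby stand-in for a table whose rows are identified by a composite key.
# The class name and the key columns used in the usage note are hypothetical.
class FakeTable
  def initialize
    @rows = {}
  end

  # key is an Array of values, one per primary key column.
  def insert(key, attrs)
    @rows[key.dup.freeze] = attrs.dup
  end

  def find(key)
    @rows.fetch(key)
  end

  # Models a single "increment and touch" update:
  # add each delta to its counter column, then set the touched columns to `now`.
  def update_counters(key, counters, touch: [], now: Time.now.utc)
    row = @rows.fetch(key)
    counters.each { |column, delta| row[column] = row.fetch(column, 0) + delta }
    Array(touch).each { |column| row[column] = now }
    row
  end
end
```

For example, after inserting a row under the key [236, "/org/example/file.pom"] with downloads_count set to 1, calling update_counters with { downloads_count: 5 } and touch: :downloaded_at leaves downloads_count at 6 and sets downloaded_at to the given time.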
🤔 What does this MR do and why?
- Update the CounterAttribute concern to:
  - support records that have a group_id instead of a project_id.
  - support records that are powered by an ActiveRecord model that has a composite primary key.
  - support a :touch option in the counter_attribute statement.
- Update the maven cache entry to add a downloads_count and a downloaded_at column.
- Update the maven virtual registry to bump the downloads counter when the related operation is triggered.
- Update the related specs.
  - A note on this part. The additional support we're adding here requires a model with specific aspects (for example, a composite primary key). Thus, in the specs, we need an instance of such a model. As far as we can see, only the maven cache entry has those aspects and uses the counter attribute concern. Thus, we need a maven cache entry instance in the specs. However, the maven cache entry is an EE only model. We have no other choice than to create EE versions of existing specs to be able to use the cache entry model and assert the behavior of the changes of this MR.
Lastly, the Maven virtual registry is currently behind a beta feature flag. Thus, we don't have a changelog entry here.
📚 References
🖥️ Screenshots or screen recordings
No UI changes.
🧑🔬 How to set up and validate locally
Ok, we have some setup to do here. Requirements:
- Have a GitLab instance with an EE license, as the maven virtual registry is an EE only feature.
- Have a top level group id ready (maintainer access level).
- Have a PAT ready (scope api).
First, let's enable the feature flag: Feature.enable(:maven_virtual_registry).
Second, let's create a maven virtual registry and point it to maven central. We can use $ curl for that.
# create the registry object and note the id
$ curl -X POST -H "PRIVATE-TOKEN: <PAT>" "http://gdk.test:8000/api/v4/groups/<top level group id>/-/virtual_registries/packages/maven/registries?name=testing_counters"
# create the upstream and note the id
$ curl -H "PRIVATE-TOKEN: <PAT>" --data-urlencode 'url=https://repo1.maven.org/maven2' --data-urlencode 'name=upstream' -X POST http://gdk.test:8000/api/v4/virtual_registries/packages/maven/registries/<registry id>/upstreams
Last thing, to make our life easier, let's lower the period before the background job kicks in to flush the counter updates from redis to the database:
diff --git a/lib/gitlab/counters/buffered_counter.rb b/lib/gitlab/counters/buffered_counter.rb
index 8f9ad2631530..8d130000085e 100644
--- a/lib/gitlab/counters/buffered_counter.rb
+++ b/lib/gitlab/counters/buffered_counter.rb
@@ -5,8 +5,8 @@ module Counters
class BufferedCounter
include Gitlab::ExclusiveLeaseHelpers
- WORKER_DELAY = 10.minutes
- WORKER_LOCK_TTL = 10.minutes
+ WORKER_DELAY = 10.seconds
+ WORKER_LOCK_TTL = 10.seconds
# Refresh keys are set to expire after a very long time,
# so that they do not occupy Redis memory indefinitely,
Ok, we're ready.
Let's open a rails console and inspect the upstream cache entries:
VirtualRegistries::Packages::Maven::Upstream.find(<upstream id>).cache_entries
=> []
It's empty obviously, since we didn't make any request to the virtual registry. Let's do one:
$ curl --header "Private-Token: <PAT>" "http://gdk.test:8000/api/v4/virtual_registries/packages/maven/<registry id>/org/eclipse/jetty/jetty-client/12.0.19/jetty-client-12.0.19.pom"
Let's check the cache entries again (you might need to wait for the 10-second delay of the background job).
VirtualRegistries::Packages::Maven::Upstream.find(<upstream id>).cache_entries
=> [#<VirtualRegistries::Packages::Maven::Cache::Entry:0x0000000167315bd8
group_id: 24,
upstream_id: 236,
upstream_checked_at: Fri, 04 Jul 2025 13:33:35.488409000 UTC +00:00,
created_at: Fri, 04 Jul 2025 13:33:35.507984000 UTC +00:00,
updated_at: Fri, 04 Jul 2025 13:33:50.965496000 UTC +00:00,
file_store: 1,
size: 5495,
status: "default",
relative_path: "/org/eclipse/jetty/jetty-client/12.0.19/jetty-client-12.0.19.pom",
file: "jetty-client-12.0.19.pom",
object_storage_key: "[FILTERED]",
upstream_etag: "\"32f2a2872c289c105485403aae29cba3\"",
content_type: "[FILTERED]",
file_md5: "32f2a2872c289c105485403aae29cba3",
file_sha1: "dd567c3708a83373db990c18916dbdc42e9e30ab",
downloads_count: 1,
downloaded_at: Fri, 04 Jul 2025 13:33:50.965496000 UTC +00:00>]
Notice that the downloads_count and downloaded_at column values are set.
Let's pull the same file a few times:
$ repeat 5 curl --header "Private-Token: <PAT>" "http://gdk.test:8000/api/v4/virtual_registries/packages/maven/<registry id>/org/eclipse/jetty/jetty-client/12.0.19/jetty-client-12.0.19.pom"
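Note that repeat is a zsh builtin, so the command above may not work in other shells. As an alternative, here is a small Ruby sketch that sends the same request several times. The helper names (registry_file_uri, pull_file) are made up for this example; the URL layout and the Private-Token header are taken from the curl commands in this description.

```ruby
require "net/http"
require "uri"

# Builds the virtual registry file URL used in the curl commands above.
# file_path is the path of the file without a leading slash.
def registry_file_uri(base_url, registry_id, file_path)
  URI.join(base_url, "/api/v4/virtual_registries/packages/maven/#{registry_id}/#{file_path}")
end

# Sends `times` GET requests with the PAT header and returns the HTTP status codes.
def pull_file(uri, token, times: 5)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    Array.new(times) do
      request = Net::HTTP::Get.new(uri)
      request["Private-Token"] = token
      http.request(request).code
    end
  end
end
```

Calling pull_file(registry_file_uri("http://gdk.test:8000", <registry id>, "org/eclipse/jetty/jetty-client/12.0.19/jetty-client-12.0.19.pom"), "<PAT>") should then issue the five requests against a running GDK.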
Let's check the cache entries one final time (you might need to wait for the 10-second delay of the background job).
VirtualRegistries::Packages::Maven::Upstream.find(<upstream id>).cache_entries
=> [#<VirtualRegistries::Packages::Maven::Cache::Entry:0x0000000105320158
group_id: 24,
upstream_id: 236,
upstream_checked_at: Fri, 04 Jul 2025 13:33:35.488409000 UTC +00:00,
created_at: Fri, 04 Jul 2025 13:33:35.507984000 UTC +00:00,
updated_at: Fri, 04 Jul 2025 13:35:33.116751000 UTC +00:00,
file_store: 1,
size: 5495,
status: "default",
relative_path: "/org/eclipse/jetty/jetty-client/12.0.19/jetty-client-12.0.19.pom",
file: "jetty-client-12.0.19.pom",
object_storage_key: "[FILTERED]",
upstream_etag: "\"32f2a2872c289c105485403aae29cba3\"",
content_type: "[FILTERED]",
file_md5: "32f2a2872c289c105485403aae29cba3",
file_sha1: "dd567c3708a83373db990c18916dbdc42e9e30ab",
downloads_count: 6,
downloaded_at: Fri, 04 Jul 2025 13:35:33.116751000 UTC +00:00>]
The downloads_count has been bumped correctly (1 + 5) and the downloaded_at value has been updated too.
The change is behaving the way we want.
🏎️ MR acceptance checklist
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.