Add tracking for the Maven dependency proxy
🔭 Context
We're working on the very first version of the dependency proxy for packages. See #407460 (comment 1373731852) for all the details from the technical investigation.
At its core, the concept is quite simple. GitLab will act as a proxy: users pull packages through it, and GitLab serves as a pull-through cache.
Package Manager clients (npm, maven, ...) <-> GitLab <-> External Package Registry
Because GitLab is in the middle (aka proxy) of the package transport, we can use the GitLab Package registry as a cache. In other words, before contacting the external package registry, we can check whether the package is already in the local project registry. If that's the case, we can return it directly.
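To make the hit/miss distinction concrete, here is a minimal in-memory sketch of a pull-through cache. It is not the implementation in this MR: the class name `PullThroughCache`, the `upstream` lambda, and the event symbols are hypothetical stand-ins, and the choice to track nothing when the upstream has no such file is an assumption.

```ruby
# Hypothetical in-memory model of a pull-through cache (not GitLab's actual code).
class PullThroughCache
  attr_reader :events

  # upstream: any object responding to #call(path), returning file content or nil.
  def initialize(upstream)
    @upstream = upstream
    @store = {}   # stands in for the project's package registry
    @events = []  # stands in for the tracking events
  end

  def fetch(path)
    if @store.key?(path)
      @events << :file_pulled_from_cache # cache hit: served locally
      return @store[path]
    end

    content = @upstream.call(path)
    return nil if content.nil? # assumption: nothing is cached or tracked on a 404

    @store[path] = content     # fill the cache for the next pull
    @events << :file_pulled    # cache miss: served from the external registry
    content
  end
end

upstream = ->(path) { path == 'com/my/company/1.2.3/test.txt' ? "bananas!\n" : nil }
cache = PullThroughCache.new(upstream)
first  = cache.fetch('com/my/company/1.2.3/test.txt') # first pull: miss
second = cache.fetch('com/my/company/1.2.3/test.txt') # second pull: hit
```

After the two calls, `cache.events` holds one miss followed by one hit, which is the same sequence the validation steps further down expect to see in the event monitors.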
The first package format to get support is Maven.
In this MR, we want to track the usage of the Maven dependency proxy. Here are the aspects we want to track:
- The number of files pulled through the Maven dependency proxy and served from the external package registry. Basically, a cache miss. This is an Internal event.
- The number of files pulled through the Maven dependency proxy and served from the GitLab package registry. Basically, a cache hit. This is an Internal event.
- The number of projects that have enabled the Maven dependency proxy with a valid configuration. This is a new count for Service Ping.
The related issue is Instrument data to help measure adoption of the... (#431412 - closed)
🤔 What does this MR do and why?
- Trigger the related tracking event when it's a dependency proxy cache miss.
- Trigger the related tracking event when it's a dependency proxy cache hit.
- Add a count (new instrumentation class) for the number of projects with the Maven dependency proxy set up and enabled in Service Ping.
- Update all related specs.
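As a rough illustration of what the new count covers, the sketch below applies the condition from the partial index shown in the database review section (`enabled = TRUE AND maven_external_registry_url IS NOT NULL`) to plain Ruby objects. The `Setting` struct, the `projects_with_maven_proxy` helper, and the sample URLs are hypothetical, and whether the instrumentation class uses exactly this condition is an assumption.

```ruby
# Hypothetical stand-in for a row of dependency_proxy_packages_settings.
Setting = Struct.new(:project_id, :enabled, :maven_external_registry_url, keyword_init: true)

# Mirrors the index condition: enabled = TRUE AND maven_external_registry_url IS NOT NULL
def projects_with_maven_proxy(settings)
  settings.count { |s| s.enabled == true && !s.maven_external_registry_url.nil? }
end

settings = [
  Setting.new(project_id: 1, enabled: true,  maven_external_registry_url: 'http://192.0.2.10:8081'),
  Setting.new(project_id: 2, enabled: false, maven_external_registry_url: 'http://192.0.2.10:8081'),
  Setting.new(project_id: 3, enabled: true,  maven_external_registry_url: nil)
]
count = projects_with_maven_proxy(settings)
```

Only project 1 is both enabled and configured with a URL, so it is the only one counted.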
The entire Maven dependency proxy is still behind a feature flag at the time of this writing (not delivered yet). Thus no changelog on this MR.
📺 Screenshots or screen recordings
No UI changes
⚙ How to set up and validate locally
Test setup is a bit involved but here it is. We're going to set up a dummy registry server locally, then configure the local GitLab instance to point to that dummy server. Lastly, we will use curl to simulate the requests made by actual Maven clients. Those requests should trigger the events this MR adds.
🦋 The dummy registry server set up
We just need a server that serves a dummy file.
In a brand new folder:
mkdir -p srv/com/my/company/1.2.3
cd srv/com/my/company/1.2.3
echo bananas! > test.txt
cd ../../../..
ruby -run -e httpd . -p 8081
🦊 GitLab server set up
The Maven dependency proxy for packages has a few requirements:
- have packages -> enabled set to true in gitlab.yml.
- have dependency_proxy -> enabled set to true in gitlab.yml.
- have the packages feature enabled in the project's settings: Settings -> General -> Visibility, project features, permissions -> Package registry (checkbox enabled).
- have a GitLab license: Premium or more.
- have the related feature flag turned on:
Feature.enable(:packages_dependency_proxy_maven)
Next, let's configure our local GitLab:
- Have a private project ready.
- Have a PAT ready (with scope api). You need two users: a maintainer (or higher) and a reporter.
- Let's set up the dependency proxy settings in rails console:
Project.find(<project_id>).create_dependency_proxy_packages_setting!(enabled: true, maven_external_registry_url: 'http://<local IP>:8081')
Set up Snowplow with https://gitlab.com/gitlab-org/gitlab-development-kit/-/blob/main/doc/howto/snowplow_micro.md. Make sure that http://localhost:9091/micro/all is working.
With GitLab running, open two terminals. In terminal (1.), run:
rails runner scripts/internal_events/monitor.rb dependency_proxy_packages_maven_file_pulled
In terminal (2.), run:
rails runner scripts/internal_events/monitor.rb dependency_proxy_packages_maven_file_pulled_from_cache
1️⃣ First pull
Pull the file with:
$ curl "http://<username>:<pat>@gdk.test:8000/api/v4/projects/<project_id>/dependency_proxy/packages/maven/com/my/company/1.2.3/test.txt"
This is the first pull, so the file doesn't exist in the cache yet and is served from the external registry instead. On terminal (2.), you shouldn't see any event (because the cache has not been used). On terminal (1.), you should see:
+------------------------------------------------------------------------------------------------------------------------+
| SNOWPLOW EVENTS |
+---------------------------------------------+--------------------------+---------+--------------+------------+---------+
| Event Name | Collector Timestamp | user_id | namespace_id | project_id | plan |
+---------------------------------------------+--------------------------+---------+--------------+------------+---------+
| dependency_proxy_packages_maven_file_pulled | 2023-11-24T12:42:02.050Z | 1 | 1 | 241 | default |
+---------------------------------------------+--------------------------+---------+--------------+------------+---------+
2️⃣ Subsequent pulls
Now pull the file again with the same curl command. This time, the cache entry will be located and used. Terminal (1.) will not show anything new, but terminal (2.) will show:
+-----------------------------------------------------------------------------------------------------------------------------------+
| SNOWPLOW EVENTS |
+--------------------------------------------------------+--------------------------+---------+--------------+------------+---------+
| Event Name | Collector Timestamp | user_id | namespace_id | project_id | plan |
+--------------------------------------------------------+--------------------------+---------+--------------+------------+---------+
| dependency_proxy_packages_maven_file_pulled_from_cache | 2023-11-24T12:44:01.688Z | 1 | 1 | 241 | default |
+--------------------------------------------------------+--------------------------+---------+--------------+------------+---------+
Both events are working as expected
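If you prefer to script the two pulls instead of typing the curl command twice, the request can be sketched in Ruby with the standard library's `Net::HTTP`. The helper names `proxy_uri` and `pull` are hypothetical, project id 241 is taken from the event tables above, and the credentials are placeholders you need to fill in.

```ruby
require 'net/http'
require 'uri'

# Builds the same dependency proxy URL as the curl command above.
def proxy_uri(base_url, project_id, file_path)
  URI.join(base_url, "/api/v4/projects/#{project_id}/dependency_proxy/packages/maven/#{file_path}")
end

# Performs one pull with basic auth and returns [status_code, body].
def pull(uri, username, token)
  request = Net::HTTP::Get.new(uri)
  request.basic_auth(username, token)
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(request)
  end
  [response.code.to_i, response.body]
end

uri = proxy_uri('http://gdk.test:8000', 241, 'com/my/company/1.2.3/test.txt')
# Against a running instance, uncomment to perform the two pulls:
# 2.times { p pull(uri, '<username>', '<pat>') }
```

The network call is left commented out so the snippet can be loaded without a running GDK. The event monitors in both terminals remain the way to confirm which event each pull triggered.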
3️⃣ Service Ping
Let's check Service Ping. We set up a project with the Maven dependency proxy, so we should have a count of 1. In a rails console, let's create the Service Ping payload and check it:
Gitlab::UsageDataMetrics.uncached_data[:counts][:projects_with_dependency_proxy_for_maven_packages]
=> 1
All good
🛃 MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.
💾 Database review
🔼 Migration up
$ rails db:migrate
main: == [advisory_lock_connection] object_id: 184100, pg_backend_pid: 72839
main: == 20231124134838 AddIndexDependencyProxyPackageSettingsEnabledForMaven: migrating
main: -- transaction_open?(nil)
main: -> 0.0000s
main: -- view_exists?(:postgres_partitions)
main: -> 0.0734s
main: -- index_exists?(:dependency_proxy_packages_settings, :project_id, {:name=>"idx_dep_proxy_pkgs_settings_enabled_maven_on_project_id", :where=>"enabled = TRUE AND maven_external_registry_url IS NOT NULL", :algorithm=>:concurrently})
main: -> 0.0013s
main: -- execute("SET statement_timeout TO 0")
main: -> 0.0001s
main: -- add_index(:dependency_proxy_packages_settings, :project_id, {:name=>"idx_dep_proxy_pkgs_settings_enabled_maven_on_project_id", :where=>"enabled = TRUE AND maven_external_registry_url IS NOT NULL", :algorithm=>:concurrently})
main: -> 0.0058s
main: -- execute("RESET statement_timeout")
main: -> 0.0001s
main: == 20231124134838 AddIndexDependencyProxyPackageSettingsEnabledForMaven: migrated (0.0933s)
main: == [advisory_lock_connection] object_id: 184100, pg_backend_pid: 72839
ci: == [advisory_lock_connection] object_id: 184340, pg_backend_pid: 72841
ci: == 20231124134838 AddIndexDependencyProxyPackageSettingsEnabledForMaven: migrating
ci: -- transaction_open?(nil)
ci: -> 0.0000s
ci: -- view_exists?(:postgres_partitions)
ci: -> 0.0005s
ci: -- index_exists?(:dependency_proxy_packages_settings, :project_id, {:name=>"idx_dep_proxy_pkgs_settings_enabled_maven_on_project_id", :where=>"enabled = TRUE AND maven_external_registry_url IS NOT NULL", :algorithm=>:concurrently})
ci: -> 0.0014s
ci: -- execute("SET statement_timeout TO 0")
ci: -> 0.0002s
ci: -- add_index(:dependency_proxy_packages_settings, :project_id, {:name=>"idx_dep_proxy_pkgs_settings_enabled_maven_on_project_id", :where=>"enabled = TRUE AND maven_external_registry_url IS NOT NULL", :algorithm=>:concurrently})
ci: -> 0.0064s
ci: -- execute("RESET statement_timeout")
ci: -> 0.0001s
ci: == 20231124134838 AddIndexDependencyProxyPackageSettingsEnabledForMaven: migrated (0.0237s)
ci: == [advisory_lock_connection] object_id: 184340, pg_backend_pid: 72841
⬇ Migration down
$ rails db:rollback:main
main: == [advisory_lock_connection] object_id: 183760, pg_backend_pid: 73478
main: == 20231124134838 AddIndexDependencyProxyPackageSettingsEnabledForMaven: reverting
main: -- transaction_open?(nil)
main: -> 0.0000s
main: -- view_exists?(:postgres_partitions)
main: -> 0.1085s
main: -- indexes(:dependency_proxy_packages_settings)
main: -> 0.0026s
main: -- execute("SET statement_timeout TO 0")
main: -> 0.0001s
main: -- remove_index(:dependency_proxy_packages_settings, {:algorithm=>:concurrently, :name=>"idx_dep_proxy_pkgs_settings_enabled_maven_on_project_id"})
main: -> 0.0019s
main: -- execute("RESET statement_timeout")
main: -> 0.0001s
main: == 20231124134838 AddIndexDependencyProxyPackageSettingsEnabledForMaven: reverted (0.1245s)
main: == [advisory_lock_connection] object_id: 183760, pg_backend_pid: 73478