
Add tracking for the Maven dependency proxy

David Fernandez requested to merge 431412-instrument-maven-dependency-proxy into master

🔭 Context

We're working on the very first version of the dependency proxy for packages. See #407460 (comment 1373731852) for all the details from the technical investigation.

At its core, the concept is quite simple: GitLab acts as a proxy. Users pull packages through it, and GitLab serves as a pull-through cache.

Package Manager clients (npm, maven, ...) <-> GitLab <-> External Package Registry

Because GitLab sits in the middle of the package transport (that is, it acts as a proxy), we can use the GitLab package registry as a cache. In other words, before contacting the external package registry, we check whether the package is already in the local project registry. If it is, we return it directly.
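To make the lookup order concrete, here is a minimal sketch of a pull-through cache in plain Ruby. It is not the code from this MR: the `PullThroughCache` class, the in-memory hash standing in for the project's package registry, and the `upstream` callable standing in for the external registry are all hypothetical names chosen for illustration.

```ruby
# Hypothetical sketch of a pull-through cache (not GitLab's implementation).
class PullThroughCache
  def initialize(upstream)
    @upstream = upstream # callable: path -> file content, or nil when missing
    @store = {}          # stands in for the local project package registry
  end

  # Returns [content, source] where source is :cache, :upstream or :not_found.
  def fetch(path)
    # 1. Check the local registry first.
    return [@store[path], :cache] if @store.key?(path)

    # 2. Otherwise ask the external registry.
    content = @upstream.call(path)
    return [nil, :not_found] if content.nil?

    # 3. Keep a copy so the next pull is served locally.
    @store[path] = content
    [content, :upstream]
  end
end

# Fake external registry that only knows one file.
external = ->(path) { path == 'com/my/company/1.2.3/test.txt' ? "bananas!\n" : nil }

cache = PullThroughCache.new(external)
2.times do
  _content, source = cache.fetch('com/my/company/1.2.3/test.txt')
  puts "served from #{source}"
end
```

The first call reaches the fake external registry and the second one is answered from the local store, which is the cache miss and cache hit distinction this MR instruments.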

The first package format to get support is Maven.

In this MR, we want to track the usage of the Maven dependency proxy. Here are the aspects we want to track:

  1. The number of files pulled through the Maven dependency proxy and served from the external package registry (a cache miss). This is an Internal event.
  2. The number of files pulled through the Maven dependency proxy and served from the GitLab package registry (a cache hit). This is an Internal event.
  3. The number of projects that have enabled the Maven dependency proxy with a valid configuration. This is a new count for Service Ping.

The related issue is Instrument data to help measure adoption of the... (#431412 - closed)

🤔 What does this MR do and why?

  • Trigger the related tracking event when it's a dependency proxy cache miss.
  • Trigger the related tracking event when it's a dependency proxy cache hit.
  • Add a count (new instrumentation class) for the number of projects with the Maven dependency proxy set up and enabled in Service Ping.
  • Update all related specs.
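As a rough illustration of the first two bullets, the sketch below maps where a file was served from to one of the two event names used later in this description. The `EventRecorder` class and the `track_pull` helper are hypothetical stand-ins written for this example. They are not the Internal Events API and not the code added by this MR.

```ruby
# Event names as they appear in this MR description.
EVENT_FOR_SOURCE = {
  upstream: 'dependency_proxy_packages_maven_file_pulled',         # cache miss
  cache: 'dependency_proxy_packages_maven_file_pulled_from_cache'  # cache hit
}.freeze

# Hypothetical in-memory recorder, standing in for the real tracking backend.
class EventRecorder
  attr_reader :events

  def initialize
    @events = []
  end

  def track(name, project_id:, user_id:)
    @events << { name: name, project_id: project_id, user_id: user_id }
  end
end

# Hypothetical helper: records exactly one event per pull, chosen by source.
def track_pull(recorder, source, project_id:, user_id:)
  recorder.track(EVENT_FOR_SOURCE.fetch(source), project_id: project_id, user_id: user_id)
end

recorder = EventRecorder.new
track_pull(recorder, :upstream, project_id: 241, user_id: 1) # first pull
track_pull(recorder, :cache, project_id: 241, user_id: 1)    # subsequent pull
recorder.events.each { |event| puts event[:name] }
```

Each pull produces exactly one event, so the two counters never overlap.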

The entire Maven dependency proxy is still behind a feature flag at the time of this writing (not delivered yet). Thus no changelog on this MR.

📺 Screenshots or screen recordings

No UI changes 🦄

How to set up and validate locally

The test setup is a bit involved, but here it is. We're going to set up a dummy registry server locally, then configure the local GitLab instance to point to that dummy server. Lastly, we will use curl to simulate the requests made by actual Maven clients. Those requests should trigger the events this MR adds.

🦋 The dummy registry server set up

We just need a server that serves a dummy file.

In a brand new folder:

  1. mkdir -p srv/com/my/company/1.2.3
  2. cd srv/com/my/company/1.2.3
  3. echo bananas! > test.txt
  4. cd ../../../..
  5. ruby -run -e httpd . -p 8081
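If you prefer to script steps 1 to 4, the following Ruby sketch creates the same directory tree and file. The `build_dummy_registry` helper is a hypothetical name for this example. Starting the server (step 5) is left to the command above, run from inside the `srv` directory. As far as I know, that one-liner relies on the webrick gem being installed on Ruby 3.0 and later.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: creates srv/com/my/company/1.2.3/test.txt under +root+
# and returns the path of the created file.
def build_dummy_registry(root)
  dir = File.join(root, 'srv', 'com', 'my', 'company', '1.2.3')
  FileUtils.mkdir_p(dir)

  file = File.join(dir, 'test.txt')
  File.write(file, "bananas!\n") # same content as `echo bananas! > test.txt`
  file
end

# Demo in a throwaway directory; use a real folder for the actual test setup.
Dir.mktmpdir do |root|
  file = build_dummy_registry(root)
  puts "created #{file}"
  puts "serve from #{File.join(root, 'srv')}"
end
```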

🦊 GitLab server set up

The Maven dependency proxy for packages has a few requirements:

  1. have packages -> enabled set to true in gitlab.yml.
  2. have dependency_proxy -> enabled set to true in gitlab.yml.
  3. have the packages feature enabled in the project's settings: Settings -> General -> Visibility, project features, permissions -> Package registry (checkbox enabled).
  4. have a GitLab license, Premium or higher.
  5. have the related feature flag turned on:
    Feature.enable(:packages_dependency_proxy_maven)

Next, let's configure our local GitLab:

  1. Have a private project ready.

  2. Have a PAT ready (with scope api). You need two users: one with at least the Maintainer role and one with the Reporter role.

  3. Let's set up the dependency proxy settings in a Rails console:

    Project.find(<project_id>).create_dependency_proxy_packages_setting!(enabled: true, maven_external_registry_url: 'http://<local IP>:8081')

Set up Snowplow Micro by following https://gitlab.com/gitlab-org/gitlab-development-kit/-/blob/main/doc/howto/snowplow_micro.md and make sure that http://localhost:9091/micro/all is working.

With GitLab running, open two terminals and run one of these commands in each:

  1. rails runner scripts/internal_events/monitor.rb dependency_proxy_packages_maven_file_pulled
  2. rails runner scripts/internal_events/monitor.rb dependency_proxy_packages_maven_file_pulled_from_cache

1️⃣ First pull

Pull the file with:

$ curl "http://<username>:<pat>@gdk.test:8000/api/v4/projects/<project_id>/dependency_proxy/packages/maven/com/my/company/1.2.3/test.txt"

This is the first pull, so the file is not in the cache yet: it is fetched from the external registry and returned from there. On terminal (2.), you shouldn't see any event (because the cache has not been used). On terminal (1.), you should see:

+------------------------------------------------------------------------------------------------------------------------+
|                                                    SNOWPLOW EVENTS                                                     |
+---------------------------------------------+--------------------------+---------+--------------+------------+---------+
| Event Name                                  | Collector Timestamp      | user_id | namespace_id | project_id | plan    |
+---------------------------------------------+--------------------------+---------+--------------+------------+---------+
| dependency_proxy_packages_maven_file_pulled | 2023-11-24T12:42:02.050Z | 1       | 1            | 241        | default |
+---------------------------------------------+--------------------------+---------+--------------+------------+---------+
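For reference, the curl call used for the pull can also be expressed with Ruby's standard `Net::HTTP`. This is only a sketch: `build_pull_request` is a hypothetical helper, and the base URL, project ID, username, and token in the demo are placeholder values to replace with your own.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: builds the GET request for a file pulled through the
# Maven dependency proxy endpoint, authenticated with HTTP Basic (user + PAT).
def build_pull_request(base_url:, project_id:, file_path:, username:, token:)
  uri = URI.join(
    base_url,
    "/api/v4/projects/#{project_id}/dependency_proxy/packages/maven/#{file_path}"
  )
  request = Net::HTTP::Get.new(uri)
  request.basic_auth(username, token)
  [uri, request]
end

# Placeholder values, to be replaced with real ones.
uri, request = build_pull_request(
  base_url: 'http://gdk.test:8000',
  project_id: 241,
  file_path: 'com/my/company/1.2.3/test.txt',
  username: 'root',
  token: 'my-personal-access-token'
)
puts uri

# To actually send it against a running instance:
# response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
# puts response.code, response.body
```

Building the request separately from sending it makes it easy to repeat the same pull and watch the two monitors.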

2️⃣ Subsequent pulls

Now, we will pull the file again but this time around, the cache entry will be located and used. Terminal (1.) will not have anything but terminal (2.) will have:

+-----------------------------------------------------------------------------------------------------------------------------------+
|                                                          SNOWPLOW EVENTS                                                          |
+--------------------------------------------------------+--------------------------+---------+--------------+------------+---------+
| Event Name                                             | Collector Timestamp      | user_id | namespace_id | project_id | plan    |
+--------------------------------------------------------+--------------------------+---------+--------------+------------+---------+
| dependency_proxy_packages_maven_file_pulled_from_cache | 2023-11-24T12:44:01.688Z | 1       | 1            | 241        | default |
+--------------------------------------------------------+--------------------------+---------+--------------+------------+---------+

Both events are working as expected.

3️⃣ Service Ping

Let's check Service Ping. We set up one project with the Maven dependency proxy, so we should have a count of 1. In a Rails console, let's build the Service Ping payload and check it:

Gitlab::UsageDataMetrics.uncached_data[:counts][:projects_with_dependency_proxy_for_maven_packages]
=> 1

All good
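To illustrate what "set up and enabled" could mean for this count, here is a plain Ruby sketch. It assumes the metric counts settings that match the same predicate as the partial index shown in the database review (`enabled = TRUE AND maven_external_registry_url IS NOT NULL`). That is an assumption on my part, and the `ProxySetting` struct and `projects_with_maven_dependency_proxy` helper are hypothetical names, not the instrumentation class from this MR.

```ruby
# Hypothetical stand-in for a row of dependency_proxy_packages_settings.
ProxySetting = Struct.new(:project_id, :enabled, :maven_external_registry_url, keyword_init: true)

# Assumed predicate: enabled = TRUE AND maven_external_registry_url IS NOT NULL.
def projects_with_maven_dependency_proxy(settings)
  settings.count { |s| s.enabled == true && !s.maven_external_registry_url.nil? }
end

settings = [
  ProxySetting.new(project_id: 241, enabled: true, maven_external_registry_url: 'http://192.0.2.10:8081'),
  ProxySetting.new(project_id: 242, enabled: false, maven_external_registry_url: 'http://192.0.2.10:8081'),
  ProxySetting.new(project_id: 243, enabled: true, maven_external_registry_url: nil)
]

puts projects_with_maven_dependency_proxy(settings)
```

Only the first setting is both enabled and has an external registry URL, so it is the only one counted.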

🛃 MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

💾 Database review

🔼 Migration up

$ rails db:migrate
main: == [advisory_lock_connection] object_id: 184100, pg_backend_pid: 72839
main: == 20231124134838 AddIndexDependencyProxyPackageSettingsEnabledForMaven: migrating 
main: -- transaction_open?(nil)
main:    -> 0.0000s
main: -- view_exists?(:postgres_partitions)
main:    -> 0.0734s
main: -- index_exists?(:dependency_proxy_packages_settings, :project_id, {:name=>"idx_dep_proxy_pkgs_settings_enabled_maven_on_project_id", :where=>"enabled = TRUE AND maven_external_registry_url IS NOT NULL", :algorithm=>:concurrently})
main:    -> 0.0013s
main: -- execute("SET statement_timeout TO 0")
main:    -> 0.0001s
main: -- add_index(:dependency_proxy_packages_settings, :project_id, {:name=>"idx_dep_proxy_pkgs_settings_enabled_maven_on_project_id", :where=>"enabled = TRUE AND maven_external_registry_url IS NOT NULL", :algorithm=>:concurrently})
main:    -> 0.0058s
main: -- execute("RESET statement_timeout")
main:    -> 0.0001s
main: == 20231124134838 AddIndexDependencyProxyPackageSettingsEnabledForMaven: migrated (0.0933s) 

main: == [advisory_lock_connection] object_id: 184100, pg_backend_pid: 72839
ci: == [advisory_lock_connection] object_id: 184340, pg_backend_pid: 72841
ci: == 20231124134838 AddIndexDependencyProxyPackageSettingsEnabledForMaven: migrating 
ci: -- transaction_open?(nil)
ci:    -> 0.0000s
ci: -- view_exists?(:postgres_partitions)
ci:    -> 0.0005s
ci: -- index_exists?(:dependency_proxy_packages_settings, :project_id, {:name=>"idx_dep_proxy_pkgs_settings_enabled_maven_on_project_id", :where=>"enabled = TRUE AND maven_external_registry_url IS NOT NULL", :algorithm=>:concurrently})
ci:    -> 0.0014s
ci: -- execute("SET statement_timeout TO 0")
ci:    -> 0.0002s
ci: -- add_index(:dependency_proxy_packages_settings, :project_id, {:name=>"idx_dep_proxy_pkgs_settings_enabled_maven_on_project_id", :where=>"enabled = TRUE AND maven_external_registry_url IS NOT NULL", :algorithm=>:concurrently})
ci:    -> 0.0064s
ci: -- execute("RESET statement_timeout")
ci:    -> 0.0001s
ci: == 20231124134838 AddIndexDependencyProxyPackageSettingsEnabledForMaven: migrated (0.0237s) 

ci: == [advisory_lock_connection] object_id: 184340, pg_backend_pid: 72841

🔽 Migration down

$ rails db:rollback:main
main: == [advisory_lock_connection] object_id: 183760, pg_backend_pid: 73478
main: == 20231124134838 AddIndexDependencyProxyPackageSettingsEnabledForMaven: reverting 
main: -- transaction_open?(nil)
main:    -> 0.0000s
main: -- view_exists?(:postgres_partitions)
main:    -> 0.1085s
main: -- indexes(:dependency_proxy_packages_settings)
main:    -> 0.0026s
main: -- execute("SET statement_timeout TO 0")
main:    -> 0.0001s
main: -- remove_index(:dependency_proxy_packages_settings, {:algorithm=>:concurrently, :name=>"idx_dep_proxy_pkgs_settings_enabled_maven_on_project_id"})
main:    -> 0.0019s
main: -- execute("RESET statement_timeout")
main:    -> 0.0001s
main: == 20231124134838 AddIndexDependencyProxyPackageSettingsEnabledForMaven: reverted (0.1245s) 

main: == [advisory_lock_connection] object_id: 183760, pg_backend_pid: 73478

🔢 Queries

Edited by David Fernandez
