
Add models for Virtual Registries, part 1/2

David Fernandez requested to merge 467972-vreg-db-models-part-1 into master

🔭 Context

With Maven virtual registry (&14137), we're starting the work on Virtual Registries. Virtual Registries is a feature that could be described as the evolution of the dependency proxy idea: having the GitLab instance act as an intermediary between clients and artifact registries. Artifacts can be of any kind, but we're going to focus on packages and container images, starting with Maven packages specifically.

In other words, the GitLab instance can be configured to contact a set of upstreams and expose a specific virtual registry URL that "speaks" the API of the artifact type, in this case the Maven API. When a request hits this API, we'll check with the set of upstreams and the first one to answer successfully "wins". We will pull the response from that upstream, cache it in the GitLab instance and return it to the client.
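The "first upstream to answer wins, then cache" flow described above can be sketched in plain Ruby. This is only an illustration under stated assumptions: `VirtualRegistrySketch` and its in-memory hash cache are hypothetical names, and upstreams are modeled as callables that return a body on success or `nil` on failure. None of this is code from the MR.

```ruby
# Hypothetical sketch of the request flow; not the MR's implementation.
class VirtualRegistrySketch
  # upstreams: ordered list of callables. Each takes a path and returns
  # the response body (String) on success, or nil on failure.
  def initialize(upstreams)
    @upstreams = upstreams
    @cache = {}
  end

  def fetch(path)
    # Serve from the cache when an identical request was already answered.
    return @cache[path] if @cache.key?(path)

    @upstreams.each do |upstream|
      body = upstream.call(path)
      next if body.nil?

      # The first successful upstream "wins": cache its response and return it.
      @cache[path] = body
      return body
    end

    nil # no upstream answered successfully
  end
end

unavailable = ->(_path) { nil }
mirror = ->(path) { "contents of #{path}" }

registry = VirtualRegistrySketch.new([unavailable, mirror])
registry.fetch("com/acme/app/1.0/app-1.0.pom")
```

Once a path is cached, a later `fetch` for the same path never contacts the upstreams, which is the reliability benefit when an upstream goes down.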

The benefits are:

  • multiple upstreams are aggregated behind a single URL, which means simpler configuration on the clients.
  • by caching responses and reusing those caches for subsequent (identical) requests, we improve the reliability of the system. If the related upstream is down but GitLab holds all the required caches, a client pulling dependencies for a project will still succeed.
  • dependency firewall features. The GitLab instance can do more than just caching. For example, we could run a vulnerability check so that vulnerable dependencies are not allowed to enter the system.

👣 First iteration's scope

Because the scope of this feature is quite large, we reduced it for the first iteration. Here are the main aspects:

  • Will work at (root) Group level.
  • Maven packages only.
  • Restrictions on the associations counts:
    • A (root) Group can only have 1 registry (of type Maven).
    • A (maven) registry can only have 1 upstream.

The implementation that we start here should be able to accommodate later relaxations of those restrictions:

  • Support to have the Virtual Registry at a different level (such as Organisation).
  • Support for other package formats.
  • Support for other artifact types than packages, namely container registries.
  • Support for multiple registries.
  • Support for multiple upstreams.
    • Support for different upstream types: local vs remote.

See the detailed analysis in #457503 (comment 1949349752).

💽 Database tables and models

This MR is part of Maven Virtual Registry: Database models (#467972) which tackles the database tables and models that we will need.

classDiagram
    class Reg["VirtualRegistries::Packages::Maven::Registry"]
    class RegU["VirtualRegistries::Packages::Maven::RegistryUpstream"]
    class U["VirtualRegistries::Packages::Maven::Upstream"]
    class CR["VirtualRegistries::Packages::Maven::CachedResponse"]

    Reg "1" --> "1" RegU
    RegU "1" --> "1" U
    U "1" --> "0..*" CR

As discussed above, several associations are 1:1 for now but will be changed into 1:n in the future.

One thing to note is that we specialize the tables by artifact type and subtype, in this case packages and Maven. This is because we want to avoid the situation that we have in the package registry, where the tables packages_packages and packages_package_files hold data for all package formats. In effect, we are splitting the data by artifact type and subtype.

Moreover, some package formats can have specific settings, such as how to handle caching for specific requests (for example, metadata requests). If we were using one table for all formats, these settings would also be exposed to package formats that don't need them, which wouldn't make sense.

To keep this MR at a reasonable size, it only introduces 3 tables (out of 4). The remaining table will be introduced in a follow-up MR (see !157055).

🤔 What does this MR do and why?

  • Add new bounded context: ::VirtualRegistries
  • Add 3 tables:
    • VirtualRegistries::Packages::Maven::Registry
    • VirtualRegistries::Packages::Maven::RegistryUpstream
    • VirtualRegistries::Packages::Maven::Upstream
  • Add the related specs.
  • For Upstreams, add the credentials handling through an encrypted field that is serialized as JSON. This allows us to define any field that we need. For now, we're starting with username and password, which are the most commonly encountered credentials when dealing with authentication on Maven registries.
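To illustrate the idea of an encrypted field that wraps a JSON structure, here is a minimal standalone sketch. It is an assumption-laden stand-in: `CredentialsSketch` is a hypothetical module, and AES-256-GCM from Ruby's standard `openssl` library is used purely for illustration. The MR description does not say which encryption helper the model actually relies on.

```ruby
require "json"
require "openssl"

# Hypothetical helper for illustration only; not the MR's implementation.
module CredentialsSketch
  CIPHER = "aes-256-gcm"

  # Serializes the credentials hash to JSON, then encrypts the JSON string.
  def self.encrypt(credentials, key)
    cipher = OpenSSL::Cipher.new(CIPHER).encrypt
    cipher.key = key
    iv = cipher.random_iv
    cipher.auth_data = ""
    data = cipher.update(JSON.generate(credentials)) + cipher.final
    { data: data, iv: iv, tag: cipher.auth_tag }
  end

  # Decrypts the payload and parses the JSON back into a hash.
  def self.decrypt(payload, key)
    cipher = OpenSSL::Cipher.new(CIPHER).decrypt
    cipher.key = key
    cipher.iv = payload[:iv]
    cipher.auth_tag = payload[:tag]
    cipher.auth_data = ""
    JSON.parse(cipher.update(payload[:data]) + cipher.final)
  end
end

key = OpenSSL::Random.random_bytes(32) # 256-bit key, generated here for the demo
payload = CredentialsSketch.encrypt({ "username" => "foo", "password" => "bar" }, key)
CredentialsSketch.decrypt(payload, key)
```

Because the encrypted value is an arbitrary JSON document, new credential fields (a token, for instance) could be added later without a schema change.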

The entire feature will be behind a feature flag, but since the models are not connected to any logic (yet), the feature flag has not been introduced in this MR.

🏁 MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

🌈 Screenshots or screen recordings

🤷

How to set up and validate locally

The only way to try these changes out is with a Rails console.

1️⃣ Registry model

# get a root group
root_group = Group.first

# make sure that it's a root one
root_group.root?
=> true

# create a subgroup
subgroup = FactoryBot.create(:group, parent: root_group)

# The registry object is the entry point of the feature and can only be linked with a root group.
# Let's try to create a registry on the subgroup:

r = ::VirtualRegistries::Packages::Maven::Registry.create!(group: subgroup)
ActiveRecord::RecordInvalid: Validation failed: Group must be a top level Group

# Create a registry on the root group:
r = ::VirtualRegistries::Packages::Maven::Registry.create!(group: root_group)
=> #<VirtualRegistries::Packages::Maven::Registry:0x00000001200f5f20 id: ....>

# We're limiting the number of Maven registries per group to 1, let's check that:
::VirtualRegistries::Packages::Maven::Registry.create!(group: root_group)
ActiveRecord::RecordInvalid: Validation failed: Group has already been taken
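The two rules exercised above (root group only, at most one registry per group) can be mimicked outside Rails with a small in-memory sketch. `GroupSketch` and `RegistryBookSketch` are hypothetical names, and raising `ArgumentError` stands in for the model validations; the real models presumably rely on Active Record validations instead.

```ruby
# Hypothetical in-memory stand-ins; not the MR's models.
GroupSketch = Struct.new(:name, :parent) do
  # A group without a parent is a root (top-level) group.
  def root?
    parent.nil?
  end
end

class RegistryBookSketch
  def initialize
    # Keyed by group identity so that two distinct groups never collide.
    @registries = {}.compare_by_identity
  end

  # Returns the new registry record, or raises ArgumentError with a message
  # mirroring the validation errors shown in the console session.
  def create!(group:)
    raise ArgumentError, "Group must be a top level Group" unless group.root?
    raise ArgumentError, "Group has already been taken" if @registries.key?(group)

    @registries[group] = { group: group }
  end
end

root_group = GroupSketch.new("root", nil)
subgroup = GroupSketch.new("sub", root_group)

book = RegistryBookSketch.new
book.create!(group: root_group) # succeeds; a second call for root_group would raise
```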

All good! Let's see the next object: upstream

2️⃣ Upstream model

# Given that the association with the registry is handled in a join table, an upstream can be created without pointing to a registry
u = ::VirtualRegistries::Packages::Maven::Upstream.create!(group: root_group, url: "https://maven.test")

# Let's check some of its validations
# url should be correctly formatted
u.update!(url: "test")
ActiveRecord::RecordInvalid: Validation failed: Url is blocked: Only allowed schemes are http, https

u.reload

# credentials should both be set
u.update!(username: 'foo')
ActiveRecord::RecordInvalid: Validation failed: Password can't be blank

u.reload

u.update!(password: 'bar')
ActiveRecord::RecordInvalid: Validation failed: Username can't be blank

u.reload

u.update!(username: 'foo', password: 'bar')
=> true

u.reload

# The nice thing about credentials is that they come from a JSON structure that is itself encrypted
u.credentials
=> {"username"=>"foo", "password"=>"bar"}

# The column that is actually persisted is encrypted_credentials, which holds the raw bytes produced by the encryption
u.encrypted_credentials
=> "\x10\xD5R\xD0\xB1\xFE\xD5\xB2\x17\x8F\..."

All good here. Let's go for our final model, the join table.
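The upstream validations exercised in this section (allowed URL schemes, username and password set together) can be approximated in plain Ruby. `UpstreamSketch` is a hypothetical class written for this illustration; the error strings simply mirror the console output above, and the real model presumably uses Active Record validators rather than this hand-rolled check.

```ruby
require "uri"

# Hypothetical stand-in for the upstream validations; not the MR's code.
class UpstreamSketch
  ALLOWED_SCHEMES = %w[http https].freeze

  attr_reader :url, :username, :password

  def initialize(url:, username: nil, password: nil)
    @url = url
    @username = username
    @password = password
  end

  # Collects error messages; an empty array means the record is valid.
  def errors
    list = []
    list << "Url is blocked: Only allowed schemes are http, https" unless allowed_scheme?
    list << "Password can't be blank" if present?(username) && !present?(password)
    list << "Username can't be blank" if present?(password) && !present?(username)
    list
  end

  def valid?
    errors.empty?
  end

  private

  def allowed_scheme?
    ALLOWED_SCHEMES.include?(URI.parse(url.to_s).scheme)
  rescue URI::InvalidURIError
    false
  end

  def present?(value)
    !value.to_s.strip.empty?
  end
end

UpstreamSketch.new(url: "https://maven.test", username: "foo").errors
# => ["Password can't be blank"]
```

Credentials are optional as a pair: an upstream with neither username nor password is valid, which matches the first `create!` call in the session above.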

3️⃣ RegistryUpstream model


ru = ::VirtualRegistries::Packages::Maven::RegistryUpstream.create!(group: root_group, registry: r, upstream: u)
=> #<VirtualRegistries::Packages::Maven::RegistryUpstream:0x00000001542b77a8 id:  ....>

# We don't allow multiple upstreams for the same registry

::VirtualRegistries::Packages::Maven::RegistryUpstream.create!(group: root_group, registry: r, upstream: ::VirtualRegistries::Packages::Maven::Upstream.create!(group: root_group, url: "https://maven.test2"))
ActiveRecord::RecordInvalid: Validation failed: Registry has already been taken

Model navigation

Alright, now that we have a small hierarchy, let's check the object navigation.

# Note that this call will trigger a proper `INNER JOIN` query.
r.upstream
=> #<VirtualRegistries::Packages::Maven::Upstream:0x0000000114214d90 ...>

# We don't think that this will be used but it's there:
u.registry
=> #<VirtualRegistries::Packages::Maven::Registry:0x0000000114211690 ...>

💾 Database review

We are only introducing 3 tables. There are no queries to review yet: they will be added when we implement the logic that uses those tables.

Migration up

main: == [advisory_lock_connection] object_id: 132080, pg_backend_pid: 7580
main: == 20240619141712 CreateVirtualRegistriesPackagesMavenRegistries: migrating ===
main: -- transaction_open?(nil)
main:    -> 0.0000s
main: -- create_table(:virtual_registries_packages_maven_registries, {:if_not_exists=>true})
main:    -> 0.0055s
main: -- transaction_open?(nil)
main:    -> 0.0000s
main: -- transaction_open?(nil)
main:    -> 0.0000s
main: -- execute("ALTER TABLE virtual_registries_packages_maven_registries\nADD CONSTRAINT check_b3fbe8eb62\nCHECK ( cache_validity_hours >= 0 )\nNOT VALID;\n")
main:    -> 0.0008s
main: -- execute("SET statement_timeout TO 0")
main:    -> 0.0005s
main: -- execute("ALTER TABLE virtual_registries_packages_maven_registries VALIDATE CONSTRAINT check_b3fbe8eb62;")
main:    -> 0.0006s
main: -- execute("RESET statement_timeout")
main:    -> 0.0011s
main: == 20240619141712 CreateVirtualRegistriesPackagesMavenRegistries: migrated (0.0625s) 

main: == [advisory_lock_connection] object_id: 132080, pg_backend_pid: 7580
ci: == [advisory_lock_connection] object_id: 132340, pg_backend_pid: 7582
ci: == 20240619141712 CreateVirtualRegistriesPackagesMavenRegistries: migrating ===
ci: -- transaction_open?(nil)
ci:    -> 0.0000s
ci: -- create_table(:virtual_registries_packages_maven_registries, {:if_not_exists=>true})
ci:    -> 0.0060s
ci: -- transaction_open?(nil)
ci:    -> 0.0000s
ci: -- transaction_open?(nil)
ci:    -> 0.0000s
ci: -- execute("ALTER TABLE virtual_registries_packages_maven_registries\nADD CONSTRAINT check_b3fbe8eb62\nCHECK ( cache_validity_hours >= 0 )\nNOT VALID;\n")
ci:    -> 0.0005s
ci: -- execute("SET statement_timeout TO 0")
ci:    -> 0.0007s
ci: -- execute("ALTER TABLE virtual_registries_packages_maven_registries VALIDATE CONSTRAINT check_b3fbe8eb62;")
ci:    -> 0.0005s
ci: -- execute("RESET statement_timeout")
ci:    -> 0.0005s
I, [2024-06-21T16:23:43.135074 #7158]  INFO -- : Database: 'ci', Table: 'virtual_registries_packages_maven_registries': Lock Writes
I, [2024-06-21T16:23:43.136048 #7158]  INFO -- : {:method=>"with_lock_retries", :class=>"gitlab:db:lock_writes", :message=>"Lock timeout is set", :current_iteration=>1, :lock_timeout_in_ms=>100}
I, [2024-06-21T16:23:43.136472 #7158]  INFO -- : {:method=>"with_lock_retries", :class=>"gitlab:db:lock_writes", :message=>"Migration finished", :current_iteration=>1, :lock_timeout_in_ms=>100}
ci: == 20240619141712 CreateVirtualRegistriesPackagesMavenRegistries: migrated (0.0311s) 

ci: == [advisory_lock_connection] object_id: 132340, pg_backend_pid: 7582
main: == [advisory_lock_connection] object_id: 132600, pg_backend_pid: 7585
main: == 20240619154655 CreateVirtualRegistriesPackagesMavenUpstreams: migrating ====
main: -- transaction_open?(nil)
main:    -> 0.0000s
main: -- create_table(:virtual_registries_packages_maven_upstreams, {:if_not_exists=>true})
main: -- quote_column_name(:url)
main:    -> 0.0000s
main:    -> 0.0066s
main: -- transaction_open?(nil)
main:    -> 0.0000s
main: -- transaction_open?(nil)
main:    -> 0.0000s
main: -- execute("ALTER TABLE virtual_registries_packages_maven_upstreams\nADD CONSTRAINT check_b9e3bfa31a\nCHECK ( octet_length(encrypted_credentials) <= 1020 )\nNOT VALID;\n")
main:    -> 0.0005s
main: -- execute("SET statement_timeout TO 0")
main:    -> 0.0004s
main: -- execute("ALTER TABLE virtual_registries_packages_maven_upstreams VALIDATE CONSTRAINT check_b9e3bfa31a;")
main:    -> 0.0004s
main: -- execute("RESET statement_timeout")
main:    -> 0.0003s
main: -- transaction_open?(nil)
main:    -> 0.0000s
main: -- transaction_open?(nil)
main:    -> 0.0000s
main: -- execute("ALTER TABLE virtual_registries_packages_maven_upstreams\nADD CONSTRAINT check_4af2999ab8\nCHECK ( octet_length(encrypted_credentials_iv) <= 1020 )\nNOT VALID;\n")
main:    -> 0.0005s
main: -- execute("ALTER TABLE virtual_registries_packages_maven_upstreams VALIDATE CONSTRAINT check_4af2999ab8;")
main:    -> 0.0005s
main: == 20240619154655 CreateVirtualRegistriesPackagesMavenUpstreams: migrated (0.0232s) 

main: == [advisory_lock_connection] object_id: 132600, pg_backend_pid: 7585
ci: == [advisory_lock_connection] object_id: 139120, pg_backend_pid: 7587
ci: == 20240619154655 CreateVirtualRegistriesPackagesMavenUpstreams: migrating ====
ci: -- transaction_open?(nil)
ci:    -> 0.0000s
ci: -- create_table(:virtual_registries_packages_maven_upstreams, {:if_not_exists=>true})
ci: -- quote_column_name(:url)
ci:    -> 0.0000s
ci:    -> 0.0077s
ci: -- transaction_open?(nil)
ci:    -> 0.0000s
ci: -- transaction_open?(nil)
ci:    -> 0.0000s
ci: -- execute("ALTER TABLE virtual_registries_packages_maven_upstreams\nADD CONSTRAINT check_b9e3bfa31a\nCHECK ( octet_length(encrypted_credentials) <= 1020 )\nNOT VALID;\n")
ci:    -> 0.0007s
ci: -- execute("SET statement_timeout TO 0")
ci:    -> 0.0004s
ci: -- execute("ALTER TABLE virtual_registries_packages_maven_upstreams VALIDATE CONSTRAINT check_b9e3bfa31a;")
ci:    -> 0.0006s
ci: -- execute("RESET statement_timeout")
ci:    -> 0.0004s
ci: -- transaction_open?(nil)
ci:    -> 0.0000s
ci: -- transaction_open?(nil)
ci:    -> 0.0000s
ci: -- execute("ALTER TABLE virtual_registries_packages_maven_upstreams\nADD CONSTRAINT check_4af2999ab8\nCHECK ( octet_length(encrypted_credentials_iv) <= 1020 )\nNOT VALID;\n")
ci:    -> 0.0005s
ci: -- execute("ALTER TABLE virtual_registries_packages_maven_upstreams VALIDATE CONSTRAINT check_4af2999ab8;")
ci:    -> 0.0005s
I, [2024-06-21T16:23:43.395581 #7158]  INFO -- : Database: 'ci', Table: 'virtual_registries_packages_maven_upstreams': Lock Writes
I, [2024-06-21T16:23:43.396336 #7158]  INFO -- : {:method=>"with_lock_retries", :class=>"gitlab:db:lock_writes", :message=>"Lock timeout is set", :current_iteration=>1, :lock_timeout_in_ms=>100}
I, [2024-06-21T16:23:43.396727 #7158]  INFO -- : {:method=>"with_lock_retries", :class=>"gitlab:db:lock_writes", :message=>"Migration finished", :current_iteration=>1, :lock_timeout_in_ms=>100}
ci: == 20240619154655 CreateVirtualRegistriesPackagesMavenUpstreams: migrated (0.0401s) 

ci: == [advisory_lock_connection] object_id: 139120, pg_backend_pid: 7587
main: == [advisory_lock_connection] object_id: 150180, pg_backend_pid: 7590
main: == 20240619192156 CreateVirtualRegistriesPackagesMavenRegistryUpstreams: migrating 
main: -- transaction_open?(nil)
main:    -> 0.0000s
main: -- create_table(:virtual_registries_packages_maven_registry_upstreams, {:if_not_exists=>true})
main:    -> 0.0081s
main: == 20240619192156 CreateVirtualRegistriesPackagesMavenRegistryUpstreams: migrated (0.0146s) 

main: == [advisory_lock_connection] object_id: 150180, pg_backend_pid: 7590
ci: == [advisory_lock_connection] object_id: 152340, pg_backend_pid: 7592
ci: == 20240619192156 CreateVirtualRegistriesPackagesMavenRegistryUpstreams: migrating 
ci: -- transaction_open?(nil)
ci:    -> 0.0000s
ci: -- create_table(:virtual_registries_packages_maven_registry_upstreams, {:if_not_exists=>true})
ci:    -> 0.0065s
I, [2024-06-21T16:23:43.634128 #7158]  INFO -- : Database: 'ci', Table: 'virtual_registries_packages_maven_registry_upstreams': Lock Writes
I, [2024-06-21T16:23:43.635115 #7158]  INFO -- : {:method=>"with_lock_retries", :class=>"gitlab:db:lock_writes", :message=>"Lock timeout is set", :current_iteration=>1, :lock_timeout_in_ms=>100}
I, [2024-06-21T16:23:43.635540 #7158]  INFO -- : {:method=>"with_lock_retries", :class=>"gitlab:db:lock_writes", :message=>"Migration finished", :current_iteration=>1, :lock_timeout_in_ms=>100}
ci: == 20240619192156 CreateVirtualRegistriesPackagesMavenRegistryUpstreams: migrated (0.0252s) 

ci: == [advisory_lock_connection] object_id: 152340, pg_backend_pid: 7592

Migration down

main: == [advisory_lock_connection] object_id: 127760, pg_backend_pid: 8299
main: == 20240619192156 CreateVirtualRegistriesPackagesMavenRegistryUpstreams: reverting 
main: -- drop_table(:virtual_registries_packages_maven_registry_upstreams)
main:    -> 0.0038s
main: == 20240619192156 CreateVirtualRegistriesPackagesMavenRegistryUpstreams: reverted (0.0067s) 

main: == [advisory_lock_connection] object_id: 127760, pg_backend_pid: 8299
ci: == [advisory_lock_connection] object_id: 127660, pg_backend_pid: 8695
ci: == 20240619192156 CreateVirtualRegistriesPackagesMavenRegistryUpstreams: reverting 
ci: -- drop_table(:virtual_registries_packages_maven_registry_upstreams)
ci:    -> 0.0026s
ci: == 20240619192156 CreateVirtualRegistriesPackagesMavenRegistryUpstreams: reverted (0.0097s) 

ci: == [advisory_lock_connection] object_id: 127660, pg_backend_pid: 8695

main: == [advisory_lock_connection] object_id: 127660, pg_backend_pid: 9098
main: == 20240619154655 CreateVirtualRegistriesPackagesMavenUpstreams: reverting ====
main: -- drop_table(:virtual_registries_packages_maven_upstreams)
main:    -> 0.0043s
main: == 20240619154655 CreateVirtualRegistriesPackagesMavenUpstreams: reverted (0.0074s) 

main: == [advisory_lock_connection] object_id: 127660, pg_backend_pid: 9098
ci: == [advisory_lock_connection] object_id: 127660, pg_backend_pid: 9485
ci: == 20240619154655 CreateVirtualRegistriesPackagesMavenUpstreams: reverting ====
ci: -- drop_table(:virtual_registries_packages_maven_upstreams)
ci:    -> 0.0022s
ci: == 20240619154655 CreateVirtualRegistriesPackagesMavenUpstreams: reverted (0.0089s) 

ci: == [advisory_lock_connection] object_id: 127660, pg_backend_pid: 9485

main: == [advisory_lock_connection] object_id: 127660, pg_backend_pid: 9893
main: == 20240619141712 CreateVirtualRegistriesPackagesMavenRegistries: reverting ===
main: -- drop_table(:virtual_registries_packages_maven_registries)
main:    -> 0.0048s
main: == 20240619141712 CreateVirtualRegistriesPackagesMavenRegistries: reverted (0.0085s) 

main: == [advisory_lock_connection] object_id: 127660, pg_backend_pid: 9893
ci: == [advisory_lock_connection] object_id: 127660, pg_backend_pid: 10287
ci: == 20240619141712 CreateVirtualRegistriesPackagesMavenRegistries: reverting ===
ci: -- drop_table(:virtual_registries_packages_maven_registries)
ci:    -> 0.0041s
ci: == 20240619141712 CreateVirtualRegistriesPackagesMavenRegistries: reverted (0.0117s) 

ci: == [advisory_lock_connection] object_id: 127660, pg_backend_pid: 10287
Edited by David Fernandez
