Skip to content

Package build infos: remove safe_find_or_create_by! calls

🔥 Context

In #338346 (closed), it has been discovered that create PG sub transactions can have a non trivial ~performance hit.

One of the source of those sub transactions is safe_find_or_create_by!.

As such, #338954 (closed) has been opened to remove such calls from the build info models used in the Package code area.

We're going to replace such calls with its rails cousin find_or_create_by!. That operation is not atomic. The handbook has even a warning about that. To understand why we can do so, let's look at what is build info model.

The build info model is what I personally call a "bridge". The model itself doesn't have any column that transports data instead the model is used to bridge two other models together (the advantage is that these two other tables are not modified to create a link). Here, the model is clear: we want to have a link between a Packages::Package and a Ci::Pipeline.

That link is created only when a package is uploaded to the GitLab package registry by a ci job. Due to how the build info is inserted (using .safe_find_or_create_by!), it is clear that we want a unique tuple package_id + pipeline_id to exist in the database.

For one reason, that uniqueness constraint is neither declared in the database nor it is declared in the rails model. We're going to use that to our advantage here. By not having any constraint on the build info table, find_or_create_by! can never fails even if it is not atomic. The only issue is that due to the race condition, we can have duplicates. That problem is already present in the table (See !68649 (comment 656594885)).

This "duplicates" problem can could from earlier implementations where a simple create! was used. We can't do that because some package managers can upload multiple files for the same package (name + version). Example: an maven package is a set of files that are uploaded sequentially by $ mvn (or $ gradle). If we use a create! and the maven package is pushed by a ci pipeline, we will have multiple build info rows pointing to the same package + pipeline tuple.

So we have two things at hand:

  1. Remove the .safe_find_or_create_by! call. This is handled by this MR.
  2. Fix the unique "constraint" violations in the table packages_build_infos. This was extracted as a follow up. See #339093.

Notice that during the implementation of (2.), we're going to add constraints to the table so, the find_or_create_by! call will need to be updated to something else (probably an upsert).

🔬 What does this MR do?

  • Create a new function on Packages::Package to create the build info using find_or_create_by!.
  • Use that function instead of .safe_find_or_create_by! in app/services/packages/generic/create_package_file_service.rb and app/services/packages/maven/find_or_create_package_service.rb.
  • Update / Add the related specs.
    • Add a spec example to prove that when multiple files are pushed in sequence with the same build information, a single build info row is inserted.

Screenshots or Screencasts (strongly suggested)

Pushing a generic package on master

SQL traces (extract)
Namespace::PackageSetting Load (0.3ms)  SELECT "namespace_package_settings".* FROM "namespace_package_settings" WHERE "namespace_package_settings"."namespace_id" = 1 LIMIT 1
  ↳ app/models/namespace.rb:196:in `package_settings'
Ci::Pipeline Load (0.5ms)  SELECT "ci_pipelines".* FROM "ci_pipelines" WHERE "ci_pipelines"."id" = 184 LIMIT 1
  ↳ app/services/packages/generic/create_package_file_service.rb:32:in `find_or_create_package'
TRANSACTION (0.3ms)  SAVEPOINT active_record_1
  ↳ app/models/application_record.rb:69:in `block in safe_find_or_create_by'
Packages::BuildInfo Load (0.3ms)  SELECT "packages_build_infos".* FROM "packages_build_infos" WHERE "packages_build_infos"."package_id" = 335 AND "packages_build_infos"."pipeline_id" = 184 LIMIT 1
  ↳ app/models/application_record.rb:69:in `block in safe_find_or_create_by'
Packages::BuildInfo Create (0.6ms)  INSERT INTO "packages_build_infos" ("package_id", "pipeline_id") VALUES (335, 184) RETURNING "id"
  ↳ lib/gitlab/database.rb:214:in `block in transaction'
TRANSACTION (0.2ms)  RELEASE SAVEPOINT active_record_1
  ↳ lib/gitlab/database.rb:214:in `block in transaction'
Packages::PackageFile Create (0.6ms)  INSERT INTO "packages_package_files" ("package_id", "created_at", "updated_at", "size", "file_name", "file", "file_sha256") VALUES (335, '2021-08-20 13:12:04.183600', '2021-08-20 13:12:04.183600', 8, 'dummyfile.txt', 'dummyfile.txt', '\x66633434656235656439353361646661366230323832353637393037383365616566373365613739323834313438303435353765393866386333653435363865') RETURNING "id"
  ↳ app/services/packages/create_package_file_service.rb:25:in `execute'
Packages::PackageFileBuildInfo Create (0.5ms)  INSERT INTO "packages_package_file_build_infos" ("package_file_id", "pipeline_id") VALUES (401, 184) RETURNING "id"
  ↳ app/services/packages/create_package_file_service.rb:25:in `execute'
Packages::PackageFile Update (0.5ms)  UPDATE "packages_package_files" SET "file_store" = 1 WHERE "packages_package_files"."id" = 401
  ↳ app/models/concerns/file_store_mounter.rb:19:in `update_file_store'
TRANSACTION (0.5ms)  COMMIT
  ↳ lib/gitlab/database.rb:235:in `commit'
ProjectStatistics Load (0.4ms)  SELECT "project_statistics".* FROM "project_statistics" WHERE "project_statistics"."project_id" = 43 LIMIT 1
  ↳ app/models/project_statistics.rb:118:in `increment_statistic'

Pushing a generic package using this branch

SQL traces (extract)
Namespace::PackageSetting Load (0.8ms)  SELECT "namespace_package_settings".* FROM "namespace_package_settings" WHERE "namespace_package_settings"."namespace_id" = 1 LIMIT 1
  ↳ app/models/namespace.rb:196:in `package_settings'
Project Load (0.7ms)  SELECT "projects".* FROM "projects" WHERE "projects"."id" = 43 LIMIT 1
  ↳ lib/api/ci/helpers/runner.rb:67:in `authenticate_job!'
Ci::Runner Load (1.8ms)  SELECT "ci_runners".* FROM "ci_runners" WHERE "ci_runners"."id" = 1 LIMIT 1
  ↳ lib/api/ci/helpers/runner.rb:74:in `authenticate_job!'
Ci::Pipeline Load (2.7ms)  SELECT "ci_pipelines".* FROM "ci_pipelines" WHERE "ci_pipelines"."id" = 187 LIMIT 1
  ↳ app/models/packages/package.rb:295:in `create_build_infos!'
Namespace Load (0.5ms)  SELECT "namespaces".* FROM "namespaces" WHERE "namespaces"."id" = 1 LIMIT 1
  ↳ app/models/project.rb:438:in `actual_limits'
Packages::BuildInfo Load (0.7ms)  SELECT "packages_build_infos".* FROM "packages_build_infos" WHERE "packages_build_infos"."package_id" = 335 AND "packages_build_infos"."pipeline_id" = 187 LIMIT 1
  ↳ app/models/packages/package.rb:298:in `create_build_infos!'
Packages::BuildInfo Create (0.5ms)  INSERT INTO "packages_build_infos" ("package_id", "pipeline_id") VALUES (335, 187) RETURNING "id"
  ↳ lib/gitlab/database.rb:214:in `block in transaction'
GitlabSubscription Load (0.3ms)  SELECT "gitlab_subscriptions".* FROM "gitlab_subscriptions" WHERE "gitlab_subscriptions"."namespace_id" = 1 LIMIT 1
  ↳ ee/app/models/ee/namespace.rb:458:in `find_or_create_subscription'
Plan Load (0.4ms)  SELECT "plans".* FROM "plans" WHERE "plans"."name" = 'default' LIMIT 1
Packages::PackageFile Create (0.6ms)  INSERT INTO "packages_package_files" ("package_id", "created_at", "updated_at", "size", "file_name", "file", "file_sha256") VALUES (335, '2021-08-23 12:11:31.018168', '2021-08-23 12:11:31.018168', 8, 'dummyfile.txt', 'dummyfile.txt', '\x66633434656235656439353361646661366230323832353637393037383365616566373365613739323834313438303435353765393866386333653435363865') RETURNING "id"
  ↳ app/models/plan.rb:17:in `block in default'
  ↳ app/services/packages/create_package_file_service.rb:25:in `execute'
PlanLimits Load (0.6ms)  SELECT "plan_limits".* FROM "plan_limits" WHERE "plan_limits"."plan_id" = 1 LIMIT 1
  ↳ app/models/plan.rb:30:in `actual_limits'
Packages::PackageFileBuildInfo Create (1.4ms)  INSERT INTO "packages_package_file_build_infos" ("package_file_id", "pipeline_id") VALUES (413, 187) RETURNING "id"
  ↳ app/services/packages/create_package_file_service.rb:25:in `execute'
Packages::PackageFile Update (0.4ms)  UPDATE "packages_package_files" SET "file_store" = 1 WHERE "packages_package_files"."id" = 413
  ↳ app/models/concerns/file_store_mounter.rb:19:in `update_file_store'
TRANSACTION (0.7ms)  COMMIT
  ↳ lib/gitlab/database.rb:235:in `commit'
Ci::JobArtifact Load (1.5ms)  SELECT "ci_job_artifacts".* FROM "ci_job_artifacts" WHERE "ci_job_artifacts"."job_id" = 1224 AND "ci_job_artifacts"."file_type" = 3 LIMIT 1
ProjectStatistics Load (1.1ms)  SELECT "project_statistics".* FROM "project_statistics" WHERE "project_statistics"."project_id" = 43 LIMIT 1
  ↳ lib/gitlab/ci/trace.rb:298:in `block in trace_artifact'
  ↳ app/models/project_statistics.rb:118:in `increment_statistic'
ProjectStatistics Update All (0.6ms)  UPDATE "project_statistics" SET packages_size = COALESCE(packages_size, 0) + (8), storage_size = COALESCE(storage_size, 0) + (8) WHERE "project_statistics"."project_id" = 43
  ↳ app/models/project_statistics.rb:146:in `columns_to_increment'

Pushing a maven package on master

SQL traces (extract)
Namespace::PackageSetting Load (1.2ms)  SELECT "namespace_package_settings".* FROM "namespace_package_settings" WHERE "namespace_package_settings"."namespace_id" = 1 LIMIT 1
  ↳ app/models/namespace.rb:196:in `package_settings'
Ci::Pipeline Load (0.8ms)  SELECT "ci_pipelines".* FROM "ci_pipelines" WHERE "ci_pipelines"."id" = 184 LIMIT 1
  ↳ app/services/packages/maven/find_or_create_package_service.rb:54:in `execute'
TRANSACTION (0.3ms)  BEGIN
  ↳ app/models/application_record.rb:69:in `block in safe_find_or_create_by'
Packages::BuildInfo Load (2.7ms)  SELECT "packages_build_infos".* FROM "packages_build_infos" WHERE "packages_build_infos"."package_id" = 336 AND "packages_build_infos"."pipeline_id" = 184 LIMIT 1
  ↳ app/models/application_record.rb:69:in `block in safe_find_or_create_by'
Packages::BuildInfo Create (3.2ms)  INSERT INTO "packages_build_infos" ("package_id", "pipeline_id") VALUES (336, 184) RETURNING "id"
  ↳ lib/gitlab/database.rb:214:in `block in transaction'
TRANSACTION (0.7ms)  COMMIT
  ↳ lib/gitlab/database.rb:235:in `commit'
License Load (0.5ms)  SELECT "licenses".* FROM "licenses" ORDER BY "licenses"."id" DESC LIMIT 100
  ↳ ee/app/models/license.rb:317:in `load_license'
Feature::FlipperGate Load (0.5ms)  SELECT "feature_gates".* FROM "feature_gates" WHERE "feature_gates"."feature_key" = 'collect_package_events'
  ↳ lib/feature.rb:84:in `enabled?'
TRANSACTION (0.3ms)  BEGIN
  ↳ app/services/packages/create_package_file_service.rb:25:in `execute'
Packages::PackageFile Create (8.7ms)  INSERT INTO "packages_package_files" ("package_id", "created_at", "updated_at", "size", "file_md5", "file_sha1", "file_name", "file") VALUES (336, '2021-08-20 13:11:16.641581', '2021-08-20 13:11:16.641581', 2206, '\x3332383262333261313766656633623931626439623430313163323230653134', '\x61383961383730363236306466303462383632613631643630663938313166353165626533626463', 'mvnbananas-2.4.7.jar', 'mvnbananas-2.4.7.jar') RETURNING "id"
  ↳ app/services/packages/create_package_file_service.rb:25:in `execute'
Packages::PackageFileBuildInfo Create (5.3ms)  INSERT INTO "packages_package_file_build_infos" ("package_file_id", "pipeline_id") VALUES (398, 184) RETURNING "id"
  ↳ app/services/packages/create_package_file_service.rb:25:in `execute'
Packages::PackageFile Update (2.2ms)  UPDATE "packages_package_files" SET "file_store" = 1 WHERE "packages_package_files"."id" = 398
  ↳ app/models/concerns/file_store_mounter.rb:19:in `update_file_store'
TRANSACTION (1.3ms)  COMMIT
  ↳ lib/gitlab/database.rb:235:in `commit'
ProjectStatistics Load (0.7ms)  SELECT "project_statistics".* FROM "project_statistics" WHERE "project_statistics"."project_id" = 43 LIMIT 1
  ↳ app/models/project_statistics.rb:118:in `increment_statistic'

Pushing a maven package using this branch

SQL traces (extract)
Namespace::PackageSetting Load (0.7ms)  SELECT "namespace_package_settings".* FROM "namespace_package_settings" WHERE "namespace_package_settings"."namespace_id" = 1 LIMIT 1
  ↳ app/models/namespace.rb:196:in `package_settings'
Ci::Pipeline Load (3.3ms)  SELECT "ci_pipelines".* FROM "ci_pipelines" WHERE "ci_pipelines"."id" = 187 LIMIT 1
  ↳ app/models/packages/package.rb:295:in `create_build_infos!'
Packages::BuildInfo Load (1.1ms)  SELECT "packages_build_infos".* FROM "packages_build_infos" WHERE "packages_build_infos"."package_id" = 336 AND "packages_build_infos"."pipeline_id" = 187 LIMIT 1
  ↳ app/models/packages/package.rb:298:in `create_build_infos!'
TRANSACTION (0.2ms)  BEGIN
  ↳ lib/gitlab/database.rb:214:in `block in transaction'
Packages::BuildInfo Create (0.6ms)  INSERT INTO "packages_build_infos" ("package_id", "pipeline_id") VALUES (336, 187) RETURNING "id"
  ↳ lib/gitlab/database.rb:214:in `block in transaction'
TRANSACTION (0.7ms)  COMMIT
  ↳ lib/gitlab/database.rb:235:in `commit'
License Load (0.7ms)  SELECT "licenses".* FROM "licenses" ORDER BY "licenses"."id" DESC LIMIT 100
  ↳ ee/app/models/license.rb:317:in `load_license'
TRANSACTION (0.4ms)  BEGIN
  ↳ app/services/packages/create_package_file_service.rb:25:in `execute'
Packages::PackageFile Create (0.8ms)  INSERT INTO "packages_package_files" ("package_id", "created_at", "updated_at", "size", "file_md5", "file_sha1", "file_name", "file") VALUES (336, '2021-08-23 12:10:55.401405', '2021-08-23 12:10:55.401405', 2207, '\x3233653238656433393931346630666234653935343739396464396538663532', '\x31353435303038663239393066333730643664386262613335303434663533653565363830393336', 'mvnbananas-2.4.7.jar', 'mvnbananas-2.4.7.jar') RETURNING "id"
  ↳ app/services/packages/create_package_file_service.rb:25:in `execute'
Packages::PackageFileBuildInfo Create (1.9ms)  INSERT INTO "packages_package_file_build_infos" ("package_file_id", "pipeline_id") VALUES (410, 187) RETURNING "id"
  ↳ app/services/packages/create_package_file_service.rb:25:in `execute'
Packages::PackageFile Update (0.8ms)  UPDATE "packages_package_files" SET "file_store" = 1 WHERE "packages_package_files"."id" = 410
  ↳ app/models/concerns/file_store_mounter.rb:19:in `update_file_store'
TRANSACTION (0.4ms)  COMMIT
  ↳ lib/gitlab/database.rb:235:in `commit'
ProjectStatistics Load (1.4ms)  SELECT "project_statistics".* FROM "project_statistics" WHERE "project_statistics"."project_id" = 43 LIMIT 1
  ↳ app/models/project_statistics.rb:118:in `increment_statistic'

Notes

For generic packages, this MR properly removes the SAVEPOINT statement

For maven packages, it is strange that we don't have any SAVEPOINT statement in the SQL queries from master. Instead, we see, two transactions. This MR doesn't have an impact on that behavior. I think the explanation here is that the maven service is lacking a overall transaction for all operations (like inserting the build infos for the package and the build infos for the package files). This is dangerous, we could be inserting half of the data and not the other half. We need a transaction that includes all write operations. This being outside of the scope of this MR, I open #339112 for that.

How to setup and validate locally (strongly suggested)

  1. Have a gitlab instance with a runner

  2. Create a new project (any visibility)

  3. Add a pom.xml file with: ( replace the gitlab base url and project with the proper values)

    Content
    <?xml version="1.0" encoding="UTF-8"?>
    
    
    <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
      <modelVersion>4.0.0</modelVersion>
      <groupId>gl.pru</groupId>
      <artifactId>mvnbananas</artifactId>
      <packaging>jar</packaging>
      <version>2.4.7</version>
      <name>mvnbananas</name>
      <url>http://maven.apache.org</url>
      <properties>
        <maven.compiler.source>7</maven.compiler.source>
        <maven.compiler.target>7</maven.compiler.target>
      </properties>
      <dependencies>
        <dependency>
          <groupId>junit</groupId>
          <artifactId>junit</artifactId>
          <version>3.8.1</version>
          <scope>test</scope>
        </dependency>
      </dependencies>
    
      <repositories>
        <repository>
          <id>gl_pru</id>
          <url><base url for gitlab instance>/api/v4/projects/<project_id>/packages/maven</url>
        </repository>
      </repositories>
      <distributionManagement>
        <repository>
          <id>gl_pru</id>
          <url><base url for gitlab instance>/api/v4/projects/<project_id>/packages/maven</url>
        </repository>
        <snapshotRepository>
          <id>gl_pru</id>
          <url><base url for gitlab instance>/api/v4/projects/<project_id>/packages/maven</url>
        </snapshotRepository>
      </distributionManagement>
    </project>
  4. Add a settings.xml file with:

    content
    <settings>
      <servers>
        <server>
          <id>gl_pru</id>
          <configuration>
            <httpHeaders>
              <property>
                <name>Job-Token</name>
                <value>${env.CI_JOB_TOKEN}</value>
              </property>
            </httpHeaders>
          </configuration>
        </server>
      </servers>
    </settings>
  5. Add a .gitlab-ci.yml with:

    content
    image: alpine:latest
    
    before_script:
      - apk add git curl
    
    build_mvn:
      image: maven:3.6.3-ibmjava-alpine
      stage: build
      script:
        - mkdir tmp
        - cd tmp
        - mvn -B archetype:generate -DarchetypeGroupId=org.apache.maven.archetypes -DgroupId=gl.pru  -DgroupId=gl.pru -DartifactId=mvnbananas -Dversion=2.4.7
        - cd mvnbananas
        - cp ../../pom.xml .
        - mvn deploy -s ../../settings.xml -DskipTests
    
    build_generic:
      stage: build
      script:
        - echo "Bananas" > dummy.txt
        - 'curl --header "JOB-TOKEN: $CI_JOB_TOKEN" --upload-file ./dummy.txt "${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/packages/generic/genericbananas/2.3.7/dummyfile.txt"'
  6. The above will create a pipeline with two jobs. One job for pushing a generic package and one job for pushing a maven package. Start the pipeline and check the server logs for the SQL queries.

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

Does this MR contain changes to processing or storing of credentials or tokens, authorization and authentication methods or other items described in the security review guidelines? If not, then delete this Security section.

  • [-] Label as security and @ mention @gitlab-com/gl-security/appsec
  • [-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • [-] Security reports checked/validated by a reviewer from the AppSec team
Edited by David Fernandez

Merge request reports