Skip to content

Update old temporary NuGet packages to processing

Steve Abrams requested to merge 324206-processing-package-scope-removal into master

🌳 Context

We recently introduced a status column to the Packages::Package model and table to identify what state a package is in (default, processing, error).

One package type, NuGet, was previously depending on logic around the package name to determine if it was in the processing state. When a user uploads a NuGet package, a package record gets created with the name 'NuGet.Temporary.Package'. Then it eventually is updated to the actual package name after the internal package metadata has been processed. With the new column, when we create the package, we also set the status to processing, and then update it to default after the package has been fully processed.

When filtering packages, we previously would filter out packages where(name: 'NuGet.Temporary.Package'), but now, we can simply filter where(status: :processing).

When the status column was added, it was given a default value of default, which is used for valid, usable packages. This means that NuGet 'NuGet.Temporary.Package' packages that existed prior to that addition would have a default status rather than processing

🔍 What does this MR do?

This MR updates NuGet packages with the temporary name that currently have the default status to use the processing status. This will allow us to follow up and remove the scopes and filters that were very specific to NuGet and just use the status filter instead.

A small side note, you may be wondering why we have so many older packages that are going to be in the processing state. It is likely that these packages are "stuck" in processing because they have failed due to errors. At a future time, we will be looking at if and how we can remove some of these stale packages, but for now, we just want to be sure they have the correct status so they are not accidentally returned when users are trying to search or install packages.

🐘 Database

Up migration output:
== 20210513155447 AddTemporaryPackageIndexForNugetDataMigration: migrating ====
-- transaction_open?()
   -> 0.0001s
-- index_exists?(:packages_packages, :id, {:where=>"package_type = 4 AND name = 'NuGet.Temporary.Package' AND status = 0", :name=>"tmp_index_packages_on_id_where_nuget_default_temp_package", :algorithm=>:concurrently})
   -> 0.0352s
-- execute("SET statement_timeout TO 0")
   -> 0.0053s
-- add_index(:packages_packages, :id, {:where=>"package_type = 4 AND name = 'NuGet.Temporary.Package' AND status = 0", :name=>"tmp_index_packages_on_id_where_nuget_default_temp_package", :algorithm=>:concurrently})
   -> 0.0360s
-- execute("RESET ALL")
   -> 0.0021s
== 20210513155447 AddTemporaryPackageIndexForNugetDataMigration: migrated (0.0829s)

== 20210513155546 BackfillNugetTemporaryPackagesToProcessingStatus: migrating = == 20210513155546 BackfillNugetTemporaryPackagesToProcessingStatus: migrated (0.0918s)

== 20210513155635 RemoveTemporaryPackageIndexForNugetDataMigration: migrating = -- transaction_open?() -> 0.0000s -- indexes(:packages_packages) -> 0.0234s -- remove_index(:packages_packages, {:algorithm=>:concurrently, :name=>"tmp_index_packages_on_id_where_nuget_default_temp_package"}) -> 0.0117s == 20210513155635 RemoveTemporaryPackageIndexForNugetDataMigration: migrated (0.0404s)

Down migration output:
== 20210513155635 RemoveTemporaryPackageIndexForNugetDataMigration: reverting =
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:packages_packages, :id, {:where=>"package_type = 4 AND name = 'NuGet.Temporary.Package' AND status = 0", :name=>"tmp_index_packages_on_id_where_nuget_default_temp_package", :algorithm=>:concurrently})
   -> 0.0112s
== 20210513155635 RemoveTemporaryPackageIndexForNugetDataMigration: reverted (0.0127s)

== 20210513155546 BackfillNugetTemporaryPackagesToProcessingStatus: reverting = == 20210513155546 BackfillNugetTemporaryPackagesToProcessingStatus: reverted (0.0000s)

== 20210513155447 AddTemporaryPackageIndexForNugetDataMigration: reverting ==== -- transaction_open?() -> 0.0000s -- indexes(:packages_packages) -> 0.0102s -- execute("SET statement_timeout TO 0") -> 0.0011s -- remove_index(:packages_packages, {:algorithm=>:concurrently, :name=>"tmp_index_packages_on_id_where_nuget_default_temp_package"}) -> 0.0050s -- execute("RESET ALL") -> 0.0011s == 20210513155447 AddTemporaryPackageIndexForNugetDataMigration: reverted (0.0191s)

Queries

This migration updates all packages_packages records meeting this condition:

package_type = 4 AND name = 'NuGet.Temporary.Package' AND status = 0

There are 5298 records that meet this condition. This number will not change, since new NuGet packages are assigned the processing status when they are created. These are all older historic packages.

gitlabhq_production=> select count(*) from packages_packages where package_type = 4 and name = 'NuGet.Temporary.Package' and status = 0;
 count
-------
  5298
(1 row)

First we create a new index to speed up all queries used in the data migration:

CREATE INDEX CONCURRENTLY index_packages_on_id_where_nuget_default 
ON packages_packages USING btree(id) 
WHERE package_type = 4 
AND status = 0 
AND name = 'NuGet.Temporary.Package'

This took 3.808s on postgres.ai

Now we use EachBatch to iterate over the relation. With a batch of 100, these are the queries it will use:

  1. Find lower bound ID of the batch - Explain Plan - 2.4ms

    SELECT id
      FROM packages_packages 
      WHERE package_type = 4 
      AND name = 'NuGet.Temporary.Package' 
      AND status = 0 
      ORDER BY id ASC 
      LIMIT 1;
  2. Find upper bound ID of the batch - Explain Plan - 1ms

    SELECT "packages_packages"."id" 
      FROM packages_packages 
      WHERE package_type = 4 
      AND name = 'NuGet.Temporary.Package' 
      AND status = 0 
      AND id >= 126508
      ORDER BY "packages_packages"."id" ASC 
      LIMIT 1 OFFSET 100;
  3. Update the batch - Explain Plan - 31.3ms

    UPDATE packages_packages
    SET status = 2
      WHERE package_type = 4 
      AND name = 'NuGet.Temporary.Package' 
      AND status = 0 
      AND id >= 126508
      AND id < 288784;

Estimated total migration time:

5298 records / 100 per batch = 53 batches
53 batches * (2.4ms + 1ms + 31.3ms) = 1874ms => ~2s

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Related to #324206 (closed)

Edited by Steve Abrams

Merge request reports