Skip to content

Add rake task for migration legacy pages storage

Vladimir Shushlin requested to merge pages-migration-task into master

What does this MR do?

Related to #282466 (closed)

This rake task just uses service introduced in !49473 (merged) (this MR currently targets this branch).

We plan to run this task on .com soon.

It probably will go slow and we will want to optimize it, e.g. migrate some projects in parallel. But we want to try running if first and see how it goes.

This task can be stopped/started at any time: it will just continue to handle projects which aren't migrated yet.

Database queries

Thanks to @krasio:

The execution plans doesn't look too bad - we do not have the best possible index but given we are working in small batches (1_000) timings are OK. We may need to add a dedicated index (WHERE deployed = true AND pages_deployment_id IS NULL) when we want to switch to background migration (which we'll have to do eventually for self-managed).

1. Batch 1 - start id

SELECT "project_pages_metadata"."project_id" FROM "project_pages_metadata" WHERE "project_pages_metadata"."deployed" = TRUE AND "project_pages_metadata"."pages_deployment_id" IS NULL ORDER BY "project_pages_metadata"."project_id" ASC LIMIT 1

https://postgres.ai/console/shared/e673c237-7a0a-486f-9912-1939e7c2b26e

gitlabhq_production=> SELECT "project_pages_metadata"."project_id" FROM "project_pages_metadata" WHERE "project_pages_metadata"."deployed" = TRUE AND "project_pages_metadata"."pages_deployment_id" IS NULL ORDER BY "project_pages_metadata"."project_id" ASC LIMIT 1;
 project_id 
------------
        179
(1 row)

Time: 4.564 ms

2. Batch 1 - end id

SELECT "project_pages_metadata"."project_id" FROM "project_pages_metadata" WHERE "project_pages_metadata"."deployed" = TRUE AND "project_pages_metadata"."pages_deployment_id" IS NULL AND "project_pages_metadata"."project_id" >= 179 ORDER BY "project_pages_metadata"."project_id" ASC LIMIT 1 OFFSET 1000

https://postgres.ai/console/shared/abd55983-015c-43cd-8780-58ddc3436bd7

gitlabhq_production=> SELECT "project_pages_metadata"."project_id" FROM "project_pages_metadata" WHERE "project_pages_metadata"."deployed" = TRUE AND "project_pages_metadata"."pages_deployment_id" IS NULL AND "project_pages_metadata"."project_id" >= 179 ORDER BY "project_pages_metadata"."project_id" ASC LIMIT 1 OFFSET 1000;
 project_id 
------------
    1015390
(1 row)

Time: 262.336 ms

3. Batch 1 - SELECT * for batch

SELECT "project_pages_metadata".* FROM "project_pages_metadata" WHERE "project_pages_metadata"."deployed" = TRUE AND "project_pages_metadata"."pages_deployment_id" IS NULL AND "project_pages_metadata"."project_id" >= 179 AND "project_pages_metadata"."project_id" <= 1015390

https://postgres.ai/console/shared/1e893edc-c913-4402-80ae-fd30de93f17e

4. Batch 2 - end id

SELECT "project_pages_metadata"."project_id" FROM "project_pages_metadata" WHERE "project_pages_metadata"."deployed" = TRUE AND "project_pages_metadata"."pages_deployment_id" IS NULL AND "project_pages_metadata"."project_id" >= 1015390 ORDER BY "project_pages_metadata"."project_id" ASC LIMIT 1 OFFSET 1000

https://postgres.ai/console/shared/9bbff61d-0c51-44c5-b8e8-b6eeba76c238

gitlabhq_production=> SELECT "project_pages_metadata"."project_id" FROM "project_pages_metadata" WHERE "project_pages_metadata"."deployed" = TRUE AND "project_pages_metadata"."pages_deployment_id" IS NULL AND "project_pages_metadata"."project_id" >= 1015390 ORDER BY "project_pages_metadata"."project_id" ASC LIMIT 1 OFFSET 1000;
 project_id 
------------
    1153397
(1 row)

Time: 43.223 ms

5. Batch 2 - SELECT * for batch

SELECT "project_pages_metadata".* FROM "project_pages_metadata" WHERE "project_pages_metadata"."deployed" = TRUE AND "project_pages_metadata"."pages_deployment_id" IS NULL AND "project_pages_metadata"."project_id" >= 1015390 AND "project_pages_metadata"."project_id" <= 1153397

https://postgres.ai/console/shared/ce631e47-6721-4b15-8554-eff84411d972

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team
Edited by Kamil Trzciński

Merge request reports