Skip to content

Add script to recalculate project statistics build artifacts size

Erick Bajao requested to merge eb-recalculate-artifacts-size-script into master

What does this MR do and why?

Describe in detail what your merge request does and why.

Resolves #238536 (closed)

This is the rake task that we are going to run to recalculate the build artifacts size of those projects affected.

The plan here is to accept the project IDs returned from the rake task of !80580 (merged).

It will be executed like:

$ rake gitlab:project_with_incorrect_artifact_size | rake gitlab:update_project_statistics_build_artifacts_size

Database review requirements

DB migration output

== 20220301175426 CreateProjectBuildArtifactsSizeRefresh: migrating ===========
-- create_table(:project_build_artifacts_size_refreshes, {})
   -> 0.0403s
== 20220301175426 CreateProjectBuildArtifactsSizeRefresh: migrated (0.0403s) ==

Query for fetching artifacts by batch

SELECT "ci_job_artifacts"."size" FROM "ci_job_artifacts"
WHERE "ci_job_artifacts"."project_id" = 278964
AND "ci_job_artifacts"."created_at" <= '2021-01-01 00:00:00'
AND "ci_job_artifacts"."id" > 165
ORDER BY created_at ASC LIMIT 1000
Query plan

Realized on a dedicated thin clone as creating an index on ci_job_artifacts takes a lot of time.

gitlabhq_dblab=# explain analyze SELECT "ci_job_artifacts"."id", "ci_job_artifacts"."size" FROM "ci_job_artifacts" WHERE "ci_job_artifacts"."project_id" = 278964 AND (created_at <= '2022-03-10 12:58:18.601374' AND id > 165) ORDER BY "ci_job_artifacts"."created_at" ASC LIMIT 1000;
                                                                                           QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.70..628.91 rows=1000 width=24) (actual time=0.032..1.584 rows=1000 loops=1)
   ->  Index Scan using index_ci_job_artifacts_on_id_project_id_and_created_at on ci_job_artifacts  (cost=0.70..54239796.66 rows=86340333 width=24) (actual time=0.029..1.446 rows=1000 loops=1)
         Index Cond: ((project_id = 278964) AND (created_at <= '2022-03-10 12:58:18.601374+00'::timestamp with time zone) AND (id > 165))
 Planning Time: 6.714 ms
 Execution Time: 1.753 ms
(5 rows)

This is currently slow. The index for this query will be added by !82585 (merged).

Query for finding remaining refresh work to process

SELECT "project_build_artifacts_size_refreshes".* FROM "project_build_artifacts_size_refreshes"
WHERE ("project_build_artifacts_size_refreshes"."state" = 1 OR "project_build_artifacts_size_refreshes"."state" = 2
AND (updated_at < '2022-03-08 18:05:25.635464'))

QUERY PLAN: https://explain.depesz.com/s/nufw

Screenshots or screen recordings

These are strongly recommended to assist reviewers and reduce the time to merge your change.

How to set up and validate locally

Numbered steps to set up and validate the change are strongly suggested.

  1. Enable the low concurrency feature flag, this sets the max running jobs to just 2.
Feature.enable(:projects_build_artifacts_size_refresh_low)
  1. Forcefully update the project statistics build artifacts size of a certain project with job artifacts to a wrong value
project.statistics.update(build_artifacts_size: 1)
  1. Take note of the total job artifacts size of the project, we will use this to validate the build_artifacts_size value later on
project.job_artifacts.sum(&:size)
  1. Now on the bash terminal, run the rake task that will enqueue the refresh work for the project. Assuming the ID of the project is 5
$ echo '5' | rake gitlab:refresh_project_statistics_build_artifacts_size
  1. On the sidekiq admin UI Cron tab, force enqueue the projects_schedule_refresh_build_artifacts_size_statistics_worker so you won't have to wait. This would then enqueue workers to recalculate the total artifacts size by batch
  2. On the rails console, you can regularly check if there are remaining jobs to process in the refresh table
Projects::BuildArtifactsSizeRefresh.remaining
  1. Once complete, you can then check the final value of the build artifacts size statistic.
project.statistics.reload.build_artifacts_size == project.job_artifacts.sum(&:size)

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Max Orefice

Merge request reports