Skip to content

Add API to bulk delete project artifacts

Albert requested to merge 223793-api-bulk-delete-project-artifacts into master

What does this MR do and why?

Adds a new API to delete project build artifacts in bulk.

DELETE /projects/:id/artifacts

Project maintainers can call this API to remove all existing erasable job artifacts from non-locked pipelines.

Some keywords here are:

  • erasable job artifacts - all job artifacts except trace
  • non-locked pipelines - pipelines that are not the latest pipeline of each ref

This MR also adds a background migration to add index on ci_job_artifacts on id, project_id and file_type.

Screenshots or screen recordings

These are strongly recommended to assist reviewers and reduce the time to merge your change.

Dependencies

DONE. This MR depends on !77393 (merged). Before merging, we need to verify that !77393 (merged) is deployed to production and the index has been created asynchronously. See https://docs.gitlab.com/ee/development/adding_database_indexes.html#verify-the-mr-was-deployed-and-the-index-exists-in-production for more detail.

How to set up and validate locally

  1. Start with FF :bulk_expire_project_artifacts disabled
  2. Create a new project
  3. Run a main branch pipeline that creates an artifact (pipeline1)
  4. Add a new commit on main branch, run a new pipeline on main that creates an artifact (pipeline2)
  5. Verify that pipeline1 artifacts are unknown? and expire_at is in the future based on the default (30 days)
  6. Verify that pipeline2 artifacts are unknown? and expire_at is in the future based on the default (30 days)
  7. Add a tag on main branch, creating a pipeline for the tag with artifacts (pipeline_tagged)
  8. Verify that pipeline_tagged artifacts are unknown? and expire_at is in the future based on the default (30 days)
  9. Call API DELETE /projects/:id/artifacts, verify 404 response
  10. Enable FF :bulk_expire_project_artifacts
  11. Call API DELETE /projects/:id/artifacts, verify 202 response, leaving some time for the worker to complete.
  12. Verify that pipeline1 artifacts that are not trace are unlocked? and expire_at is the current time
  13. Verify that pipeline1 artifacts that are trace remains unchanged.
  14. Verify that pipeline2 and pipeline_tagged artifacts are unchanged
  15. Run Ci::ExpireBuildArtifactsWorker
  16. Wait for the next run of Ci::DeleteObjectsWorker
  17. Verify that pipeline1 artifacts that have been expired are removed

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Feature flags

Database queries

Migration:

CREATE INDEX index_ci_job_artifacts_on_id_project_id_and_file_type ON ci_job_artifacts USING btree (project_id, file_type, id);

SELECT query:

SELECT "id"
FROM (WITH RECURSIVE "array_cte"
              AS MATERIALIZED (SELECT "file_type" FROM (VALUES (1), (2), (4), (5), (6), (7), (8), (9), (101), (11), (12), (13), (14), (15), (16), (17), (18), (19), (20), (21), (22), (23), (24), (25), (26), (27)) AS file_types (file_type)),
     "recursive_keyset_cte" AS ((SELECT NULL::bigint AS id, array_cte_file_type_array, ci_job_artifacts_id_array, 0::bigint AS count FROM (SELECT ARRAY_AGG("array_cte"."file_type") AS array_cte_file_type_array, ARRAY_AGG("ci_job_artifacts"."id") AS ci_job_artifacts_id_array FROM (SELECT "array_cte"."file_type" FROM array_cte) array_cte LEFT JOIN LATERAL (SELECT "ci_job_artifacts"."id" FROM "ci_job_artifacts" WHERE "ci_job_artifacts"."project_id" = 26 AND "ci_job_artifacts"."file_type" = "array_cte"."file_type" ORDER BY "ci_job_artifacts"."id" ASC LIMIT 1) ci_job_artifacts ON TRUE WHERE "ci_job_artifacts"."id" IS NOT NULL) array_scope_lateral_query LIMIT 1)
UNION ALL
(
SELECT recursive_keyset_cte.ci_job_artifacts_id_array[position], array_cte_file_type_array, recursive_keyset_cte.ci_job_artifacts_id_array[:position_query.position-1]||next_cursor_values.id||recursive_keyset_cte.ci_job_artifacts_id_array[position_query.position+1:], recursive_keyset_cte.count + 1
FROM recursive_keyset_cte,
    LATERAL (SELECT id, position FROM UNNEST(ci_job_artifacts_id_array) WITH ORDINALITY AS u(id, position) WHERE id IS NOT NULL ORDER BY 1 ASC LIMIT 1) AS position_query,
    LATERAL (SELECT "record"."id" FROM (VALUES (NULL)) AS nulls LEFT JOIN (SELECT "ci_job_artifacts"."id" FROM "ci_job_artifacts" WHERE "ci_job_artifacts"."project_id" = 278964 AND "ci_job_artifacts"."file_type" = recursive_keyset_cte.array_cte_file_type_array[position] AND ("ci_job_artifacts"."id" > recursive_keyset_cte.ci_job_artifacts_id_array[position]) ORDER BY "ci_job_artifacts"."id" ASC LIMIT 1) record ON TRUE LIMIT 1) AS next_cursor_values
    ))
SELECT id
FROM "recursive_keyset_cte" AS "ci_job_artifacts"
WHERE (count <> 0)) ci_job_artifacts LIMIT 1000

Query plan: https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/7797/commands/27802

Update query:

UPDATE "ci_job_artifacts"
SET "locked"    = 0,
    "expire_at" = '2021 - 12 - 28 10:48:09.406304'
WHERE "ci_job_artifacts"."id" IN (SELECT "ci_job_artifacts"."id" FROM "ci_job_artifacts" INNER JOIN "ci_builds" ON "ci_builds"."id" = "ci_job_artifacts"."job_id"
  AND "ci_builds"."type" = 'Ci::Build' INNER JOIN "ci_pipelines" ON "ci_pipelines"."id" = "ci_builds"."commit_id" WHERE "ci_pipelines"."locked" = 0 AND "ci_job_artifacts"."id" IN (13263, 13264, 13265, 13266, 13267, 13268, 13269, 13270, 13271, 13272, 13368, 13369, 13370, 13371, 13372, 13373, 13374, 13375, 13376, 13377))

Query plan: https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/7797/commands/27803

Related to #223793 (closed)

Edited by Albert

Merge request reports