Delete inactive groups and projects access tokens after 30 days

What does this MR do and why?

Related to #471683 (closed)

There are two classes of records stored in personal_access_tokens table

  • groups and projects access tokens, aka resource access tokens - tokens that tied to user records with user_type 6/project_bot.
  • personal access tokens - tokens that tied to user records with user_type 0/human.

Inactive token is a token that is expired or revoked.

After personal access tokens become inactive(expired or revoked) they retain in the personal_access_tokens. GitLab does not automatically delete those records, hence we collected millions of inactive tokens in the table, see #471683 (comment 2062242381). This MR is not supposed to implement automatic deletion of personal access tokens, however this information might be helpful to validate the implementation of this MR from the database perspective.

Currently, right after groups and projects access tokens become inactive, they are automatically deleted. #462217 (closed) will disable[1, 2] the existing automatic deletion of groups and projects access tokens because

  • we need to retain inactive tokens for 30 days for compliance and monitoring purposes
  • the existing approach of automatic deletion of inactive tokens has gaps, it relies on membership expiration that could lead to bypass automatic deletion of inactive tokens and even to early token deletion, see #482845 (closed). Also, token revocation via rails console bypasses automatic token deletion. Because of that there is some amount of inactive groups and projects access tokens in the production DB, see #471683 (comment 2062242381).

This MR implements ResourceAccessTokens::InactiveTokensDeletionCronWorker cron worker that is run daily at 00:00. That worker iterates over User.project_bot users and schedules users deletion(**) if its token is inactive for more than 30 days. Moving automatic deletion of inactive groups and projects access tokens to separate process will also prevent bypass deletion of inactive tokens.

** - note that, as per existing implementation, to delete group and project access token we should delete user tied to the token. However, when deleting personal access token only the token record should be deleted.

DB

Query Plans

NOTE: The query plans use the production DB's id ranges.

each_batch finds the lowest id for the query

old: SELECT "users"."id" FROM "users" WHERE "users"."user_type" = 6 ORDER BY "users"."id" ASC LIMIT 1;

new: SELECT "users"."id" FROM "users" WHERE "users"."user_type" = 6 AND ("users"."id" > 0) ORDER BY "users"."id" ASC LIMIT 1;

old: https://console.postgres.ai/gitlab/gitlab-production-main/sessions/31265/commands/97069

new: https://console.postgres.ai/gitlab/gitlab-production-main/sessions/31291/commands/97151

each_batch finds the next id for the query based on the batch size configuration - 1000

old: SELECT "users"."id" FROM "users" WHERE "users"."user_type" = 6 AND "users"."id" >= 7416080 ORDER BY "users"."id" ASC LIMIT 1 OFFSET 1000;

new: SELECT "users"."id" FROM "users" WHERE "users"."user_type" = 6 AND ("users"."id" > 0) AND "users"."id" >= 7416080 ORDER BY "users"."id" ASC LIMIT 1 OFFSET 1000;

old: https://console.postgres.ai/gitlab/gitlab-production-main/sessions/31265/commands/97071

new: https://console.postgres.ai/gitlab/gitlab-production-main/sessions/31291/commands/97153

each_batch query with the id range for the batch

old: SELECT "users"."id", "users"."username" FROM "users" LEFT OUTER JOIN "personal_access_tokens" ON "personal_access_tokens"."user_id" = "users"."id" WHERE "users"."user_type" = 6 AND "users"."id" >= 7416080 AND "users"."id" < 7710470 AND "users"."id" IN (SELECT "personal_access_tokens"."user_id" FROM "personal_access_tokens" WHERE ("personal_access_tokens"."expires_at" <= '2024-09-03' AND "personal_access_tokens"."expires_at" < '2024-08-04' OR "personal_access_tokens"."revoked" = TRUE AND "personal_access_tokens"."updated_at" < '2024-08-04 08:19:50'));

new: SELECT "users"."id", "users"."username" FROM "users" LEFT OUTER JOIN "personal_access_tokens" ON "personal_access_tokens"."user_id" = "users"."id" WHERE "users"."user_type" = 6 AND ("users"."id" > 0) AND "users"."id" >= 7416080 AND "users"."id" < 7710470 AND "users"."id" IN (SELECT "personal_access_tokens"."user_id" FROM "personal_access_tokens" WHERE ("personal_access_tokens"."expires_at" <= '2024-09-06' AND "personal_access_tokens"."expires_at" < '2024-08-07' OR "personal_access_tokens"."revoked" = TRUE AND "personal_access_tokens"."updated_at" < '2024-08-07 11:40:14'));

old: https://console.postgres.ai/gitlab/gitlab-production-main/sessions/31179/commands/96844

new: https://console.postgres.ai/gitlab/gitlab-production-main/sessions/31291/commands/97155

each_batch query with the id range for the last batch

old: SELECT "users"."id", "users"."username" FROM "users" LEFT OUTER JOIN "personal_access_tokens" ON "personal_access_tokens"."user_id" = "users"."id" WHERE "users"."user_type" = 6 AND "users"."id" >= 22700839 AND "users"."id" IN (SELECT "personal_access_tokens"."user_id" FROM "personal_access_tokens" WHERE ("personal_access_tokens"."expires_at" <= '2024-09-03' AND "personal_access_tokens"."expires_at" < '2024-08-04' OR "personal_access_tokens"."revoked" = TRUE AND "personal_access_tokens"."updated_at" < '2024-08-04 08:19:50'));

new: SELECT "users"."id", "users"."username" FROM "users" LEFT OUTER JOIN "personal_access_tokens" ON "personal_access_tokens"."user_id" = "users"."id" WHERE "users"."user_type" = 6 AND ("users"."id" > 0) AND "users"."id" >= 22754813 AND "users"."id" IN (SELECT "personal_access_tokens"."user_id" FROM "personal_access_tokens" WHERE ("personal_access_tokens"."expires_at" <= '2024-09-06' AND "personal_access_tokens"."expires_at" < '2024-08-07' OR "personal_access_tokens"."revoked" = TRUE AND "personal_access_tokens"."updated_at" < '2024-08-07 11:40:14'));

old: https://console.postgres.ai/gitlab/gitlab-production-main/sessions/31179/commands/96846

new: https://console.postgres.ai/gitlab/gitlab-production-main/sessions/31291/commands/97158

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Screenshots or screen recordings

Screenshots are required for UI changes, and strongly recommended for all other merge requests.

Before After

How to set up and validate locally

Numbered steps to set up and validate the change are strongly suggested.

Edited by Bogdan Denkovych

Merge request reports

Loading