Skip to content

Fix update lock to happen on CTE's query rather than outside query

Alexandru Croitor requested to merge fix_locking_on_backfilled_epic_rows into master

What does this MR do and why?

Context

This MR addresses a bug with how we approached locking records that are being backfilled. The bug was detected and raised in !153115 (comment 1932297238)

Essentially we were using a CTE query with a lock that looked smth like this:

WITH "batched_relation" AS MATERIALIZED
(
  SELECT "epics".* FROM "epics"
  WHERE "epics"."id" BETWEEN 21 AND 30
  AND "epics"."id" >= 21 AND "epics"."id" < 23
)
SELECT "epics".* FROM "batched_relation"
AS "epics"
FOR UPDATE

Turns out this does not actually lock the epic rows between ids 21 and 23 and you can easily update them in a different transaction. Steps to reproduce can be found in the link above.

New queries

This MR changes the query to look more like the one bellow or removes the CTE altogether if not needed.

  1. CTE is kept in BackfillEpicBasicFieldsToWorkItemRecord due the planner not picking the right index in some scenarios, see !143844 (comment 1801312347)
  2. CTE is not needed in BackfillWorkItemHierarchyForEpics so we replace it with direct batched query

Do we need to backport

Short answer is NO. We do not need to backport given that we merge this fix in %17.1

WHY ?

  1. BackfillEpicBasicFieldsToWorkItemRecord was merged in 17.0, so we would have needed to backport for this specific migration. Howevver we already know for sure we need to re-enqueue this migration, because at the time when this was merged the code was not 100% ready to keep the legacy epic and epic work item records in sync. So the original run of the migration boils down to only create a epic work item record for every epic and we'll enqueue the migration one more time to sync up attributes.

  2. BackfillWorkItemHierarchyForEpics was merged in 17.1 which at this time is not released yet, so we can fix the bug before 17.1 release. We did run the migration on .com already but after quick check on a replica everything looks fine:

    Checks
      gitlabhq_dblab=# select count(*) from epics WHERE epics.parent_id IS NULL;
       count
      --------
       368321
      (1 row)
      
      gitlabhq_dblab=# select count(*) from epics WHERE epics.parent_id IS NOT NULL;
       count
      -------
       85125
      (1 row)
      
      gitlabhq_dblab=# select count(*) from epics LEFT JOIN work_item_parent_links pl ON (pl.work_item_id = epics.issue_id) WHERE pl.work_item_parent_id IS NULL;
       count
      --------
       368321
      (1 row)
      
      gitlabhq_dblab=# select count(*) from epics LEFT JOIN work_item_parent_links pl ON (pl.work_item_id = epics.issue_id) WHERE pl.work_item_parent_id IS NOT NULL;
       count
      -------
       85125
      (1 row)

re #465227

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Screenshots or screen recordings

Screenshots are required for UI changes, and strongly recommended for all other merge requests.

Before After

How to set up and validate locally

Numbered steps to set up and validate the change are strongly suggested.

Edited by Alexandru Croitor

Merge request reports