internal_ids can get out of sync if old code still runs during a deployment

Sentry error today: https://sentry.gitlap.com/gitlab/gitlabcom/issues/161039/

Related to https://gitlab.com/gitlab-com/infrastructure/issues/4019

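Today we saw 39 projects with an incorrect last_value in the internal_ids table, which was introduced by https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/17580. In each case last_value is lower than the highest issue iid that already exists for the project: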

gitlabhq_production=# select issues.project_id, last_value, max(issues.iid) from internal_ids LEFT OUTER JOIN issues ON internal_ids.project_id = issues.project_id  GROUP BY issues.project_id, last_value having last_value < max(issues.iid);
 project_id | last_value |  max  
------------+------------+-------
      13083 |      45259 | 45260
    1524714 |         67 |    68
    1542728 |         48 |    49
    2027601 |        595 |   596
    2820995 |         11 |    12
    4051821 |        144 |   145
    4133561 |          1 |     2
    4167568 |        722 |   723
    4220443 |        754 |   757
    4406902 |         14 |    16
    4475013 |         50 |    51
    4483215 |        282 |   283
    4538306 |        325 |   326
    4560532 |         95 |    96
    4823337 |         86 |    88
    5025671 |         29 |    31
    5243434 |         69 |    75
    5394944 |         97 |   102
    5485545 |         38 |    39
    5505599 |          9 |    10
    5507292 |        195 |   196
    5545091 |         24 |    25
    5690941 |         22 |    23
    5692668 |          1 |     4
    5698204 |         66 |    67
    5710683 |        179 |   180
    5717884 |         40 |    41
    5855300 |        121 |   122
    5908597 |         94 |    96
    5991003 |         64 |    65
    6006830 |          4 |     5
    6021587 |          9 |    11
    6031149 |          1 |     2
    6033615 |          9 |    13
    6038000 |          4 |     5
    6047400 |         21 |    22
    6047453 |          3 |     5
    6047872 |          1 |     2
    6047907 |          1 |     3
(39 rows)

@eReGeBe and @_stark manually resolved this via a script in https://gitlab.slack.com/archives/C101F3796/p1523488995000086.

@abrandl mentioned:

In retrospect, it makes perfect sense to see these off-by-a-few errors while running both old and new code. What I'd propose to include is a post migration that fixes the problem after all nodes run the new code. AFAIS this has been done (manually) in production already (running Stan's query yields no projects), right? Thanks for catching this and sorry I didn't think of the deploy, will keep that in mind!
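
To make the proposed post-migration fix concrete, here is a minimal sketch of the repair step. It is not the script that was actually run in production. It uses Python's stdlib sqlite3 as a stand-in for PostgreSQL, and the schema is a simplified assumption that keeps only the columns appearing in the query above (`project_id`, `last_value`, `iid`). The names `SCHEMA`, `find_stale` and `repair` are hypothetical.

```python
import sqlite3

# Simplified, assumed schema: only the columns used by the query in this issue.
SCHEMA = """
CREATE TABLE issues (
    project_id INTEGER NOT NULL,
    iid        INTEGER NOT NULL,
    UNIQUE (project_id, iid)
);
CREATE TABLE internal_ids (
    project_id INTEGER PRIMARY KEY,
    last_value INTEGER NOT NULL
);
"""


def find_stale(conn):
    """Return (project_id, last_value, max_iid) for projects whose counter lags."""
    return conn.execute(
        """
        SELECT i.project_id, i.last_value, MAX(s.iid)
        FROM internal_ids i
        JOIN issues s ON s.project_id = i.project_id
        GROUP BY i.project_id, i.last_value
        HAVING i.last_value < MAX(s.iid)
        ORDER BY i.project_id
        """
    ).fetchall()


def repair(conn):
    """Raise every lagging last_value to the highest existing issue iid.

    Counters that are already at or ahead of MAX(iid) are left untouched,
    as are projects without any issues (the subquery yields NULL there).
    Returns the number of rows that were updated.
    """
    cur = conn.execute(
        """
        UPDATE internal_ids
        SET last_value = (SELECT MAX(iid) FROM issues
                          WHERE issues.project_id = internal_ids.project_id)
        WHERE last_value < (SELECT MAX(iid) FROM issues
                            WHERE issues.project_id = internal_ids.project_id)
        """
    )
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Two of the drifted projects from the table above.
    conn.executemany("INSERT INTO issues VALUES (?, ?)",
                     [(13083, 45260), (4220443, 757)])
    conn.executemany("INSERT INTO internal_ids VALUES (?, ?)",
                     [(13083, 45259), (4220443, 754)])
    print("stale before:", find_stale(conn))
    print("rows repaired:", repair(conn))
    print("stale after:", find_stale(conn))
```

The update only ever moves last_value forward, so it should be safe to re-run. It would have to run after all nodes are on the new code, otherwise the counters can drift again.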

Another possibility (albeit perhaps more complex) would be to create a database trigger that updates the table if a new issue is inserted.
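
A rough sketch of the trigger idea, again using Python's stdlib sqlite3 as a stand-in. The trigger name, the simplified two-column schema and the helper functions are assumptions for illustration. The trigger body is SQLite syntax; PostgreSQL would need a trigger function plus a `CREATE TRIGGER ... EXECUTE PROCEDURE` statement instead.

```python
import sqlite3

# Assumed, simplified schema plus a hypothetical trigger (SQLite syntax).
SCHEMA_WITH_TRIGGER = """
CREATE TABLE issues (
    project_id INTEGER NOT NULL,
    iid        INTEGER NOT NULL,
    UNIQUE (project_id, iid)
);
CREATE TABLE internal_ids (
    project_id INTEGER PRIMARY KEY,
    last_value INTEGER NOT NULL
);

CREATE TRIGGER issues_sync_internal_ids
AFTER INSERT ON issues
BEGIN
    -- Create the counter row if the project has none yet.
    INSERT OR IGNORE INTO internal_ids (project_id, last_value)
    VALUES (NEW.project_id, NEW.iid);
    -- Only ever move the counter forward, never backward.
    UPDATE internal_ids
    SET last_value = NEW.iid
    WHERE project_id = NEW.project_id AND last_value < NEW.iid;
END;
"""


def open_db():
    """Open an in-memory database with the schema and trigger installed."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA_WITH_TRIGGER)
    return conn


def last_value(conn, project_id):
    """Return the stored counter for a project, or None if there is no row."""
    row = conn.execute(
        "SELECT last_value FROM internal_ids WHERE project_id = ?",
        (project_id,),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = open_db()
    conn.execute("INSERT INTO internal_ids VALUES (13083, 45259)")
    # Simulates an insert that did not go through internal_ids,
    # e.g. one made by a node still running the old code.
    conn.execute("INSERT INTO issues VALUES (13083, 45260)")
    print("last_value after insert:", last_value(conn, 13083))
```

With a trigger like this the counter follows every insert regardless of which application version made it, at the cost of an extra write on each issue insert and of keeping logic in the database.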

Edited by Andreas Brandl