Failed Batched Background Migration - MigrateEvidencesForVulnerabilityFindings
Context:
This issue aims to understand why the batched background migration MigrateEvidencesForVulnerabilityFindings
failed.
Investigation:
Several jobs in this batched background migration have far more attempts than the current maximum of 3, with some as high as 33:
gitlabhq_dblab=# SELECT id, attempts, status FROM batched_background_migration_jobs where attempts > 3 AND batched_background_migration_id=390 ;
id | attempts | status
--------+----------+--------
284596 | 28 | 1
284462 | 33 | 1
284643 | 13 | 2
284574 | 5 | 2
284532 | 4 | 2
284679 | 21 | 3
284748 | 5 | 3
284659 | 9 | 3
284711 | 4 | 3
284626 | 8 | 3
284584 | 9 | 3
284617 | 4 | 3
(12 rows)
For reference, the status column maps to the following states:
state :pending, value: 0
state :running, value: 1
state :failed, value: 2
state :succeeded, value: 3
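To make the result easier to read, here is a minimal Python sketch that groups the jobs above by status name. The rows and the status mapping are copied from this issue; the helper name `group_by_status` is only illustrative and is not part of GitLab.

```python
from collections import defaultdict

MAX_ATTEMPTS = 3  # current limit stated in this issue

# Status values as listed above.
STATUS_NAMES = {0: "pending", 1: "running", 2: "failed", 3: "succeeded"}

# (job id, attempts, status) rows copied from the query result above.
JOBS = [
    (284596, 28, 1), (284462, 33, 1), (284643, 13, 2), (284574, 5, 2),
    (284532, 4, 2), (284679, 21, 3), (284748, 5, 3), (284659, 9, 3),
    (284711, 4, 3), (284626, 8, 3), (284584, 9, 3), (284617, 4, 3),
]


def group_by_status(jobs, max_attempts=MAX_ATTEMPTS):
    """Group the ids of jobs that exceeded max_attempts by status name."""
    groups = defaultdict(list)
    for job_id, attempts, status in jobs:
        if attempts > max_attempts:
            groups[STATUS_NAMES[status]].append(job_id)
    return dict(groups)


groups = group_by_status(JOBS)
for name, ids in groups.items():
    print(f"{name}: {len(ids)} job(s) {ids}")
```

On these rows, 2 jobs are still running, 3 have failed, and 7 eventually succeeded after more than 3 attempts.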
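I have checked the transition logs for job IDs 284596 and 284462. Neither job ever transitions to failed: after the initial pending to running transition (0 to 1), every entry is running to running (1 to 1), and no exception is recorded.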
Examples:
ID: 284596
SELECT * from batched_background_migration_job_transition_logs where batched_background_migration_job_id=284596 order by created_at;
id | batched_background_migration_job_id | created_at | updated_at | previous_status | next_status | exception_class | exception_message
--------+-------------------------------------+-------------------------------+-------------------------------+-----------------+-------------+-----------------+-------------------
508483 | 284596 | 2023-04-10 16:56:02.609359+00 | 2023-04-10 16:56:02.609359+00 | 0 | 1 | |
513387 | 284596 | 2023-04-12 00:02:02.883453+00 | 2023-04-12 00:02:02.883453+00 | 1 | 1 | |
513445 | 284596 | 2023-04-12 01:03:01.667648+00 | 2023-04-12 01:03:01.667648+00 | 1 | 1 | |
513541 | 284596 | 2023-04-12 08:02:03.026283+00 | 2023-04-12 08:02:03.026283+00 | 1 | 1 | |
513679 | 284596 | 2023-04-12 09:13:06.097244+00 | 2023-04-12 09:13:06.097244+00 | 1 | 1 | |
513776 | 284596 | 2023-04-12 10:16:02.824863+00 | 2023-04-12 10:16:02.824863+00 | 1 | 1 | |
513875 | 284596 | 2023-04-12 11:20:09.920843+00 | 2023-04-12 11:20:09.920843+00 | 1 | 1 | |
513944 | 284596 | 2023-04-12 12:22:05.606471+00 | 2023-04-12 12:22:05.606471+00 | 1 | 1 | |
514046 | 284596 | 2023-04-12 13:33:02.512034+00 | 2023-04-12 13:33:02.512034+00 | 1 | 1 | |
514164 | 284596 | 2023-04-12 14:35:06.724385+00 | 2023-04-12 14:35:06.724385+00 | 1 | 1 | |
514268 | 284596 | 2023-04-12 15:39:07.690306+00 | 2023-04-12 15:39:07.690306+00 | 1 | 1 | |
514335 | 284596 | 2023-04-12 16:45:06.449972+00 | 2023-04-12 16:45:06.449972+00 | 1 | 1 | |
514429 | 284596 | 2023-04-12 17:48:02.777641+00 | 2023-04-12 17:48:02.777641+00 | 1 | 1 | |
514534 | 284596 | 2023-04-12 18:53:05.010544+00 | 2023-04-12 18:53:05.010544+00 | 1 | 1 | |
514636 | 284596 | 2023-04-12 19:54:02.753632+00 | 2023-04-12 19:54:02.753632+00 | 1 | 1 | |
514763 | 284596 | 2023-04-12 20:55:02.239532+00 | 2023-04-12 20:55:02.239532+00 | 1 | 1 | |
514890 | 284596 | 2023-04-12 21:57:02.494536+00 | 2023-04-12 21:57:02.494536+00 | 1 | 1 | |
514997 | 284596 | 2023-04-12 22:57:02.718215+00 | 2023-04-12 22:57:02.718215+00 | 1 | 1 | |
515127 | 284596 | 2023-04-13 00:01:14.29505+00 | 2023-04-13 00:01:14.29505+00 | 1 | 1 | |
515249 | 284596 | 2023-04-13 01:02:03.896382+00 | 2023-04-13 01:02:03.896382+00 | 1 | 1 | |
515363 | 284596 | 2023-04-13 02:04:03.791864+00 | 2023-04-13 02:04:03.791864+00 | 1 | 1 | |
515467 | 284596 | 2023-04-13 03:04:07.421926+00 | 2023-04-13 03:04:07.421926+00 | 1 | 1 | |
515605 | 284596 | 2023-04-13 04:05:03.131302+00 | 2023-04-13 04:05:03.131302+00 | 1 | 1 | |
515734 | 284596 | 2023-04-13 05:05:05.038241+00 | 2023-04-13 05:05:05.038241+00 | 1 | 1 | |
515849 | 284596 | 2023-04-13 06:06:01.588875+00 | 2023-04-13 06:06:01.588875+00 | 1 | 1 | |
515973 | 284596 | 2023-04-13 07:06:02.064761+00 | 2023-04-13 07:06:02.064761+00 | 1 | 1 | |
516116 | 284596 | 2023-04-13 08:09:03.674393+00 | 2023-04-13 08:09:03.674393+00 | 1 | 1 | |
516255 | 284596 | 2023-04-13 09:10:02.491714+00 | 2023-04-13 09:10:02.491714+00 | 1 | 1 | |
(28 rows)
ID: 284462
SELECT * from batched_background_migration_job_transition_logs where batched_background_migration_job_id=284462 order by created_at;
id | batched_background_migration_job_id | created_at | updated_at | previous_status | next_status | exception_class | exception_message
--------+-------------------------------------+-------------------------------+-------------------------------+-----------------+-------------+-----------------+-------------------
508221 | 284462 | 2023-04-10 15:07:02.158892+00 | 2023-04-10 15:07:02.158892+00 | 0 | 1 | |
513320 | 284462 | 2023-04-11 23:24:02.702903+00 | 2023-04-11 23:24:02.702903+00 | 1 | 1 | |
513402 | 284462 | 2023-04-12 00:32:08.116494+00 | 2023-04-12 00:32:08.116494+00 | 1 | 1 | |
513477 | 284462 | 2023-04-12 01:42:48.822871+00 | 2023-04-12 01:42:48.822871+00 | 1 | 1 | |
513478 | 284462 | 2023-04-12 02:42:50.359603+00 | 2023-04-12 02:42:50.359603+00 | 1 | 1 | |
513491 | 284462 | 2023-04-12 07:38:08.17633+00 | 2023-04-12 07:38:08.17633+00 | 1 | 1 | |
513611 | 284462 | 2023-04-12 08:40:06.654589+00 | 2023-04-12 08:40:06.654589+00 | 1 | 1 | |
513714 | 284462 | 2023-04-12 09:43:07.880233+00 | 2023-04-12 09:43:07.880233+00 | 1 | 1 | |
513817 | 284462 | 2023-04-12 10:45:09.374195+00 | 2023-04-12 10:45:09.374195+00 | 1 | 1 | |
513911 | 284462 | 2023-04-12 11:47:03.541031+00 | 2023-04-12 11:47:03.541031+00 | 1 | 1 | |
513986 | 284462 | 2023-04-12 12:48:10.769015+00 | 2023-04-12 12:48:10.769015+00 | 1 | 1 | |
514089 | 284462 | 2023-04-12 13:52:02.295703+00 | 2023-04-12 13:52:02.295703+00 | 1 | 1 | |
514224 | 284462 | 2023-04-12 15:03:05.097805+00 | 2023-04-12 15:03:05.097805+00 | 1 | 1 | |
514310 | 284462 | 2023-04-12 16:06:02.62205+00 | 2023-04-12 16:06:02.62205+00 | 1 | 1 | |
514373 | 284462 | 2023-04-12 17:09:04.97477+00 | 2023-04-12 17:09:04.97477+00 | 1 | 1 | |
514473 | 284462 | 2023-04-12 18:21:02.479408+00 | 2023-04-12 18:21:02.479408+00 | 1 | 1 | |
514566 | 284462 | 2023-04-12 19:25:09.16248+00 | 2023-04-12 19:25:09.16248+00 | 1 | 1 | |
514712 | 284462 | 2023-04-12 20:26:09.094921+00 | 2023-04-12 20:26:09.094921+00 | 1 | 1 | |
514827 | 284462 | 2023-04-12 21:27:04.852359+00 | 2023-04-12 21:27:04.852359+00 | 1 | 1 | |
514947 | 284462 | 2023-04-12 22:28:02.466307+00 | 2023-04-12 22:28:02.466307+00 | 1 | 1 | |
515056 | 284462 | 2023-04-12 23:28:02.939686+00 | 2023-04-12 23:28:02.939686+00 | 1 | 1 | |
515192 | 284462 | 2023-04-13 00:35:02.450755+00 | 2023-04-13 00:35:02.450755+00 | 1 | 1 | |
515311 | 284462 | 2023-04-13 01:37:02.227499+00 | 2023-04-13 01:37:02.227499+00 | 1 | 1 | |
515410 | 284462 | 2023-04-13 02:38:01.571919+00 | 2023-04-13 02:38:01.571919+00 | 1 | 1 | |
515535 | 284462 | 2023-04-13 03:38:02.095293+00 | 2023-04-13 03:38:02.095293+00 | 1 | 1 | |
515681 | 284462 | 2023-04-13 04:39:02.645927+00 | 2023-04-13 04:39:02.645927+00 | 1 | 1 | |
515800 | 284462 | 2023-04-13 05:39:05.082966+00 | 2023-04-13 05:39:05.082966+00 | 1 | 1 | |
515917 | 284462 | 2023-04-13 06:40:02.052948+00 | 2023-04-13 06:40:02.052948+00 | 1 | 1 | |
516048 | 284462 | 2023-04-13 07:41:05.252714+00 | 2023-04-13 07:41:05.252714+00 | 1 | 1 | |
516183 | 284462 | 2023-04-13 08:42:06.274571+00 | 2023-04-13 08:42:06.274571+00 | 1 | 1 | |
516324 | 284462 | 2023-04-13 09:42:10.901202+00 | 2023-04-13 09:42:10.901202+00 | 1 | 1 | |
516473 | 284462 | 2023-04-13 10:42:59.039044+00 | 2023-04-13 10:42:59.039044+00 | 1 | 1 | |
516615 | 284462 | 2023-04-13 11:42:59.718777+00 | 2023-04-13 11:42:59.718777+00 | 1 | 1 | |
(33 rows)
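The pattern in both logs can be expressed as a simple check. The following Python sketch is only an illustration of the reasoning: the function name `never_left_running` is hypothetical, the transition lists are abbreviated to (previous_status, next_status, exception_class) tuples with the row counts shown above, and the failed example is made up for contrast.

```python
RUNNING = 1


def never_left_running(transitions):
    """Return True if the job entered running and every later
    transition is running -> running with no exception recorded."""
    if not transitions:
        return False
    first, *rest = transitions
    if first[1] != RUNNING:
        return False
    return all(
        prev == RUNNING and nxt == RUNNING and not exc
        for prev, nxt, exc in rest
    )


# Abbreviated from the logs above: one 0 -> 1 row, then only 1 -> 1 rows.
job_284596 = [(0, 1, None)] + [(1, 1, None)] * 27  # 28 rows
job_284462 = [(0, 1, None)] + [(1, 1, None)] * 32  # 33 rows

# Hypothetical log of a job that fails with an exception, for contrast.
hypothetical_failed = [(0, 1, None), (1, 2, "ActiveRecord::StatementInvalid")]

print(never_left_running(job_284596))
print(never_left_running(job_284462))
print(never_left_running(hypothetical_failed))
```

Both real jobs match the pattern, which is consistent with them being picked up again repeatedly while still marked as running, rather than failing with an error.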
These two jobs also have considerably large batch sizes, both above 130,000:
SELECT id, batch_size FROM batched_background_migration_jobs where id=284462;
id | batch_size
--------+------------
284462 | 130975
SELECT id, batch_size FROM batched_background_migration_jobs where id=284596;
id | batch_size
--------+------------
284596 | 138810
We also have failed jobs with the following error message:
Job ID: 284643
Exception: ActiveRecord::StatementInvalid
Message:
PG::UntranslatableCharacter: ERROR: unsupported Unicode escape sequence
LINE 1: ...929990', '2023-04-12 01:10:52.929990'), (7898889, '{"request...
^
DETAIL: \u0000 cannot be converted to text.
CONTEXT: JSON data, line 1: ...reason_phrase":"OK","status_code":200},"summary":...
Job ID: 284574
Exception: ActiveRecord::StatementInvalid
Message:
PG::UntranslatableCharacter: ERROR: unsupported Unicode escape sequence
LINE 1: ...651746', '2023-04-12 08:55:19.651746'), (6317631, '{"request...
^
DETAIL: \u0000 cannot be converted to text.
CONTEXT: JSON data, line 1: ...reason_phrase":"OK","status_code":200},"summary":...
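PostgreSQL cannot store the \u0000 escape in text or jsonb values, which is what these errors report. One possible fix is to strip NUL characters from the JSON payload before it is written. The sketch below shows the idea in Python; the actual migration is Ruby code, the helper name `strip_nul` is hypothetical, and the sample payload is invented apart from the keys visible in the error context.

```python
import json


def strip_nul(value):
    """Recursively remove NUL characters from all strings in a JSON-like value."""
    if isinstance(value, str):
        return value.replace("\x00", "")
    if isinstance(value, list):
        return [strip_nul(item) for item in value]
    if isinstance(value, dict):
        return {strip_nul(key): strip_nul(item) for key, item in value.items()}
    return value  # numbers, booleans, and None pass through unchanged


# Invented sample payload containing a \u0000 escape in a string value.
raw = (
    '{"request": {"body": "abc\\u0000def"}, '
    '"response": {"reason_phrase": "OK", "status_code": 200}}'
)

evidence = json.loads(raw)
clean = strip_nul(evidence)
serialized = json.dumps(clean)

print(serialized)
```

Whether dropping the character is acceptable, as opposed to replacing it with a placeholder, depends on how the evidence data is used downstream.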
Conclusion:
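I recommend enqueuing the batched background migration again with a lower batch size and an explicit max batch size, since the two jobs stuck in the running state each have a batch size above 130,000. We also need to fix the PG::UntranslatableCharacter error (\u0000 in the JSON data) shown above.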