Import stuck in import_in_progress due to failure committing transaction

Problem

While testing in &7528 (comment 885468323), we retried the import of a repository whose pre-import had previously failed because it had no tags folder under the old bucket prefix. After the retry, the migration status got stuck at import_in_progress, and the logs show that committing the transaction failed for a reason that is not yet clear.

Status before retry:

gitlabhq_registry=> select * from repositories where path = 'jdrpereira/27441/foo';
-[ RECORD 1 ]----------+-----------------------------------------------------------------------------------------------------
id                     | 311837
top_level_namespace_id | 108527
parent_id              |
created_at             | 2022-03-30 15:46:09.307931+00
updated_at             | 2022-03-30 15:46:09.907941+00
name                   | foo
path                   | jdrpereira/27441/foo
migration_status       | pre_import_failed
deleted_at             |
migration_error        | 1 error occurred:                                                                                   +
                       |         * pre importing tagged manifests: reading tags: unknown repository name=jdrpereira/27441/foo+
                       |                                                                                                     +
                       |

Now:

gitlabhq_registry=> select * from repositories where path = 'jdrpereira/27441/foo';
-[ RECORD 1 ]----------+-----------------------------------------------------------------------------------------------------
id                     | 311837
top_level_namespace_id | 108527
parent_id              |
created_at             | 2022-03-30 15:46:09.307931+00
updated_at             | 2022-04-01 15:17:28.903823+00
name                   | foo
path                   | jdrpereira/27441/foo
migration_status       | import_in_progress
deleted_at             |
migration_error        | 1 error occurred:                                                                                   +
                       |         * pre importing tagged manifests: reading tags: unknown repository name=jdrpereira/27441/foo+
                       |                                                                                                     +
                       |

Note that migration_error was not cleaned up, but that is due to #636 (closed). Ignore that detail here.

Then looking at the logs:

Expanding the error message we see:

2 errors occurred:
	* commit repository transaction: sql: transaction has already been committed or rolled back
	* updating migration status after failed final import: updating repository: context deadline exceeded

Note that for this test we configured the registry to sleep for 10 minutes after the (pre)import. I believe the transaction deadline (10 minutes, if I recall correctly) was exceeded, which would explain the failed commit, and that the follow-up attempt to update the repository status then failed as well with "context deadline exceeded".

This is one of the worst things that can happen: a repository stuck in the "importing" status rejects writes.

Solution

Regardless of whether this is a consequence of using a fake sleep for testing purposes, we should never leave the repository status in an inaccurate state. We need to find the root cause and fix it.

Additionally, we should recover from panics during final imports so that, if one occurs, we can try to update the status one last time.
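A rough sketch of what such a panic handler could look like in Go, using `defer` and `recover`. Everything here is hypothetical: `statusStore`, `finalImport`, and the `import_failed` / `import_complete` status values are illustrative stand-ins, not the registry's real types or API. The recovery path uses its own context so that the last status update does not depend on a context that may already have expired.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// statusStore is a hypothetical in-memory stand-in for the repositories table.
type statusStore struct{ status string }

// update sets the status unless the context is already done.
func (s *statusStore) update(ctx context.Context, status string) error {
	if err := ctx.Err(); err != nil {
		return err
	}
	s.status = status
	return nil
}

// finalImport runs importFn and, if it panics, makes a last attempt to move
// the repository out of import_in_progress before returning an error.
func finalImport(s *statusStore, importFn func()) (err error) {
	s.status = "import_in_progress"

	defer func() {
		if r := recover(); r != nil {
			// Fresh context with its own timeout for the last-chance update.
			ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
			defer cancel()

			err = fmt.Errorf("final import panicked: %v", r)
			if uerr := s.update(ctx, "import_failed"); uerr != nil {
				err = fmt.Errorf("%w; updating status: %v", err, uerr)
			}
		}
	}()

	importFn()
	return s.update(context.Background(), "import_complete")
}

func main() {
	s := &statusStore{}
	err := finalImport(s, func() { panic("boom") })
	fmt.Println(s.status, err)
}
```

The deferred function assigns to the named return value `err`, so the caller sees an error rather than a crash, and the status is no longer left at import_in_progress.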