# Direct Transfer: PG::ForeignKeyViolation when copying entities in rapid succession
## Background
It's possible to use direct transfer to repeatedly make copies of whole entities, though this isn't a common use case; for example, a user may make copies of a group for demos, tutorials, or interviews. Because direct transfer wasn't built with this use case in mind, if a user makes multiple copies of an entity too quickly, the source instance may destroy all `BulkImports::ExportBatch` objects for a relation and user before the first copy completes (see #583390).
## Problem
While testing !22054 (closed), copying a source project with ~600 issues and ~1,200 merge requests twice in quick succession (two Direct Transfer migrations started roughly 2 seconds apart) produced 38 errors with the following message (Kibana logs):

```
PG::ForeignKeyViolation: ERROR: insert or update on table "bulk_import_export_uploads" violates foreign key constraint "fk_3cbf0b9a2e"
DETAIL: Key (batch_id)=(3045811) is not present in table "bulk_import_export_batches".
```

This happens because the first migration's relation export batches are deleted (via `export.batches.destroy_all` in `BatchedRelationExportService`) just before the upload file can be saved on the export upload object, causing a foreign key violation when the `bulk_import_export_uploads` record is created.
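To make the failure mode concrete, here is a small self-contained Ruby sketch that reproduces the same pattern outside GitLab, using SQLite in place of PostgreSQL and simplified table and column names:

```ruby
require 'sqlite3'

# One migration (the second) deletes the parent rows while the first
# migration's worker still holds a batch id and then inserts a child row.
db = SQLite3::Database.new(':memory:')
db.execute('PRAGMA foreign_keys = ON')
db.execute('CREATE TABLE export_batches (id INTEGER PRIMARY KEY)')
db.execute(<<~SQL)
  CREATE TABLE export_uploads (
    id INTEGER PRIMARY KEY,
    batch_id INTEGER REFERENCES export_batches(id)
  )
SQL

db.execute('INSERT INTO export_batches (id) VALUES (3045811)')
batch_id = 3045811 # migration 1's worker holds on to this id

# Migration 2 restarts the export and destroys migration 1's batches first:
db.execute('DELETE FROM export_batches')

# Migration 1 now tries to save its upload, but the parent row is gone:
db.execute('INSERT INTO export_uploads (batch_id) VALUES (?)', [batch_id])
# => raises SQLite3::ConstraintException, the analogue of PG::ForeignKeyViolation
```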
## Impact
- **Severity:** Low. The fix in !220054 (merged) ensures `RelationBatchExportWorker` exits gracefully when the `batch_id` no longer exists, so the source instance isn't overwhelmed with errors.
- **Scope:** Limited. Only 2 users have triggered this on GitLab.com in the last 30 days, both GitLab team members.
- **Impact on customers:** None. Migration results are not affected by the errors in this issue or in #583390, even before !220054 (merged) was merged.
- **Impact on GitLab:** Error budgets for `group::import` may be affected, though likely not significantly.
## Proposed solutions
### Option 1: Silence the errors
Update `RelationBatchExportWorker` to rescue errors caused by non-existent `BulkImports::ExportBatch` IDs and exit without raising, optionally logging the error at a lower severity. A sketch of this approach follows.
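A minimal sketch, assuming a Sidekiq worker. The class name comes from this issue; the `perform` signature, the model lookup, and the log message are assumptions rather than the actual GitLab implementation:

```ruby
require 'sidekiq'

class RelationBatchExportWorker
  include Sidekiq::Worker

  def perform(user_id, batch_id)
    # Raises ActiveRecord::RecordNotFound if the batch was already destroyed.
    batch = BulkImports::ExportBatch.find(batch_id)
    # ... export the relation batch and attach the upload file ...
  rescue ActiveRecord::RecordNotFound => e
    # The batch was destroyed by a newer export of the same relation and user.
    # Exit without re-raising so Sidekiq records no failure and doesn't retry,
    # logging at a lower severity for visibility.
    logger.info("Skipping export, batch #{batch_id} no longer exists: #{e.message}")
  end
end
```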
### Option 2: Limit repeated exports of the same entity
Add some form of limit on the number of times a user can request the same source entity in a single bulk import. The limit should not impact users performing normal migrations, and it could be implemented alongside option 1; see the sketch below.
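A hypothetical sketch of such a limit, using a Redis counter keyed on the requesting user and the source entity. All names here (`DuplicateExportLimit`, the key format, the limit and window values) are illustrative, not existing GitLab code:

```ruby
require 'redis'

class DuplicateExportLimit
  LIMIT  = 5            # max exports of one entity per user...
  WINDOW = 6 * 60 * 60  # ...within this window, in seconds

  def initialize(redis: Redis.new)
    @redis = redis
  end

  # Returns true when this user has requested this entity too many times.
  def throttled?(user_id, source_full_path)
    key = "bulk_imports:export_requests:#{user_id}:#{source_full_path}"
    count = @redis.incr(key)
    @redis.expire(key, WINDOW) if count == 1
    count > LIMIT
  end
end

# Illustrative usage in the export request path:
#   return error_response if DuplicateExportLimit.new.throttled?(user.id, entity.source_full_path)
```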
## How to Reproduce
Start two Direct Transfer migrations from the same source project/group within seconds:
```shell
curl --request POST \
  --url "https://gitlab.com/api/v4/bulk_imports" \
  --header "PRIVATE-TOKEN: TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
    "configuration": {
      "url": "https://gitlab.com",
      "access_token": "TOKEN"
    },
    "entities": [
      {
        "source_full_path": "namespace/project",
        "source_type": "project_entity",
        "destination_slug": "project-copy1",
        "destination_namespace": "destination-group"
      }
    ]
  }'

sleep 2

# Run the same request again with a different destination_slug.
```