WIP: Use de-duplication to reduce memory and amount of SQL queries for import (!18026) · Merge requests · GitLab.org / GitLab

Kamil Trzciński (Back 2025-01-01) requested to merge import-dedup-data into import-improve-project-restore Oct 02, 2019

What does this MR do?

This uses de-duplication to:

reduce the amount of memory needed to hold the hash, as it contains a lot of duplication,
re-use already created relations instead of creating new ones.
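The idea can be illustrated with a minimal sketch, which is not the code from this MR. It assumes the import tree has already been parsed into plain dicts and lists. The `dedup` helper, the `pool` argument, and the sample data are hypothetical names introduced only for this illustration.

```python
import json


def dedup(node, pool):
    """Return a canonical instance of `node`, sharing structurally equal sub-trees."""
    if isinstance(node, dict):
        node = {key: dedup(value, pool) for key, value in node.items()}
    elif isinstance(node, list):
        node = [dedup(item, pool) for item in node]
    else:
        return node  # scalars are returned unchanged

    # Structurally equal containers serialize to the same key,
    # so the first instance seen becomes the shared one.
    key = json.dumps(node, sort_keys=True)
    return pool.setdefault(key, node)


# Hypothetical sample data: two entries carrying identical nested hashes.
label = {"id": 1, "title": "bug"}
tree = {
    "issues": [
        {"id": 10, "label": dict(label)},
        {"id": 11, "label": dict(label)},
    ]
}

deduped = dedup(tree, {})
first, second = deduped["issues"]
```

After the call, `first["label"]` and `second["label"]` are the same object in memory, so an importer that keeps track of objects it has already processed could create the relation once and re-use it.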

This is based on: !18005 (merged), !18003 (merged), !18007 (merged), !18024 (merged)

Problems

We need to be careful about when de-duplication is used, as it can introduce hard-to-debug problems.

Let's consider the following example:

  "merge_requests": [
    {
      "id": 27,
      "target_branch": "feature",
      "source_branch": "feature_conflict",
      "source_project_id": 999,
      "author_id": 1,
      "merge_params": {
        "force_remove_source_branch": null
      },
      ...
      "resource_label_events": [
        {
          "id":243,
          "action":"add",
          "issue_id":null,
          "merge_request_id":27,
          "label_id":null,
          "user_id":1,
          "created_at":"2018-08-28T08:24:00.494Z"
        }
      ],

There are problems with:

merge_params, which might end up pointing to the same hash as another entry,
resource_label_events (not in this exact case, as each entry has a unique id).
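A small sketch shows why a shared hash is risky. The second merge request (id 28) and the in-place update are assumptions made for the example and do not come from the export above.

```python
# After naive de-duplication, two merge requests hold the very same dict object.
shared_params = {"force_remove_source_branch": None}
mr_a = {"id": 27, "merge_params": shared_params}
mr_b = {"id": 28, "merge_params": shared_params}  # hypothetical second entry

# An in-place change made while processing the first merge request...
mr_a["merge_params"]["force_remove_source_branch"] = True

# ...is silently visible through the second one as well.
leaked = mr_b["merge_params"]["force_remove_source_branch"]
```

Here `leaked` is `True` although nothing touched `mr_b` directly. This is the kind of hard-to-debug side effect that sharing unrelated but equal hashes can cause.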

The merge_params case needs to be handled automatically, so de-duplication needs to understand whether the hierarchy it defines is linked to the top level.

Ideally, this means we should de-duplicate only top-level objects, with the understanding that objects on lower levels can be re-used only if a matching entry is found on the top level.

This means we should consider de-duplicating only relations such as:

labels => label,
milestones => milestone,
likely others as well

This reduces the efficiency of de-duplication, but should reduce the chance of things going sideways.
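A restricted variant along these lines could look like the sketch below. It assumes a parsed project tree in which top-level `labels` and `milestones` collections exist and nested entries appear under `label` and `milestone` keys. The `DEDUP_RELATIONS` mapping, the helper names, and the sample tree are hypothetical.

```python
import json

# Hypothetical allowlist: top-level collection -> nested key allowed to re-use it.
DEDUP_RELATIONS = {"labels": "label", "milestones": "milestone"}


def fingerprint(value):
    """Stable key for structurally equal hashes."""
    return json.dumps(value, sort_keys=True)


def dedup_allowlisted(project):
    """Re-use nested hashes only when an identical top-level entry exists."""
    pools = {
        nested: {fingerprint(entry): entry for entry in project.get(top, [])}
        for top, nested in DEDUP_RELATIONS.items()
    }

    def walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                pool = pools.get(key)
                if pool is not None and isinstance(value, dict):
                    # Fall back to the original hash when nothing matches.
                    node[key] = pool.get(fingerprint(value), value)
                else:
                    walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    for key, value in project.items():
        if key not in DEDUP_RELATIONS:
            walk(value)
    return project


# Hypothetical sample tree: equal labels and equal merge_params on two entries.
project = {
    "labels": [{"id": 1, "title": "bug"}],
    "merge_requests": [
        {"id": 27, "label": {"id": 1, "title": "bug"},
         "merge_params": {"force_remove_source_branch": None}},
        {"id": 28, "label": {"id": 1, "title": "bug"},
         "merge_params": {"force_remove_source_branch": None}},
    ],
}
dedup_allowlisted(project)
```

In this sketch both nested `label` hashes end up pointing at the single top-level entry, while the two equal `merge_params` hashes remain separate objects because `merge_params` is not on the allowlist.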

Does this MR meet the acceptance criteria?

Conformity

Edited May 31, 2022 by 🤖 GitLab Bot 🤖
