Skip to content

Cache database_id before doing other work

Luke Duncalfe requested to merge ld-cache_database_id-asap into master

What does this MR do and why?

This MR changes the GitHub importer to cache the id of an imported issue or MR as soon as it is created.

There are ~2k examples in our production logs that can result in us not caching an ID for a GitHub PR.

The stack trace of those errors show the exception happened here which is the line above where we currently write the id of the imported MR to the cache.

This later leads to data not being imported. For example in NoteImporter, not finding the id here results in us not importing the note. Many objects related to the issue or MR also similarly will fail.

If we write the ID as soon as we have it, it makes these importers more robust, as we can import objects for the affected issues and MRs, like notes.

Screenshots

Following the QA notes below results in (note the note counts on the right side):

master This branch
image image

One of the imported PRs showing on master only notes related to approvals were imported, but on this branch all notes were imported.

master This branch
image image

How to set up and validate locally

  1. Apply the following patch to have an error occur during the import of all issues and PRs.
diff --git a/lib/gitlab/github_import/importer/issue_importer.rb b/lib/gitlab/github_import/importer/issue_importer.rb
index 44e9cd1bd8f1..074e06bd7987 100644
--- a/lib/gitlab/github_import/importer/issue_importer.rb
+++ b/lib/gitlab/github_import/importer/issue_importer.rb
@@ -73,6 +73,8 @@ def create_issue
         #
         # issue_id - The ID of the created issue.
         def create_assignees(issue_id)
+          raise
+
           assignees = []

           issue.assignees.each do |assignee|
diff --git a/lib/gitlab/github_import/importer/pull_request_importer.rb b/lib/gitlab/github_import/importer/pull_request_importer.rb
index acdafef670ca..dd8dc6be88c6 100644
--- a/lib/gitlab/github_import/importer/pull_request_importer.rb
+++ b/lib/gitlab/github_import/importer/pull_request_importer.rb
@@ -72,6 +72,8 @@ def set_merge_request_assignees(merge_request)
         end

         def insert_git_data(merge_request, already_exists)
+          raise
+
           insert_or_replace_git_data(merge_request, pull_request.source_branch_sha, pull_request.target_branch_sha, already_exists)
           # We need to create the branch after the merge request is
           # populated to ensure the merge request is in the right state
  1. Import a small GitHub project to your local GDK. For example, to import https://github.com/ku1ik/rainbow use the below curl.
curl --request POST \
  --url "http://gdk.test:3000/api/v4/import/github" \
  --header "content-type: application/json" \
  --header "PRIVATE-TOKEN: <GITLAB PAT>" \
  --data '{
    "personal_access_token": "<GITHUB PAT>",
    "repo_id": "69648",
    "target_namespace": "<TARGET LOCAL NAMESPACE>",
    "new_name": "ld-cache_database_id-asap-test"
}'
  1. On this branch all comments from the PR and issues will be imported successfully. On master, with a similar patch applied as above, there will only be comments that are related to approvals imported (see #416486 (comment 1588888745) for an explanation of why only these kinds of notes appear).

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Luke Duncalfe

Merge request reports