Import more resources using events in GitHub Import

Rodrigo Tomonari requested to merge rodrigo/extended-events into master

What does this MR do and why?

This change updates GitHub Import to import merged by, review requests, reviews, and comments using GitHub's events API in the IssueEvents stage, making the ImportPullRequestsMergedByWorker, Stage::ImportPullRequestsReviewRequestsWorker, Stage::ImportPullRequestsReviewsWorker, and ImportNotesWorker obsolete.

In order to re-use the code, the IssueEventImporter class was updated to call importer classes from the obsolete stages.
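As a rough illustration of that reuse, the dispatch could look like the sketch below. It is not the code in this MR: the importer classes, the `IMPORTERS_BY_EVENT` table, and `import_event` are hypothetical stand-ins, and the sketch assumes each event is a hash whose `:event` key holds the GitHub timeline event name.

```ruby
# Hypothetical importers standing in for the classes reused from the obsolete stages.
class MergedByImporter
  def execute(event)
    [:merged_by, event[:id]]
  end
end

class ReviewRequestImporter
  def execute(event)
    [:review_request, event[:id]]
  end
end

class ReviewImporter
  def execute(event)
    [:review, event[:id]]
  end
end

class NoteImporter
  def execute(event)
    [:note, event[:id]]
  end
end

# Maps a timeline event name to the importer that handles it (illustrative only).
IMPORTERS_BY_EVENT = {
  'merged' => MergedByImporter,
  'review_requested' => ReviewRequestImporter,
  'review_request_removed' => ReviewRequestImporter,
  'reviewed' => ReviewImporter,
  'commented' => NoteImporter
}.freeze

# Delegates one event to its importer; event types without an importer are skipped.
def import_event(event)
  importer = IMPORTERS_BY_EVENT[event[:event]]
  return unless importer

  importer.new.execute(event)
end

import_event({ event: 'merged', id: 1 })
```

A lookup table keeps the event importer itself small: supporting another event type means adding one entry rather than another branch.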

All these changes are behind the github_import_extended_events feature flag.

Related to: #433536 (closed)

Notes

Feature flag

When the feature flag is turned on, the import setting called "extended_events" is also enabled. This setting is used to decide which stages should be executed during the import process. The purpose of this approach is to ensure that enabling or disabling the feature flag doesn't affect the ongoing migrations.
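One way to picture this: the flag is read once when the import is created and stored with the import's settings, and stage selection then reads only the stored value. The sketch below is an illustration under assumptions, not the actual GitLab implementation; `ImportSettings`, `build_settings`, `stages_for`, and the stage names are all hypothetical.

```ruby
# Hypothetical settings object persisted with the import when it starts.
ImportSettings = Struct.new(:extended_events, keyword_init: true)

# Stage lists are illustrative only.
LEGACY_STAGES = %i[
  pull_requests_merged_by
  pull_request_review_requests
  pull_request_reviews
  issue_events
  notes
].freeze
EXTENDED_STAGES = %i[issue_events].freeze

# Snapshots the feature flag value at import creation time.
def build_settings(feature_flag_enabled)
  ImportSettings.new(extended_events: feature_flag_enabled)
end

# Stage selection depends only on the stored setting, not on the live flag.
def stages_for(settings)
  settings.extended_events ? EXTENDED_STAGES : LEGACY_STAGES
end

settings = build_settings(true)
# Toggling the flag after this point does not change the stages of this import.
stages_for(settings)
```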

Reviewers

Importing reviewers using GitHub's timeline events is less straightforward than using GitHub's pull request API.

Unlike the pull request API, timeline events do not provide the list of current reviewers for a pull request. Instead, they return a sequence of events recording when a reviewer was added or removed. Applying these additions and removals in order while reading the events would produce the correct list of reviewers. The problem is that GitHub Import enqueues one worker for each event to be imported, so events aren't imported in order.

To address this issue, we maintain a list of all the review_requested and review_request_removed events associated with a pull request. Subsequently, a separate process compiles these events and identifies the pull request reviewers.
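The compile step can be sketched as follows. This is a minimal illustration, not the code in this MR: the `ReviewEvent` struct, the `compile_reviewers` method, and the field names are hypothetical, and the sketch assumes each stored event carries the reviewer identifier and the event's creation time so the events can be sorted before being replayed.

```ruby
require 'set'
require 'time'

# Hypothetical stand-in for a stored timeline event.
ReviewEvent = Struct.new(:type, :reviewer_id, :created_at, keyword_init: true)

# Replays the review request events in chronological order and returns the
# reviewers that are still requested at the end.
def compile_reviewers(events)
  reviewers = Set.new

  events.sort_by(&:created_at).each do |event|
    case event.type
    when 'review_requested'
      reviewers.add(event.reviewer_id)
    when 'review_request_removed'
      reviewers.delete(event.reviewer_id)
    end
  end

  reviewers.to_a
end

# Events arrive out of order because each one is handled by its own worker.
events = [
  ReviewEvent.new(type: 'review_request_removed', reviewer_id: 1,
                  created_at: Time.parse('2023-12-01T10:05:00Z')),
  ReviewEvent.new(type: 'review_requested', reviewer_id: 2,
                  created_at: Time.parse('2023-12-01T10:01:00Z')),
  ReviewEvent.new(type: 'review_requested', reviewer_id: 1,
                  created_at: Time.parse('2023-12-01T10:00:00Z'))
]

compile_reviewers(events)
```

Because the events are sorted before being replayed, the order in which the workers stored them does not affect the result.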

Import options

With this change, the import setting "Import issue and pull request events" is redundant and is removed from the UI.

Import stats

GitHub Import stats for merged_by, notes, pull_request_review_request and pull_request_review will no longer exist as the resources will be included in the issue_events stats.

Screenshots or screen recordings

In the UI, one import option is removed when the feature flag is enabled.

Before: Screenshot_2023-12-18_at_19.08.51
After: Screenshot_2023-12-18_at_19.07.31

How to set up and validate locally

  1. Enable the github_import_extended_events feature flag.
  2. Use the script below to create users from a GitHub repository and cache them in Redis. This way, most users should be mapped when importing a public repository.
Script to create users

access_token = 'GITHUB_ACCESS_TOKEN'
repo = 'rspec/rspec-core' # GitHub repository in owner/name form

@processed = Set.new

def read_issues(issue)
  user = issue.to_hash[:user]

  # Skip issues without an author and users that were already handled
  return if user.nil? || @processed.include?(user[:login])

  gitlab_user = User.find_by_username(user[:login])

  unless gitlab_user
    gitlab_user = User.create(
      name: user[:login],
      username: user[:login],
      email: "#{user[:login]}@github.com",
      password: '5iveL!fe',
      state: 'deactivated',
      confirmed_at: Time.now
    )
  end

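  # Return if the user failed to be created (User.create returns an
  # unsaved record when validation fails, so check persistence)
  return unless gitlab_user.persisted?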

  key = Gitlab::GithubImport::UserFinder::ID_CACHE_KEY % user[:id]
  Gitlab::Cache::Import::Caching.write(key, gitlab_user.id)

  key = Gitlab::GithubImport::UserFinder::EMAIL_FOR_USERNAME_CACHE_KEY % user[:login]
  Gitlab::Cache::Import::Caching.write(key, gitlab_user.email)

  # Remember the user so later issues by the same author are skipped
  @processed << user[:login]
end

client = Octokit::Client.new(access_token: access_token)
issues = client.issues(repo, state: 'all', per_page: 100)
issues.each do |issue|
  read_issues(issue)
end

next_url = client.last_response.rels[:next]

while next_url
  puts next_url.href
  response = next_url.get
  issues = response.data
  issues.each do |issue|
    read_issues(issue)
  end
  next_url = response.rels[:next]
end
  3. Use the command below to trigger a migration
curl --location 'http://gdk.test:3000/api/v4/import/github' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer GDK_ACCESS_TOKEN' \
--data '{
    "personal_access_token": "GITHUB_ACCESS_TOKEN",
    "repo_id": "238972",
    "target_namespace": "root",
    "new_name": "rspec-core",
    "optional_stages": {
      "attachments_import": false,
      "collaborators_import": false
    }
}'
  4. Check that everything was migrated as before

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Rodrigo Tomonari