Pull mirror situation overview and solutions

Problem Overview

Pull Mirroring Situation

NOTE: Every command below that uses with_import_status(:started) can also be run with with_import_status(:scheduled, :started) to check both scheduled and started projects.

Invalid/Duplicate Project imports without jid that:

Cannot be failed

  • We currently have around 25 projects without an import_jid (they should have one)
    • We can count the started projects without an import_jid using Project.with_import_status(:started).where(import_jid: nil).count
    • At first we thought they were mirrors, but as the first two commands that were run here show, they are not.
    • As https://gitlab.com/gitlab-com/infrastructure/issues/2676#note_39984023 shows, these projects are invalid (the reason is specified in the command output), so they cannot change their state and stay stuck forever.
  • We found through https://gitlab.com/gitlab-com/infrastructure/issues/2676#note_40663154 that those invalid import jobs actually have update dates very close to each other
  • https://gitlab.com/gitlab-com/infrastructure/issues/2676#note_40843093 shows that there are two problems we still need to uncover:
    • One of them (duplicate records) was fixed 9 months ago
    • The other is still around, since some affected projects were created in 2017.
  • We found that there are more projects that are neither import jobs nor mirrors but do have duplicates.
    • Relevant queries can be found here: https://gitlab.com/gitlab-org/gitlab-ee/issues/3490#note_40893211
    • There is a really weird case where the project names appear to represent dates and the projects contain no data. More information is in the output here: https://gitlab.com/gitlab-com/infrastructure/issues/2676#note_41009798
      • These projects should be looked into in more detail through the admin dashboard
      • We should also find a way to separate the projects that need to be removed, warn the respective users about them, and decide what to do with the ones that contain data.
        • This will probably need to happen inside a migration with proper tests, so that we make sure we are not removing important customer data
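To make the duplicate-name failure mode concrete, here is a minimal plain-Ruby sketch, assuming project names must be unique per namespace (which is what the "name has already been taken" error suggests). ProjectRecord, stuck_without_jid and stuck_with_duplicate_name are hypothetical names that work on an in-memory list, not on GitLab's Project model; in a Rails console the same idea would be expressed as a database query.

```ruby
# Hypothetical in-memory stand-in for rows of the projects table (not GitLab's Project model).
ProjectRecord = Struct.new(:id, :namespace_id, :name, :import_status, :import_jid, keyword_init: true)

# Same idea as Project.with_import_status(*statuses).where(import_jid: nil):
# keep projects in any of the given statuses that have no import_jid.
def stuck_without_jid(projects, *statuses)
  projects.select { |p| statuses.include?(p.import_status) && p.import_jid.nil? }
end

# Among the stuck projects, return those whose [namespace_id, name] pair is shared with
# at least one other project, i.e. the ones a per-namespace name uniqueness validation
# would reject (assumption: names are unique per namespace).
def stuck_with_duplicate_name(projects, *statuses)
  by_path = projects.group_by { |p| [p.namespace_id, p.name] }
  stuck_without_jid(projects, *statuses).select { |p| by_path[[p.namespace_id, p.name]].size > 1 }
end

# Usage with made-up sample rows.
projects = [
  ProjectRecord.new(id: 1, namespace_id: 10, name: "XX-Net.wiki", import_status: :started, import_jid: nil),
  ProjectRecord.new(id: 2, namespace_id: 10, name: "XX-Net.wiki", import_status: :finished, import_jid: "abc"),
  ProjectRecord.new(id: 3, namespace_id: 11, name: "other", import_status: :started, import_jid: nil),
  ProjectRecord.new(id: 4, namespace_id: 12, name: "mirror", import_status: :scheduled, import_jid: nil)
]
p stuck_without_jid(projects, :started).map(&:id)         # => [1, 3]
p stuck_with_duplicate_name(projects, :started).map(&:id) # => [1]
```

Project 3 is stuck without a jid but has no duplicate, so it would need a different explanation; separating the two groups is the first step toward the migration mentioned above.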

Cannot be scheduled

In https://gitlab.com/gitlab-org/gitlab-ee/issues/3491#note_41886212 we found that some mirrors might never be scheduled at all, since they are invalid projects and therefore cannot change their state. The error we have at hand is for the following project:

Project could not be failed #<ActiveModel::Errors:0x007fc74b9cb368 @base=#<Project id:840543 shadowsockssaverorz/XX-Net.wiki>, @messages={:name=>["has already been taken"]}>

This project is the “first in line” mirror, so we still do not know which mirror is actually the valid “first in line” one.
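The following plain-Ruby sketch illustrates why an invalid mirror blocks scheduling and how a scheduler could skip past it to find the first valid mirror in line. MirrorProject and schedule_first_valid are hypothetical; the only behaviour borrowed from Rails is that a record failing validation is not saved, so its status stays unchanged. This is not how GitLab's mirror scheduler is implemented.

```ruby
require "set"

# Hypothetical stand-in for a mirror project whose status change is only persisted when
# validation passes, the way ActiveRecord's save returns false on a failed validation.
class MirrorProject
  attr_reader :id, :namespace_id, :name, :import_status, :errors

  def initialize(id:, namespace_id:, name:, taken_names:, import_status: :none)
    @id = id
    @namespace_id = namespace_id
    @name = name
    @taken_names = taken_names # [namespace_id, name] pairs already owned by *other* projects
    @import_status = import_status
    @errors = []
  end

  def valid?
    @errors = []
    @errors << "name has already been taken" if @taken_names.include?([namespace_id, name])
    @errors.empty?
  end

  # Returns false and leaves the status untouched when the record is invalid.
  def transition_to(new_status)
    return false unless valid?

    @import_status = new_status
    true
  end
end

# Walks mirrors in scheduling order and schedules the first valid one, collecting the
# invalid mirrors it had to skip instead of stopping at the first failure.
def schedule_first_valid(mirrors_in_order)
  skipped = []
  mirrors_in_order.each do |mirror|
    return [mirror, skipped] if mirror.transition_to(:scheduled)

    skipped << mirror
  end
  [nil, skipped]
end

taken = Set[[10, "XX-Net.wiki"]]
invalid = MirrorProject.new(id: 840543, namespace_id: 10, name: "XX-Net.wiki", taken_names: taken)
valid = MirrorProject.new(id: 2, namespace_id: 20, name: "app", taken_names: taken)
scheduled, skipped = schedule_first_valid([invalid, valid])
p scheduled.id           # => 2
p skipped.map(&:id)      # => [840543]
p invalid.import_status  # => :none
```

Collecting the skipped mirrors, rather than silently dropping them, gives a list of invalid projects to investigate alongside the duplicates above.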

CPU usage % for pull mirrors was really low

Some mirrors never leave the Redis scheduled/running mirror set

  • Six days ago we ran the following Redis query to check which mirrors were inside the Redis scheduled/running mirror set: https://gitlab.com/gitlab-org/gitlab-ee/issues/3491#note_41132886
    • We found that, for some unknown reason, some mirrors never leave the Redis scheduled/running mirror set
    • Based on the status of mirror 1008371 in https://gitlab.com/gitlab-org/gitlab-ee/issues/3608#note_42246216 (run a week ago), and since that mirror is still present in the Redis scheduled/running mirror set, we can assume that one of two things is happening:
      • The worker job for that mirror no longer exists (this could have happened, for example, while Sidekiq was down), so it will never run and the mirror will stay in the Redis scheduled/running mirror set forever, since we do not check for stuck import jobs in the scheduled state.
        • NOTE: This only happens to scheduled mirrors, since we do detect these cases for started mirrors
      • The project is invalid, so we cannot update its status because validation fails
        • I think this happens when we try to schedule the mirror, not when we try to mark it as started, because the latter would imply that we were able to change the invalid project's state, which is not possible
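Both explanations above can be checked by reconciling the members of the scheduled/running set against each project's current state. Below is a minimal plain-Ruby sketch of that reconciliation: a Ruby Set stands in for the Redis set, and MirrorState, STUCK_AFTER_SECONDS and suspicious_capacity_members are hypothetical names. The three-hour threshold is an arbitrary assumption, not GitLab's value. Unlike the existing stuck-job check described above, it treats a long-lived :scheduled entry as stuck too.

```ruby
require "set"

# Hypothetical snapshot of a mirror's row, as we would read it from the database.
MirrorState = Struct.new(:project_id, :import_status, :status_changed_at, :valid, keyword_init: true)

RUNNING_STATUSES = %i[scheduled started].freeze
# Hypothetical threshold, not GitLab's actual value.
STUCK_AFTER_SECONDS = 3 * 60 * 60

# capacity_set: project ids currently in the scheduled/running set (a plain Set here,
# standing in for the Redis set). states: project id => MirrorState.
# Returns id => reason for every member that probably should not be in the set anymore.
def suspicious_capacity_members(capacity_set, states, now:)
  capacity_set.each_with_object({}) do |id, reasons|
    state = states[id]
    if state.nil?
      reasons[id] = :unknown_project
    elsif !state.valid
      # Status updates fail validation, so the mirror can never leave on its own.
      reasons[id] = :invalid_project
    elsif !RUNNING_STATUSES.include?(state.import_status)
      # Finished or failed, yet still counted as occupying a slot.
      reasons[id] = :not_running
    elsif now - state.status_changed_at > STUCK_AFTER_SECONDS
      # E.g. the Sidekiq job was lost; covers :scheduled as well as :started.
      reasons[id] = :"stuck_#{state.import_status}"
    end
  end
end
```

For example, a mirror like 1008371 that has been :scheduled for a week would be reported as :stuck_scheduled, while an invalid project is reported as :invalid_project regardless of its status.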

Capacity is not being filled up

  • A few days ago we found that the capacity was not being fully used.
    • Relevant commands:
      • Gitlab::Redis::SharedState.with { |redis| redis.scard(Gitlab::Mirror::PULL_CAPACITY_KEY) }.to_i
      • Gitlab::Mirror.available_capacity
    • This may have changed, so we still need to verify how things look now.
    • Further details can be found in https://gitlab.com/gitlab-org/gitlab-ee/issues/3608#note_42246216
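One way stuck set members could explain unused capacity is sketched below in plain Ruby. It assumes, without confirming against the code, that Gitlab::Mirror.available_capacity is roughly the configured maximum minus the SCARD result from the first command above; available_capacity and capacity_report here are hypothetical helpers over a Ruby Set, and clamping at zero is a choice made for this sketch.

```ruby
require "set"

# Assumption: available capacity = configured maximum - number of members in the pull
# capacity set (what the SCARD command above returns). Clamping at zero is this sketch's choice.
def available_capacity(max_capacity, capacity_set)
  [max_capacity - capacity_set.size, 0].max
end

# Splits the occupied slots into mirrors that are really running and members that only
# take up room (for example the entries that never leave the scheduled/running set).
def capacity_report(max_capacity, capacity_set, running_ids)
  really_running = (capacity_set & running_ids).size
  {
    available: available_capacity(max_capacity, capacity_set),
    really_running: really_running,
    occupied_without_work: capacity_set.size - really_running
  }
end

p capacity_report(10, Set.new(1..8), Set[1, 2, 3, 4, 5])
# => {:available=>2, :really_running=>5, :occupied_without_work=>3}
```

If occupied_without_work stays above zero over time, the scheduler sees less available capacity than is really free, which would match the capacity not being filled up.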

Task list

  • Invalid duplicate project imports without jid that:
    • Cannot be failed
    • Cannot be scheduled
  • CPU usage % for pull mirrors was really low
  • Some mirrors never leave the Redis scheduled/running mirror set
  • Capacity is not being filled up
Edited by Tiago Botelho