# Pull mirror situation overview and solutions

## Problem Overview

### Pull Mirroring Situation

NOTE: Every command below that uses `with_import_status(:started)` can also be used with `with_import_status(:scheduled, :started)` to check both scheduled and started projects.
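As a concrete illustration of the note above, here is a self-contained Ruby sketch of how the two forms differ in which projects they return. `Project` here is a hypothetical stand-in struct and `with_import_status` a plain method mimicking the scope, not the real ActiveRecord model or API:

```ruby
# Stand-in for the real Project model (illustrative assumption).
Project = Struct.new(:id, :import_status, :import_jid)

projects = [
  Project.new(1, "scheduled", nil),
  Project.new(2, "started",   nil),
  Project.new(3, "started",   "abc123"),
  Project.new(4, "finished",  nil)
]

# Mimics the scope: keep projects whose status matches any given symbol.
def with_import_status(projects, *statuses)
  wanted = statuses.map(&:to_s)
  projects.select { |p| wanted.include?(p.import_status) }
end

started = with_import_status(projects, :started)
both    = with_import_status(projects, :scheduled, :started)

puts started.size                            # 2
puts both.size                               # 3 (scheduled project included)
puts started.count { |p| p.import_jid.nil? } # 1: started but missing a jid
```

The wider `(:scheduled, :started)` form matters because, as described later, some stuck mirrors sit in the scheduled state and would be invisible to the narrower check.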
### Invalid/duplicate project imports without jid

#### Cannot be failed
- We currently have around 25 projects without an `import_jid` that should have one.
- We can check the started projects without a jid using:

  ```ruby
  Project.with_import_status(:started).where(import_jid: nil).count
  ```
- At first we thought they were mirrors, but as we can see from the first two commands that were run in https://gitlab.com/gitlab-com/infrastructure/issues/2676#note_39072500, it turns out they were not:

  ```ruby
  Project.mirror.with_import_status(:started).where(import_jid: nil).each { |proj| puts "Project #{proj} - #{proj.import_status}" }
  Project.with_import_status(:started).where(import_jid: nil, mirror: false).each { |proj| puts "Project #{proj} - #{proj.import_status}" }
  ```
- As we can see from https://gitlab.com/gitlab-com/infrastructure/issues/2676#note_39984023, those projects are invalid for some reason (shown in the command output), and because of that they cannot change their state and are just kept stuck forever.
- Command used:

  ```ruby
  Project.with_import_status(:started).where(import_jid: nil).each { |project| if project.import_fail ; puts "Project failed" ; else puts "Project could not be failed #{project.errors.inspect}" ; end }
  ```
- That means `StuckImportJobsWorker` cannot mark them as failed: it always goes through them without changing any of them.
- We've found through https://gitlab.com/gitlab-com/infrastructure/issues/2676#note_40663154 that those invalid import jobs actually have update timestamps close to one another.
- https://gitlab.com/gitlab-com/infrastructure/issues/2676#note_40843093 tells us that we have two problems that we still need to uncover:
  - One of them (duplicate records) was fixed 9 months ago.
  - The other is still around, since there are projects created in 2017.
- We've found that there exist more projects that are not actually import jobs nor mirrors, but do have duplicates.
- Relevant queries can be found here: https://gitlab.com/gitlab-org/gitlab-ee/issues/3490#note_40893211
- There is a really weird example where the project name is thought to represent a date and the projects have no data in them. More information is in the output here: https://gitlab.com/gitlab-com/infrastructure/issues/2676#note_41009798
- These projects should be looked into in more detail through the admin dashboard.
- We should also try to find a way of separating those that need to be removed, warn the respective users about them, and figure out what to do with the ones that have data in them.
- This will probably need to happen inside a migration, with proper tests written, so that we make sure we are not removing important customer data.
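The separation step described above could be sketched as a simple partition. This is a hypothetical plain-Ruby model, not the real schema or migration API; `Dup` and the `has_data` flag are illustrative assumptions:

```ruby
# Illustrative stand-in for a duplicate project record.
Dup = Struct.new(:id, :name, :has_data)

duplicates = [
  Dup.new(1, "2017-08-01", false),  # date-like name, no data: removable
  Dup.new(2, "group/app",  true)    # has data: warn the owner first
]

# Split into "safe to remove" and "needs a user warning before deciding".
removable, needs_warning = duplicates.partition { |d| !d.has_data }

puts removable.map(&:id).inspect      # [1]
puts needs_warning.map(&:id).inspect  # [2]
```

In a real migration the `has_data` check would need to inspect the repository itself, and the tests mentioned above would assert that nothing in `needs_warning` is ever deleted.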
#### Cannot be scheduled
In https://gitlab.com/gitlab-org/gitlab-ee/issues/3491#note_41886212 we found that some mirrors might actually not be scheduled at all, since they are invalid projects and are therefore not able to change their state.

The error we have at hand is for the following project:

```
Project could not be failed #<ActiveModel::Errors:0x007fc74b9cb368 @base=#<Project id:840543 shadowsockssaverorz/XX-Net.wiki>, @messages={:name=>["has already been taken"]}>
```

This project is the "first in line" mirror, so we still do not know which mirror is the actual valid "first in line" one.
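A minimal plain-Ruby model (no Rails) of why such a project gets stuck: the state transition runs validations, and a "name has already been taken" error aborts the change, so the record keeps its old status. `MiniProject` and its validation are illustrative assumptions, not GitLab code:

```ruby
class MiniProject
  attr_reader :import_status, :errors

  def initialize(name, status, taken_names)
    @name, @import_status, @taken = name, status, taken_names
    @errors = []
  end

  # Toy uniqueness validation standing in for the ActiveRecord one.
  def valid?
    @errors = []
    @errors << "name has already been taken" if @taken.include?(@name)
    @errors.empty?
  end

  # Transition guarded by validation, like a validated save would be.
  def import_fail
    return false unless valid?
    @import_status = "failed"
    true
  end
end

stuck = MiniProject.new("XX-Net.wiki", "started", ["XX-Net.wiki"])
puts stuck.import_fail    # false: the validation error blocks the change
puts stuck.import_status  # "started": stuck, matching the error above
```

This matches the symptom in the error output: `import_fail` returns false and the project never leaves its current state.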
#### CPU usage % for pull mirrors was really low
- Relevant graph: https://performance.gitlab.net/dashboard/db/sidekiq-stats?refresh=5m&orgId=1&panelId=35&fullscreen&from=now-30d&to=now
- Over time we've noticed that the CPU usage % for the pull mirrors was really low (< 20%) and that no mirrors were being processed.
- Adding more nodes and increasing the capacity showed positive results; we can now have mirrors scheduled on an hourly basis, but not more frequently than that.
- This increase in capacity and concurrency was based on the following comment: https://gitlab.com/gitlab-org/gitlab-ee/issues/3491#note_41079868
- I used the following mirrors as an experiment; the 5th one is invalid on purpose:
#### Some mirrors never leave the Redis scheduled/running mirror set
- 6 days ago we ran a Redis query to check which mirrors we had inside the Redis scheduled/running mirror set: https://gitlab.com/gitlab-org/gitlab-ee/issues/3491#note_41132886
- We found that some of the mirrors just never leave the Redis scheduled/running mirror set for some reason
- This was backed up when we ran the same query and we had the same exact mirror ids there plus some new ones (https://gitlab.com/gitlab-org/gitlab-ee/issues/3608#note_42246216)
- Based on the status that mirror 1008371 has in https://gitlab.com/gitlab-org/gitlab-ee/issues/3608#note_42246216, which was run a week ago, and since that mirror is still present in the Redis scheduled/running mirror set, we can assume that one of two things is happening:
  - The worker for that mirror does not exist anymore, which could have happened when Sidekiq was down or something like that. It will therefore never get to run and will stay in the Redis scheduled/running mirror set forever, since we do not check for stuck import jobs with the scheduled state.
    - NOTE: This only happens to scheduled mirrors, since we do find these cases for started mirrors.
  - That project is invalid, and therefore we cannot update its status because the validation fails.
    - I do think this happens when we try to schedule but not when we try to mark as started, because that would imply that we were able to change the invalid project's state, which is not possible.
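The "lost worker" failure mode above can be sketched with a plain Ruby `Set` standing in for the Redis scheduled/running mirror set: an id is only removed when its worker completes, so a worker that never runs leaves its id behind forever. The key semantics and the `schedule`/`complete` helpers are assumptions for illustration, not GitLab internals:

```ruby
require "set"

mirror_set = Set.new

def schedule(set, id) set.add(id)    end  # like SADD on scheduling
def complete(set, id) set.delete(id) end  # like SREM, done by the worker

schedule(mirror_set, 1008371)  # worker later lost (e.g. Sidekiq down)
schedule(mirror_set, 42)
complete(mirror_set, 42)       # healthy worker removes its own id

puts mirror_set.include?(1008371) # true: the member is stuck
puts mirror_set.include?(42)      # false: cleaned up normally
```

Nothing ever calls `complete` for the lost worker's id, which is consistent with the same mirror ids reappearing in repeated runs of the query.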
#### Capacity is not being filled up

- A few days ago we found that the capacity was not being completely used up.
- Relevant commands:

  ```ruby
  Gitlab::Redis::SharedState.with { |redis| redis.scard(Gitlab::Mirror::PULL_CAPACITY_KEY) }.to_i
  Gitlab::Mirror.available_capacity
  ```
- This may have changed, so we still need to verify how things look now.
- Further details can be found in this issue description https://gitlab.com/gitlab-org/gitlab-ee/issues/3608#note_42246216
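A back-of-the-envelope sketch of how the two commands above presumably relate: available capacity is roughly the configured maximum minus the size of the Redis capacity set. The numbers and the max-capacity source are made up for illustration:

```ruby
max_capacity = 100  # hypothetical configured mirror capacity
used         = 37   # e.g. result of redis.scard(Gitlab::Mirror::PULL_CAPACITY_KEY)

available = max_capacity - used
puts available                            # 63
puts "capacity not filled" if available.positive?
```

If `available` stays large while mirrors queue up, the capacity set is not being filled, which is the symptom described in this section.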
## Task list

- Invalid/duplicate project imports without jid that:
  - Cannot be failed
  - Cannot be scheduled
- CPU usage % for pull mirrors was really low
- Some mirrors never leave the Redis scheduled/running mirror set
- Capacity is not being filled up