ObjectCounter key expiration prevents long running GitHub imports from completing
## About

Each object we fetch and successfully import during a GitHub import is counted in Redis using `Gitlab::GithubImport::ObjectCounter`. In &9665 (closed) we started saving this data to PostgreSQL at the end of an import in order to display it to the user.
## Problem

The data is stored using our default Redis cache expiry timeout of 1 day. In long-running imports that span multiple days, old counts are dropped.

This means we not only lose the data before we try to save it to the database, but also, because we validate the `checksums` property, the transition of the import state from `started` to `finished` fails due to the presence of `null` values in the counts (#416306 (comment 1513145874)). As a result, the project import state is stuck forever in "importing".
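A minimal sketch of the failure mode, using a plain hash in place of Redis. The key names, the `summary` shape, and the validation are illustrative assumptions, not GitLab's actual code:

```ruby
# Stand-in for Redis: an expired key is simply absent and reads back as nil.
redis = { "github-importer/object-counter/1/pull_request" => "42" }
# The "issue" count key has expired during a multi-day import.

# Illustrative summary: collects per-type counts from the cache.
def summary(redis, project_id, object_types)
  object_types.each_with_object({}) do |type, memo|
    memo[type] = redis["github-importer/object-counter/#{project_id}/#{type}"]&.to_i
  end
end

counts = summary(redis, 1, %w[pull_request issue])
# A presence check on the counts rejects the nil value, so the import
# state cannot transition out of "started".
invalid = counts.values.any?(&:nil?)
puts counts.inspect
puts invalid
```

Because `invalid` is true, the state transition is rejected and the import never leaves "importing".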
## Implementation Guide

- Set the cache timeout in `Gitlab::GithubImport::ObjectCounter` to `2.weeks` to accommodate much longer imports. This means we capture the data successfully for longer imports.
- Change the `Gitlab::GithubImport::ObjectCounter.summary` method to return `0` values instead of `null` values when the count data is missing from Redis. This allows the import state to transition from `started` to `finished` successfully even when data is missing from Redis.