Migrate traces to Object Storage (S3)
TODOs
- All traces have been migrated to S3
- Investigate the traces that failed during the migration (i.e. those that still remain in FS)
Current status
We have about 50,000,000 jobs today. Each job has a trace, and each trace falls into one of the following eras.
| Era | Path | Count | Artifact record? | Stored in | Comments |
|---|---|---|---|---|---|
| 1st | - | 87,636 | false | Database (PostgreSQL) | Issue => https://gitlab.com/gitlab-org/gitlab-ce/issues/34317 |
| 2nd | `#{builds_path}/#{YYYY_MM}/#{project_ci_id}/#{job_id}.log` (e.g. `/builds/2018_01/1/12.log`) | - | false | FileStorage (NFS) | Can be treated the same as the 3rd era. |
| 3rd | `#{builds_path}/#{YYYY_MM}/#{project_id}/#{job_id}.log` (e.g. `/builds/2018_02/19/1172.log`) | approx. 50,000,000 | false | FileStorage (NFS) | - |
| 4th (Latest) | `artifacts/#{SHA256['project_id']}/#{YYYY_MM_DD}/#{job_id}/#{job_artifact_id}/trace.log` (e.g. `artifacts/94/00/94..67/2018_02_13/1374/249/trace.log`) | Increasing since %10.5 | true | FileStorage (NFS) or ObjectStorage (S3) | We can easily move traces between FS <=> OS. |
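To make the path formats in the table concrete, here is a minimal Ruby sketch that splits a 2nd/3rd-era path into its components and builds a 4th-era path. It is an assumption, not GitLab's implementation: `parse_legacy_trace_path` and `artifact_trace_path` are hypothetical helpers, and the `94/00/94..67` example is read as the first two hex characters of the SHA256 digest of the project ID, then the next two, then the full digest.

```ruby
require "date"
require "digest"

# Matches 2nd- and 3rd-era paths such as "/builds/2018_02/19/1172.log".
# For 2nd-era files the middle segment is a project_ci_id, not a project_id.
LEGACY_TRACE_PATTERN = %r{(?<yyyy_mm>\d{4}_\d{2})/(?<project_id>\d+)/(?<job_id>\d+)\.log\z}

# Hypothetical helper: split a legacy trace path into its components.
def parse_legacy_trace_path(path)
  match = LEGACY_TRACE_PATTERN.match(path)
  raise ArgumentError, "not a legacy trace path: #{path}" unless match

  {
    yyyy_mm: match[:yyyy_mm],
    project_id: Integer(match[:project_id], 10),
    job_id: Integer(match[:job_id], 10)
  }
end

# Hypothetical helper: build a 4th-era trace path, assuming the SHA256 segment
# is "<first 2 hex chars>/<next 2 hex chars>/<full hex digest>".
def artifact_trace_path(project_id:, date:, job_id:, job_artifact_id:)
  hash = Digest::SHA256.hexdigest(project_id.to_s)
  File.join("artifacts", hash[0, 2], hash[2, 2], hash,
            date.strftime("%Y_%m_%d"), job_id.to_s, job_artifact_id.to_s,
            "trace.log")
end
```

For example, `parse_legacy_trace_path("/builds/2018_02/19/1172.log")[:job_id]` is `1172`. The same pattern also matches 2nd-era paths, so the path alone does not tell the two eras apart.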
Goals
Migrate all trace files from the 2nd and 3rd eras to the 4th era.
The current approach
Proposed by @ayufan https://gitlab.com/gitlab-org/gitlab-ee/issues/4170#note_55903624
We're preparing a script to manually migrate trace files.
The logic of the script
- Choose a trace file.
- Detect `job_id` from the file name. This job will be associated with an artifact record.
- Clone the trace file into a tmp folder.
- Create an artifact record with the cloned trace file. If BackgroundUploader is ON, the trace is moved to ObjectStorage; otherwise it remains in FileStorage.
- Verify the checksum of the uploaded file against the original trace file.
- Remove the cloned trace file.
- Move the original trace file to a backup folder.
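The steps of the script can be sketched in Ruby as follows. This is not the actual rake task: `migrate_trace` is hypothetical, and `create_artifact` is a placeholder for whatever creates the artifact record (and uploads it when BackgroundUploader is ON). It is assumed to return a local path to the stored copy so the checksum can be compared, and job IDs are assumed to be unique so the backup can be keyed by file name.

```ruby
require "digest"
require "fileutils"
require "tmpdir"

# Hypothetical per-file migration. `create_artifact` is any callable taking
# (job_id, cloned_path) and returning a readable path to the stored copy.
def migrate_trace(trace_path, backup_dir, create_artifact)
  # Detect the job id from the file name, e.g. ".../1172.log" -> 1172
  job_id = Integer(File.basename(trace_path, ".log"), 10)
  original_digest = Digest::SHA256.file(trace_path).hexdigest

  Dir.mktmpdir do |tmp_dir|
    # Clone the trace file into a tmp folder
    clone_path = File.join(tmp_dir, File.basename(trace_path))
    FileUtils.cp(trace_path, clone_path)

    # Create the artifact record from the clone (placeholder callable)
    stored_path = create_artifact.call(job_id, clone_path)

    # Verify the checksum of the stored file against the original
    unless Digest::SHA256.file(stored_path).hexdigest == original_digest
      raise "checksum mismatch for job #{job_id}"
    end
  end # Dir.mktmpdir deletes the tmp folder (and the clone) here

  # Only after a successful check: move the original to the backup folder
  FileUtils.mkdir_p(backup_dir)
  FileUtils.mv(trace_path, File.join(backup_dir, File.basename(trace_path)))
  job_id
end
```

In this sketch the original is moved to the backup folder only after the checksum matches, so a failed migration leaves the trace where it was, and the backup folder is what a recovery would start from.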
The usage of the script
- `rake gitlab:traces:migrate['2018_02/19/1172.log']`: Migrate a trace file
- `rake gitlab:traces:migrate['2018_02']`: Migrate trace files in the `builds/2018_02` folder
- `rake gitlab:traces:migrate['.']`: Migrate all trace files
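One way the three argument forms could be resolved into a list of files is sketched below. `trace_files_for` is a hypothetical helper, and treating the argument as relative to the builds directory is an assumption based on the examples.

```ruby
# Hypothetical helper: resolve the rake argument (a single file, a folder
# such as "2018_02", or "." for everything) relative to the builds directory.
def trace_files_for(builds_path, arg)
  target = File.expand_path(arg, builds_path)
  # A single trace file is migrated on its own
  return [target] if File.file?(target)

  # Otherwise collect every *.log below the folder, in a stable order
  Dir.glob(File.join(target, "**", "*.log")).sort
end
```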
Concerns
- What if traces go missing for some reason? How can we recover them?
- How can we speed up the migration? How can we parallelize the process?
- When will the migration be done if we stick with the current approach? How long does it take to migrate a single trace?
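On the speed and timing concerns, here is a minimal Ruby sketch of running a per-file migration over a fixed pool of threads while collecting failures for later investigation, plus a back-of-the-envelope estimate. `migrate_in_parallel` and `estimated_hours` are hypothetical. Under MRI's global VM lock, threads mainly help with I/O-bound work such as copying and uploading files, so separate processes or background jobs may scale better.

```ruby
# Hypothetical thread pool: `work` is any per-file callable. Failures are
# collected instead of aborting the whole run.
def migrate_in_parallel(paths, workers: 4, &work)
  queue = Queue.new
  paths.each { |path| queue << path }
  queue.close # pop returns nil once the closed queue is empty

  failures = []
  lock = Mutex.new

  threads = Array.new(workers) do
    Thread.new do
      while (path = queue.pop)
        begin
          work.call(path)
        rescue StandardError => e
          # Record the failure and move on to the next file
          lock.synchronize { failures << [path, e.message] }
        end
      end
    end
  end
  threads.each(&:join)
  failures
end

# Back-of-the-envelope wall-clock estimate, assuming a constant time per
# trace and linear scaling with the number of workers.
def estimated_hours(trace_count, seconds_per_trace, workers)
  trace_count * seconds_per_trace / workers / 3600.0
end
```

With hypothetical figures of 0.1 s per trace and 10 workers, `estimated_hours(50_000_000, 0.1, 10)` is about 139 hours, i.e. just under 6 days; the real per-trace time would have to be measured first.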
Edited by Shinya Maeda