Migrate traces to Object Storage (S3)

We currently have about 50,000,000 jobs. Each job has a trace, and each trace belongs to one of the following eras.

| Era | Path | Count | Artifact record? | Stored in | Comments |
|---|---|---|---|---|---|
| 1st | - | 87,636 | No | Database (PostgreSQL) | Issue: https://gitlab.com/gitlab-org/gitlab-ce/issues/34317 |
| 2nd | `#{builds_path}/#{YYYY_MM}/#{project_ci_id}/#{job_id}.log` (e.g. `/builds/2018_01/1/12.log`) | - | No | File storage (NFS) | Can be handled the same way as the 3rd era. |
| 3rd | `#{builds_path}/#{YYYY_MM}/#{project_id}/#{job_id}.log` (e.g. `/builds/2018_02/19/1172.log`) | approx. 50,000,000 | No | File storage (NFS) | - |
| 4th (latest) | `artifacts/#{SHA256['project_id']}/#{YYYY_MM_DD}/#{job_id}/#{job_artifact_id}/trace.log` (e.g. `artifacts/94/00/94..67/2018_02_13/1374/249/trace.log`) | Increasing since %10.5 | Yes | File storage (NFS) or object storage (S3) | Traces can easily be moved between file storage and object storage. |
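Because 2nd- and 3rd-era paths share the same shape, a script cannot tell from the path alone whether the middle segment is a `project_ci_id` or a `project_id`; the `job_id` in the file name is the only reliable key. A minimal Ruby sketch of parsing a legacy path and building a 4th-era path, under stated assumptions: the helper names `parse_legacy_trace_path` and `hashed_artifact_trace_path` are hypothetical, and the `<2 hex chars>/<2 hex chars>/<full hash>` layout is an assumption inferred from the example path `artifacts/94/00/94..67/...`, not taken from GitLab's code.

```ruby
require 'date'
require 'digest'

# Hypothetical helper. 2nd- and 3rd-era traces share the layout
# YYYY_MM/<project_ci_id or project_id>/<job_id>.log under builds_path,
# so the middle segment is kept as an opaque number.
LEGACY_TRACE_PATH = %r{\A(?<yyyy_mm>\d{4}_\d{2})/(?<project_ref>\d+)/(?<job_id>\d+)\.log\z}

LegacyTrace = Struct.new(:yyyy_mm, :project_ref, :job_id)

# Parses a path relative to builds_path; returns nil if it is not a legacy trace.
def parse_legacy_trace_path(relative_path)
  match = LEGACY_TRACE_PATH.match(relative_path)
  return nil unless match

  # Base 10 is explicit so that leading zeros are not read as octal.
  LegacyTrace.new(match[:yyyy_mm],
                  Integer(match[:project_ref], 10),
                  Integer(match[:job_id], 10))
end

# Hypothetical: builds a 4th-era path, ASSUMING the hashed directory is
# <first 2 hex chars>/<next 2 hex chars>/<full SHA256 of project_id>.
def hashed_artifact_trace_path(project_id, date, job_id, job_artifact_id)
  disk_hash = Digest::SHA256.hexdigest(project_id.to_s)
  File.join('artifacts', disk_hash[0..1], disk_hash[2..3], disk_hash,
            date.strftime('%Y_%m_%d'), job_id.to_s, job_artifact_id.to_s,
            'trace.log')
end
```

For example, `parse_legacy_trace_path('2018_02/19/1172.log')` yields a struct with `job_id` 1172, which is the job the new artifact record would be attached to.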

Migrate all trace files from the 2nd and 3rd eras to the 4th era

We're preparing a script to manually migrate trace files.

The logic of the script

1. Choose a trace file.
2. Extract the job_id from the file name. The new artifact record will be associated with this job.
3. Copy the trace file into a tmp folder.
4. Create an artifact record with the copied trace file. If BackgroundUploader is enabled, the trace is moved to object storage; otherwise it remains in file storage.
5. Verify that the checksum of the uploaded file matches that of the original trace file.
6. Remove the copied trace file.
7. Move the original trace file to a backup folder.
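The steps above map onto a short Ruby method. This is a minimal sketch, not the actual script: `migrate_trace_file` and `ChecksumMismatch` are hypothetical names, and the `store` callable stands in for creating the artifact record (and any upload). It is assumed to return a local path to the stored copy so its SHA256 can be compared with the original's; when the trace goes to object storage, the comparison would instead need a checksum obtained from the uploaded object.

```ruby
require 'digest'
require 'fileutils'
require 'tmpdir'

# Hypothetical error raised when the stored copy differs from the original.
class ChecksumMismatch < StandardError; end

# Hypothetical sketch of migrating one legacy trace file.
# `store` receives (job_id, cloned_path) and must return a local path to the
# stored copy; it stands in for creating the artifact record.
def migrate_trace_file(original_path, backup_dir:, store:)
  # Extract the job_id from the file name, e.g. ".../1172.log" -> 1172.
  job_id = Integer(File.basename(original_path, '.log'), 10)
  original_sha = Digest::SHA256.file(original_path).hexdigest

  # The tmp folder (and the cloned file in it) is deleted when the block exits,
  # even if storing or verification raises.
  Dir.mktmpdir('trace-migration') do |tmp_dir|
    clone_path = File.join(tmp_dir, File.basename(original_path))
    FileUtils.cp(original_path, clone_path)

    stored_path = store.call(job_id, clone_path)
    stored_sha = Digest::SHA256.file(stored_path).hexdigest
    unless stored_sha == original_sha
      raise ChecksumMismatch, "job #{job_id}: #{stored_sha} != #{original_sha}"
    end
  end

  # Only after a successful verification: move the original to the backup folder.
  FileUtils.mkdir_p(backup_dir)
  FileUtils.mv(original_path, File.join(backup_dir, File.basename(original_path)))
  job_id
end
```

Moving the original only after the checksum matches means a failed run leaves the original trace in place, and the backup folder keeps a copy for recovery if a migrated trace later goes missing.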

The usage of the script

rake gitlab:traces:migrate['2018_02/19/1172.log'] # Migrate a trace file
rake gitlab:traces:migrate['2018_02'] # Migrate trace files in builds/2018_02 folder
rake gitlab:traces:migrate['.'] # Migrate all trace files
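The three invocations differ only in what the argument resolves to: a single file, a month folder, or the whole builds directory. A minimal sketch of that expansion, assuming the argument is a path relative to `builds_path`; `trace_files_for` is a hypothetical name, and the rake task wiring itself is omitted.

```ruby
# Hypothetical: expands the rake argument ('2018_02/19/1172.log', '2018_02', '.')
# into the sorted list of trace files it refers to under builds_path.
def trace_files_for(builds_path, target)
  root = File.expand_path(builds_path)
  full_path = File.expand_path(target, root)

  # Refuse arguments such as '../..' that escape builds_path.
  unless full_path == root || full_path.start_with?(root + File::SEPARATOR)
    raise ArgumentError, "#{target} is outside #{root}"
  end

  return [full_path] if File.file?(full_path)
  return [] unless File.directory?(full_path)

  # '**/' also matches zero directories, so files directly in full_path are included.
  Dir.glob(File.join(full_path, '**', '*.log')).sort
end
```

Sorting the result gives a deterministic order, which makes it easier to resume an interrupted run folder by folder.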

What if traces go missing for some reason? How can we recover them?
How can we speed up the migration? How can we parallelize the process?
When will the migration finish if we stick with the current approach? How long does it take to migrate a single trace?
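For the speed and duration questions, one possible direction is a pool of worker threads draining a shared queue of trace files, plus a back-of-envelope estimate of wall-clock time. This is a sketch under assumptions: `migrate_in_parallel` and `estimated_days` are hypothetical names, and the real per-trace time still has to be measured. In MRI, threads mainly overlap I/O waits (NFS reads, uploads); if checksumming turns out to be the bottleneck, running several processes (for example one rake invocation per `YYYY_MM` folder) is an alternative.

```ruby
# Hypothetical worker pool: runs the given per-file migration block on
# `workers` threads and returns [file, error message] pairs for failures.
def migrate_in_parallel(files, workers:, &migrate)
  queue = Thread::Queue.new
  files.each { |file| queue << file }
  queue.close # pop returns nil once the closed queue is drained

  failures = Thread::Queue.new
  threads = Array.new(workers) do
    Thread.new do
      while (file = queue.pop)
        begin
          migrate.call(file)
        rescue StandardError => e
          # Record the failure and keep going with the remaining files.
          failures << [file, e.message]
        end
      end
    end
  end
  threads.each(&:join)

  Array.new(failures.size) { failures.pop }
end

# Back-of-envelope wall-clock estimate, assuming perfectly even load.
def estimated_days(file_count, seconds_per_file, workers)
  file_count * seconds_per_file / workers.to_f / 86_400
end
```

For example, with 50,000,000 traces, an assumed (not measured) 0.1 s per trace and 10 workers, `estimated_days(50_000_000, 0.1, 10)` gives about 5.8 days.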

Edited Apr 03, 2018 by Shinya Maeda