Canceled jobs have both live trace and archived trace
In the live trace investigation, we found that live traces (trace files in file storage) of canceled jobs are not cleaned up.
We assume this is caused by a race condition between Gitlab-rails and Gitlab-runner. Here is the potential scenario:
- Job is running
- User clicks a cancel button
- ArchiveTraceWorker archives the current live trace
- Gitlab-runner is notified that the job was canceled, and sends the full trace with "ERROR: Job failed: canceled" as the last line
- Gitlab-runner creates the live trace again (unnecessary)
In fact, the trace of a canceled job should end with the line "ERROR: Job failed: canceled". However, we can't see that line on gitlab.com. This means the trace is archived before the runner sends the full trace, and the runner's late update then creates a trace file in file storage again.
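The suspected ordering can be illustrated with a small, deterministic simulation. This is only a sketch of the scenario described above, not GitLab code: `TraceStore`, its methods, and `simulate_cancel_race` are hypothetical names, and the in-memory fields stand in for the live trace file and the archived trace.

```python
class TraceStore:
    """Toy stand-in for one job's trace storage (hypothetical, not GitLab's API)."""

    def __init__(self):
        self.live = None      # live trace (trace file in file storage)
        self.archived = None  # archived trace

    def append_live(self, chunk):
        # Incremental trace update while the job is running.
        self.live = (self.live or "") + chunk

    def archive(self):
        # What we assume ArchiveTraceWorker does: move live -> archived.
        if self.live is not None:
            self.archived = self.live
            self.live = None

    def write_full_trace(self, content):
        # Assumed behavior of the runner's final full-trace update:
        # it (re)creates the live trace even if an archive already exists.
        self.live = content


def simulate_cancel_race():
    store = TraceStore()
    store.append_live("Running build...\n")  # job is running
    store.archive()                          # user cancels; trace is archived
    # The runner learns about the cancellation afterwards and sends the full trace.
    store.write_full_trace("Running build...\nERROR: Job failed: canceled\n")
    return store


if __name__ == "__main__":
    s = simulate_cancel_race()
    print("archived:", repr(s.archived))
    print("live:    ", repr(s.live))
```

In this ordering the archived trace lacks the "ERROR: Job failed: canceled" line, while a leftover live trace contains it, which matches both observations: the missing last line and the live trace that is never cleaned up.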