Clean up broken ci_build_trace_chunks table
As discovered in gitlab-org/gitlab#330141 (closed), we have a large backlog of broken or lost CI build trace chunks in the database (1.2M rows, 550K builds), many of which point to Redis data that has long since expired.
We need to clean these up in an ad-hoc fashion, and then see whether that allows the Ci::ArchiveTracesCronWorker worker to keep up with future cleanups.
Some conversations copied from gitlab-org/gitlab#330141 (closed) for further development.
Proposed cleanup (WIP)
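# Walk the chunks table in small batches to keep each query cheap.
Ci::BuildTraceChunk.in_batches(of: 100) do |traces_batch|
  # Load the not-yet-archived builds that own the chunks in this batch.
  Ci::Build.without_archived_trace.id_in(traces_batch.select(:build_id)).includes(:project, :job_artifacts_trace).each do |build|
    # Skip builds that never finished or that finished too recently.
    next unless build.finished_at
    next if build.finished_at > 12.hours.ago

    puts "Archiving #{build.id}: #{build.finished_at}"
    Ci::ArchiveTraceService.new.execute(build, worker_name: 'ManualChunkCleanup')
  end
end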
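The two guards in the loop can be checked in isolation before running the cleanup against real data. The following is a minimal sketch in plain Ruby (no Rails), so `12.hours.ago` is spelled out in seconds. `eligible_for_archive?` and `MIN_AGE_SECONDS` are hypothetical names introduced here for illustration, not part of the GitLab codebase; the sketch only mirrors the `finished_at` checks above.

```ruby
# Hypothetical helper mirroring the two `next` guards in the loop above.
MIN_AGE_SECONDS = 12 * 60 * 60 # plain-Ruby equivalent of 12.hours

def eligible_for_archive?(finished_at, now: Time.now)
  return false if finished_at.nil?                     # build never finished
  return false if finished_at > now - MIN_AGE_SECONDS  # finished too recently

  true
end

now = Time.utc(2021, 5, 20, 12, 0, 0)
puts eligible_for_archive?(nil, now: now)              # false
puts eligible_for_archive?(now - 3600, now: now)       # false
puts eligible_for_archive?(now - 24 * 3600, now: now)  # true
```

Passing `now` explicitly keeps the check deterministic, which makes it easy to confirm the cutoff behaves as intended before touching any rows.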
Open questions
- without_archived_trace looks like it will miss some possibly relevant builds (https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/13342#note_571954993)
- To successfully archive we need to either:
  - Backfill Redis with some (placeholder) data for missing chunks (see gitlab-org/gitlab#330141 (comment 571027032))
  - Delete all the CI trace chunk records older than 7 days first, because their Redis data has expired (TTL) and will not come back.
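The second option amounts to splitting chunk records by whether their Redis data could still exist. Below is a minimal sketch in plain Ruby, assuming a chunk's age can be judged from its build's `finished_at` (an assumption; the issue does not say which timestamp to use). `Chunk`, `split_by_ttl` and `REDIS_TTL_SECONDS` are hypothetical stand-ins, not GitLab APIs; a real cleanup would go through the ActiveRecord models.

```ruby
# Hypothetical stand-in for a trace chunk row joined with its build's finished_at.
Chunk = Struct.new(:id, :build_id, :build_finished_at)

# 7 days, the cutoff named in the option above.
REDIS_TTL_SECONDS = 7 * 24 * 60 * 60

# Returns [expired, live]: chunks whose Redis data is presumed gone,
# and chunks that may still be archivable.
def split_by_ttl(chunks, now: Time.now)
  cutoff = now - REDIS_TTL_SECONDS
  chunks.partition do |chunk|
    !chunk.build_finished_at.nil? && chunk.build_finished_at < cutoff
  end
end

now = Time.utc(2021, 5, 20)
chunks = [
  Chunk.new(1, 10, now - 8 * 24 * 3600), # older than the cutoff
  Chunk.new(2, 11, now - 24 * 3600),     # recent
  Chunk.new(3, 12, nil)                  # build has not finished
]
expired, live = split_by_ttl(chunks, now: now)
puts "delete: #{expired.map(&:id)}"  # delete: [1]
puts "archive: #{live.map(&:id)}"    # archive: [2, 3]
```

Chunks of unfinished builds are kept on the "live" side here so that nothing still being written is deleted; whether that is the right call for the real cleanup is an open question.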
Edited by Craig Miskell