Clean up broken ci_build_trace_chunks table

As discovered in gitlab-org/gitlab#330141 (closed), we have a large backlog of broken/lost CI build trace chunks in the database (1.2M rows, 550K builds), many of which point to Redis data that has long since expired.

We need to clean these up in an ad-hoc fashion, and then see whether that allows the Ci::ArchiveTracesCronWorker worker to keep up with future cleanups.

Some conversations are copied here from gitlab-org/gitlab#330141 (closed) for further development.

Proposed cleanup (WIP)

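# Walk the trace chunks in batches of 100 and look up their unarchived builds.
Ci::BuildTraceChunk.in_batches(of: 100) do |traces_batch|
  Ci::Build.without_archived_trace.id_in(traces_batch.select(:build_id)).includes(:project, :job_artifacts_trace).each do |build|
    # Skip builds that have not finished, or that finished within the last 12 hours.
    next unless build.finished_at
    next if build.finished_at > 12.hours.ago

    puts "Archiving #{build.id}: #{build.finished_at}"
    # Archive the trace; worker_name labels this manual run.
    Ci::ArchiveTraceService.new.execute(build, worker_name: 'ManualChunkCleanup')
  end
end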

Open questions

  1. without_archived_trace looks like it will miss some possibly relevant builds (https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/13342#note_571954993)
  2. To successfully archive we need to either:
    1. Backfill Redis with some (placeholder) data for missing chunks (see gitlab-org/gitlab#330141 (comment 571027032))
    2. Delete all ci_build_trace_chunks records older than 7 days first, because their Redis data has expired (TTL) and will not come back.
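
The decision rules implied by the proposed script and by option 2.2 can be sketched in plain Ruby, outside of Rails. This is only an illustration under assumptions: the `Chunk` and `Build` structs, the `classify` helper, and the chunk `created_at` field are hypothetical stand-ins rather than GitLab's actual models or columns. The 7-day TTL and the 12-hour cutoff are taken from the text above.

```ruby
# Hypothetical stand-ins for the real records; these are NOT GitLab's models.
Chunk = Struct.new(:build_id, :created_at)
Build = Struct.new(:id, :finished_at)

REDIS_TTL     = 7 * 24 * 60 * 60 # seconds; the 7-day expiry mentioned above
ARCHIVE_DELAY = 12 * 60 * 60     # seconds; mirrors the 12.hours.ago guard

# Returns :delete, :skip, or :archive for one chunk and its build.
def classify(chunk, build, now: Time.now)
  # Chunks older than the TTL have lost their Redis data, so archiving cannot succeed.
  return :delete if chunk.created_at < now - REDIS_TTL
  # Unfinished or recently finished builds are left for the regular cron worker.
  return :skip if build.finished_at.nil? || build.finished_at > now - ARCHIVE_DELAY

  :archive
end

now = Time.utc(2021, 5, 20)
day = 24 * 60 * 60
p classify(Chunk.new(1, now - 10 * day), Build.new(1, now - 10 * day), now: now) # prints :delete
```

Checking the TTL first means expired chunks are never handed to the archive step, which matches the "delete first" ordering in option 2.2. Whether deleting is preferable to backfilling placeholder data (option 2.1) is still an open question.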
Edited by Craig Miskell