Delete existing p_ci_build_trace_metadata records of archived traces and free up disk space
## Context
We use `p_ci_build_trace_metadata` to track archival attempts on job logs and to verify that the file signatures are correct. However, we don't need to keep the data around after a successful archival.
In #500654 (closed), we implemented a process to delete the metadata records of newly archived traces. In this issue, we now need to delete the existing/old archived trace metadata records.
## Proposal
After #500654 (closed) is released, investigate and implement a process to delete the existing metadata records of traces that were successfully archived. The disk space freed by the deleted records should be reclaimed by the OS.
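As a rough illustration of the shape such a process could take, here is a minimal in-memory sketch of a batched deletion pass: walk the records in fixed-size batches and drop only those whose archival succeeded (`archived_at` set, and checksum matching or no remote checksum recorded). The batch size, data shape, and method names are illustrative assumptions, not the real migration code.

```ruby
# Illustrative only: the real process would run as a batched migration against
# the partitioned table, not over an in-memory array.
BATCH_SIZE = 100

def prune_archived(rows)
  rows.each_slice(BATCH_SIZE).flat_map do |batch|
    # Keep only records that are NOT successfully archived.
    batch.reject do |r|
      r[:archived_at] && (r[:remote_checksum].nil? || r[:checksum] == r[:remote_checksum])
    end
  end
end
```

Processing in batches keeps each delete transaction small, which matters at the ~1 billion row scale discussed below.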
We consider a trace "successfully archived" if its metadata record satisfies both of these conditions:

- `archived_at` is populated, AND
- `remote_checksum IS NULL OR checksum = remote_checksum`
This is the same logic applied in the process introduced in #500654 (closed). See `def successfully_archived?` for reference.
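The two conditions above can be sketched as a plain-Ruby predicate. This is a hypothetical hash-based version for illustration; the actual `successfully_archived?` method lives in the GitLab codebase and operates on the metadata model, not a hash.

```ruby
# Sketch of the archival-success check described above (illustrative, not the
# real model method).
def successfully_archived?(metadata)
  # Condition 1: the trace must have been archived at some point.
  return false unless metadata[:archived_at]

  # Condition 2: either no remote checksum was recorded, or it matches.
  metadata[:remote_checksum].nil? || metadata[:checksum] == metadata[:remote_checksum]
end
```

Note that a record with a populated `remote_checksum` that does not match `checksum` is deliberately not considered successfully archived, so it would be kept by the cleanup.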
## Investigation
The table currently has ~1 billion rows (as of 2025-04-07). While a batched background migration (BBM) could work, the problem is that deleting rows won't actually free up the disk space and return it to the OS. Ref Slack:
> @krasio: We (#g_database_frameworks) think this will be fine with the current protections we have for background migrations. The concern is rather: is this the right thing to do? Even if we delete these records we won't gain back the disk space. Should we rather focus on 1) making it possible to start using a new partition, 2) archiving any records that are not yet archived, 3) dropping old partitions.
This is due to the nature of Postgres' plain `VACUUM`:
> Plain VACUUM (without FULL) simply reclaims space and makes it available for re-use. [...] However, extra space is not returned to the operating system (in most cases); it's just kept available for re-use within the same table. [...] VACUUM FULL rewrites the entire contents of the table into a new disk file with no extra space, allowing unused space to be returned to the operating system. This form is much slower and requires an ACCESS EXCLUSIVE lock on each table while it is being processed.
Per the above, while `VACUUM FULL` would achieve what we want, we cannot take the required exclusive lock on the table without downtime.
## Alternative approaches
- Archive all records on the latest partition, create a new partition, and then drop the old partitions.
  - Problem: Currently, we cannot bump the partition in `p_ci_build_trace_metadata` (as of 2025-04-07). Ref Slack:

    > @morefice: We had many broken FKs on ci_pipelines due to an internal postgres bug, which made it impossible to bump the partition in the last few months.
- We could potentially `TRUNCATE` the entire latest partition. Ref Slack:

  > @lma-git: Maybe this is what Marius was already getting at here, but based on the trace archival logic, it might be okay just to truncate all of `p_ci_build_trace_metadata` after all. Now that we've reduced the growth rate, we can start with a clean slate. It seems that deleting all the existing trace metadata records would effectively just reset the archival attempts to 0 for all failed trace archives... and this doesn't seem so bad. If a job with an unarchived trace still has stale live trace data, we'd just be giving it another 5 attempts. Granted, it might be millions of jobs that this poor ArchiveTracesCronWorker has to reprocess and it might take forever (but we can increase the batch size and/or clean out real old build trace chunks so it doesn't pick them up 😅). Moreover, I don't see what we actually do with records that have invalid `remote_checksum` values. It looks like we just log it and leave it for posterity. But I think we still read the invalid archived file anyway? I'll dig into the codebase further to confirm, and we can bring this discussion to the issue. But what are your initial thoughts here? 🙏
This option is further investigated/discussed in the comment thread below: #533933 (comment 2438367011)
## Update [2025-04-10]
- We are planning to simply truncate the entire `102` partition. Per #533933 (comment 2442276364), a new partition `103` will be created. As such, the traffic on `102` will be much smaller and we can truncate it without too much concern for in-flight jobs.
- Further performance/data handling improvements will be considered in a follow-up issue: #534999.