Data retention for Geo deleted events
The primary prunes the Geo event log on a regular basis (every 2 hours). However, a bug caused only the `geo_event_log` rows to be deleted, not the associated rows in `geo_#{event_type}_events`.
gitlab-org/gitlab-ee!6175 will fix that issue. However, while investigating incorrect sync numbers (in gitlab-com/migration#295), it proved very useful to still have the deleted events available, even after all nodes had handled them.
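To illustrate the bug, here is a minimal sketch in plain Ruby (no Rails or ActiveRecord). It models `geo_event_log` as an array of rows, each pointing at a row in a per-type table named after the `geo_#{event_type}_events` pattern. `LogRow`, `prune_log`, and the in-memory layout are hypothetical and are not taken from gitlab-org/gitlab-ee!6175. With `cascade: false` it reproduces the bug by leaving orphaned event rows behind.

```ruby
# Hypothetical in-memory model: each geo_event_log row references one row
# in a per-type table, mirroring the geo_#{event_type}_events naming.
LogRow = Struct.new(:id, :event_type, :event_id)

# Prunes log rows whose id is below `cutoff_id` and returns the kept rows.
# With cascade: false this mimics the bug: the associated event rows in
# the per-type tables are never deleted.
def prune_log(log, event_tables, cutoff_id, cascade: true)
  pruned, kept = log.partition { |row| row.id < cutoff_id }
  if cascade
    pruned.each do |row|
      event_tables.fetch("geo_#{row.event_type}_events").delete(row.event_id)
    end
  end
  kept
end

# Fresh copy of the (hypothetical) per-type tables: event id => row data.
def build_tables
  {
    "geo_repository_updated_events" => { 10 => { project_id: 1 }, 11 => { project_id: 2 } },
    "geo_repository_deleted_events" => { 20 => { project_id: 3 } }
  }
end

log = [
  LogRow.new(1, "repository_updated", 10),
  LogRow.new(2, "repository_deleted", 20),
  LogRow.new(3, "repository_updated", 11)
]

# Buggy behaviour: log rows 1 and 2 are pruned, their event rows remain.
buggy_tables = build_tables
prune_log(log, buggy_tables, 3, cascade: false)

# Fixed behaviour: pruning a log row also removes its event row.
fixed_tables = build_tables
kept = prune_log(log, fixed_tables, 3)
puts kept.map(&:id).inspect # => [3]
```

A real implementation would do this with SQL deletes (or foreign keys with cascading deletes) rather than in Ruby loops; the sketch only shows which rows should disappear together.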
Proposal
Looking at the current numbers: so far, about 500k deleted events and more than 60M updated events have been generated in total.
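As a rough check of how dominant updated events are, counting only the two event types with numbers given (other event types are ignored, and both counts are approximate):

$$
\frac{60\,\text{M}}{60\,\text{M} + 0.5\,\text{M}} = \frac{60}{60.5} \approx 0.992
$$

So updated events account for roughly 99% of these two types, well above the 95% figure used in the proposal.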
So maybe we can make the pruning less aggressive:
- Prune `geo_repository_updated_events` as we have been doing. These make up the vast majority of all events (over 95%), but they are not very helpful for troubleshooting.
- Keep `geo_repository_deleted_events` for x months after they have been created and handled by all secondaries.
- `repository_renamed_events` might also be useful.
- `repository_created_events` are not critical, because the ProjectSyncWorker will pick up new projects anyway.
- Others?
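The proposed policy could be expressed per event type roughly as follows. This is a hypothetical plain-Ruby sketch, not existing GitLab code: `RETENTION`, `Event`, and `prunable?` are illustrative names, the 90-day value is a placeholder for "x months", and "handled by all secondaries" is assumed to mean the event id is at or below the lowest last-processed event id reported by the secondaries. The retention period is measured from `created_at`, since the sketch does not track when each secondary handled an event.

```ruby
# Hypothetical per-type retention policy following the proposal.
# nil     => prune as soon as every secondary has handled the event.
# Integer => additionally keep the event for that many seconds after creation.
# 90 days is a placeholder for the "x months" in the proposal.
RETENTION = {
  "repository_updated" => nil,
  "repository_created" => nil,
  "repository_deleted" => 90 * 24 * 60 * 60,
  "repository_renamed" => 90 * 24 * 60 * 60
}.freeze

Event = Struct.new(:id, :event_type, :created_at)

# `secondary_cursors` is an assumed input: the last event id each secondary
# has processed. An event is handled by all secondaries when its id is at
# or below the minimum cursor.
def prunable?(event, secondary_cursors, now)
  return false if secondary_cursors.empty?
  return false if event.id > secondary_cursors.min

  # Event types without an explicit rule are kept (the conservative choice).
  return false unless RETENTION.key?(event.event_type)

  retention = RETENTION[event.event_type]
  retention.nil? || now - event.created_at >= retention
end

now = Time.utc(2018, 7, 1)
events = [
  Event.new(1, "repository_updated", Time.utc(2018, 6, 30)),
  Event.new(2, "repository_deleted", Time.utc(2018, 6, 30)),
  Event.new(3, "repository_deleted", Time.utc(2018, 1, 1)),
  Event.new(4, "repository_updated", Time.utc(2018, 6, 30))
]
cursors = [3, 5] # minimum is 3, so event 4 is not yet handled everywhere

prunable_ids = events.select { |e| prunable?(e, cursors, now) }.map(&:id)
puts prunable_ids.inspect # => [1, 3]
```

Keeping unknown types by default means a newly added event type is never pruned by accident; the cost is that someone has to add an explicit rule for it, which fits the open "others?" question.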
cc @stanhu @nick.thomas @ash.mckenzie