Geo - PruneEventLogWorker leaves orphaned entries in the related events table around
Research
During my investigation of repository checksum mismatches on GPRD, I noticed there is about 35514113 geo_repository_updated_events entries leftover, while there are only 11388446 entries in the geo_event_log table:
[ gprd ] production> update_events = Geo::RepositoryUpdatedEvent.count
=> 46902559
[ gprd ] production> events = Geo::EventLog.where.not(repository_updated_event_id: nil).count
=> 11388446
[ gprd ] production> update_events - events
=> 35514113
gitlabhq_production=> \d geo_event_log
Table "public.geo_event_log"
Column | Type | Modifiers
-------------------------------------+-----------------------------+------------------------------------------------------------
id | bigint | not null default nextval('geo_event_log_id_seq'::regclass)
created_at | timestamp without time zone | not null
repository_updated_event_id | bigint |
repository_deleted_event_id | bigint |
repository_renamed_event_id | bigint |
repositories_changed_event_id | bigint |
repository_created_event_id | bigint |
hashed_storage_migrated_event_id | bigint |
lfs_object_deleted_event_id | bigint |
hashed_storage_attachments_event_id | bigint |
job_artifact_deleted_event_id | bigint |
upload_deleted_event_id | bigint |
Indexes:
"geo_event_log_pkey" PRIMARY KEY, btree (id)
"index_geo_event_log_on_repositories_changed_event_id" btree (repositories_changed_event_id)
"index_geo_event_log_on_repository_created_event_id" btree (repository_created_event_id)
"index_geo_event_log_on_repository_deleted_event_id" btree (repository_deleted_event_id)
"index_geo_event_log_on_repository_renamed_event_id" btree (repository_renamed_event_id)
"index_geo_event_log_on_repository_updated_event_id" btree (repository_updated_event_id)
Foreign-key constraints:
"fk_176d3fbb5d" FOREIGN KEY (job_artifact_deleted_event_id) REFERENCES geo_job_artifact_deleted_events(id) ON DELETE CASCADE
"fk_27548c6db3" FOREIGN KEY (hashed_storage_migrated_event_id) REFERENCES geo_hashed_storage_migrated_events(id) ON DELETE CASCADE
"fk_4a99ebfd60" FOREIGN KEY (repositories_changed_event_id) REFERENCES geo_repositories_changed_events(id) ON DELETE CASCADE
"fk_86c84214ec" FOREIGN KEY (repository_renamed_event_id) REFERENCES geo_repository_renamed_events(id) ON DELETE CASCADE
"fk_9b9afb1916" FOREIGN KEY (repository_created_event_id) REFERENCES geo_repository_created_events(id) ON DELETE CASCADE
"fk_c1f241c70d" FOREIGN KEY (upload_deleted_event_id) REFERENCES geo_upload_deleted_events(id) ON DELETE CASCADE
"fk_c4b1c1f66e" FOREIGN KEY (repository_deleted_event_id) REFERENCES geo_repository_deleted_events(id) ON DELETE CASCADE
"fk_d5af95fcd9" FOREIGN KEY (lfs_object_deleted_event_id) REFERENCES geo_lfs_object_deleted_events(id) ON DELETE CASCADE
"fk_rails_78a6492f68" FOREIGN KEY (repository_updated_event_id) REFERENCES geo_repository_updated_events(id) ON DELETE CASCADE
Bug
The problem here is that geo_repository_updated_events is the parent table. If you delete any entry from geo_repository_updated_events, then the corresponding row in geo_events will get pruned, but not vice versa.
Proposed changes
- Modify the
Geo::PruneEventLogWorkerso it deletes rows from the individual event tables - Add post-deploy migration that cleans up the orphaned events in the event tables
Edited by Toon Claes