Geo: Optimize replication of project repo keep around refs

Summary

Commenting on an issue, for example, calls Repository#keep_around, which writes a keep-around ref for the current commit if one doesn't already exist. We trigger a repo update event on every call of Repository#keep_around, whether or not a ref was actually written, which causes secondaries to replicate the repo.

This is valid behavior, but it adds a lot of overhead, so there is an opportunity for performance optimization.

Steps to reproduce

  1. On the secondary site, tail geo.log
  2. Comment on an issue
  3. Observe logs like {"severity":"INFO","time":"2022-05-19T00:50:32.648Z","correlation_id":null,"pid":50126,"host":"127.0.0.1","class":"Gitlab::Geo::LogCursor::Daemon","message":"Repository update","project_id":6,"source":"repository","resync_repository":true,"resync_wiki":false,"scheduled_at":"2022-05-18T17:50:32.629-07:00","replicable_project":true,"job_id":"d7d2a28e31b6840ff61a69fe","event_id":2,"cursor_delay_s":0.51}
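
To count these events rather than eyeball them, the log lines can be filtered by message and project. The following is a minimal sketch, assuming geo.log holds one JSON object per line as in the entry above. The helper name repository_update_events is hypothetical, and the sample line is a shortened copy of that entry.

```ruby
require "json"

# Hypothetical helper (not part of GitLab): returns the parsed
# "Repository update" log entries for one project.
def repository_update_events(lines, project_id)
  lines.filter_map do |line|
    entry = begin
      JSON.parse(line)
    rescue JSON::ParserError
      nil # skip lines that are not valid JSON
    end
    next unless entry.is_a?(Hash)
    next unless entry["message"] == "Repository update"
    next unless entry["project_id"] == project_id

    entry
  end
end

# Shortened copy of the log entry shown in step 3.
sample = '{"severity":"INFO","class":"Gitlab::Geo::LogCursor::Daemon",' \
         '"message":"Repository update","project_id":6,"source":"repository",' \
         '"resync_repository":true,"resync_wiki":false,"event_id":2}'

events = repository_update_events([sample, "not json"], 6)
puts events.size
```

Running this against the log while commenting several times on the same issue should show one entry per comment under the current behavior.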

Proposal

At the very least, instead of enqueuing Geo::CreateRepositoryUpdatedEventWorker for every call of keep_around, we should enqueue it only if keep_around generated a write_ref call.

E.g. if you comment 5 times in a row, and the project repo hasn't been updated during that time, then Geo::CreateRepositoryUpdatedEventWorker should be enqueued once instead of 5 times.
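
The proposed behavior can be sketched roughly as follows. This is a simplified model, not GitLab's actual Repository implementation: the class SketchRepository, its in-memory ref set, and the enqueued_events array are all hypothetical stand-ins for the real repository, the keep-around refs on disk, and the enqueued Geo::CreateRepositoryUpdatedEventWorker jobs.

```ruby
require "set"

# Hypothetical stand-in for Repository; models only the keep-around logic.
class SketchRepository
  attr_reader :enqueued_events

  def initialize
    @keep_around_refs = Set.new # models keep-around refs that already exist
    @enqueued_events = []       # models enqueued Geo update event jobs
  end

  # Writes a keep-around ref for each SHA that lacks one, and enqueues a
  # single update event only if at least one ref was actually written.
  def keep_around(*shas)
    wrote_ref = false

    shas.each do |sha|
      next if sha.nil? || sha.empty?
      next if @keep_around_refs.include?(sha) # ref exists, nothing to write

      write_ref(sha)
      wrote_ref = true
    end

    enqueue_update_event if wrote_ref
    wrote_ref
  end

  private

  def write_ref(sha)
    @keep_around_refs.add(sha)
  end

  def enqueue_update_event
    @enqueued_events << :repository_updated
  end
end

repo = SketchRepository.new
5.times { repo.keep_around("abc123") } # five comments, same commit
puts repo.enqueued_events.size         # prints 1
```

Only the first call writes a ref, so only the first call enqueues an event. The remaining four calls find the ref already present and return without triggering replication.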

Edited by Michael Kozono