PruneWebHookLogsWorker cannot keep up with the rate of inserts
`PruneWebHookLogsWorker` attempts to delete web hook logs older than 90 days. It runs every hour and deletes at most 50,000 rows per run (https://gitlab.com/gitlab-org/gitlab/blob/f78af0fc6865fc3e58d2bbf3b4f6b911f45e3674/app/workers/prune_web_hook_logs_worker.rb#L15-18), or 1.2 million rows per day. However, that is not sufficient to keep up with the rate of inserts, which is currently 2.6 million rows per day:
```sql
gitlabhq_production=# SELECT COUNT(*) FROM "web_hook_logs" WHERE "web_hook_logs"."created_at" >= '2020-09-25 01:40:28.791184';
-[ RECORD 1 ]--
count | 2589523
```
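For reference, the pattern in the linked worker is a capped, batched delete. The following is a minimal sketch of that pattern, not the actual worker source: the constants and the id-subquery `delete_all` strategy are illustrative, and the linked file is authoritative.

```ruby
# Sketch of the capped, batched pruning described above. Constants and
# the exact delete strategy are illustrative, not the real worker code.
class PruneWebHookLogsWorker
  include ApplicationWorker

  DELETE_LIMIT = 50_000 # rows per hourly run => 1.2 million rows/day
  BATCH_SIZE = 1_000

  def perform
    deleted = 0
    cutoff = 90.days.ago

    while deleted < DELETE_LIMIT
      # Delete via an id subquery so each statement stays small and we
      # avoid holding a long-running transaction on a hot table.
      batch = WebHookLog.where('created_at < ?', cutoff)
                        .limit(BATCH_SIZE)
                        .select(:id)
      count = WebHookLog.where(id: batch).delete_all
      break if count.zero?

      deleted += count
    end
  end
end
```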
I question whether we really need to keep this much data, and whether the database is the right place for it.
Questions:
- Should we prune more often? We could probably remove a few more batches per run without too much concern. For customers-gitlab-com#875 (closed), I removed 29 million rows in about 6 hours using batches of 1,000.
- Should we store a finite set of logs (e.g. 10 per project) instead of pruning by date? (See the sketch after this list.)
- Should we store this data in Redis Cluster or somewhere else?
- Do we really need this feature?
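On the finite-retention question, the deletion side could look roughly like the sketch below, which caps logs per hook rather than per project. This is illustrative only: `KEEP_PER_HOOK`, the `web_hook_logs` association, and the full-table iteration are assumptions, and a pass like this would itself need throttling at our scale.

```ruby
# Hypothetical finite retention: keep only the newest N logs per hook
# instead of pruning by age. KEEP_PER_HOOK is an assumed constant.
KEEP_PER_HOOK = 10

WebHook.find_each do |hook|
  # Ids of the newest N logs for this hook.
  keep_ids = hook.web_hook_logs
                 .order(created_at: :desc)
                 .limit(KEEP_PER_HOOK)
                 .select(:id)

  # Delete everything else for this hook in a single statement.
  hook.web_hook_logs.where.not(id: keep_ids).delete_all
end
```

A cap like this bounds table size by the number of hooks rather than by write volume, which sidesteps the keep-up problem entirely, at the cost of losing older logs for busy hooks.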
Related: #21940 (closed)