Skip to content

502 when attempting to delete a web hook due to large number of web hook logs

Summary

User kept getting a 502 when attempting to delete a webhook through the admin panel. The webhook had large number of logs and when attempting to delete it the database connection was terminating due to a large number of destroy jobs on the webhook logs.

rb(main):002:0> SystemHook.find(126)
  SystemHook Load (0.6ms)  SELECT  "web_hooks".* FROM "web_hooks" WHERE "web_hooks"."type" IN ('SystemHook') AND "web_hooks"."id" = 126 LIMIT 1  [["id", 126]]
=> #<SystemHook id: 126, url: "http://...", project_id: nil, created_at: "2017-12-21 00:27:24", updated_at: "2018-04-13 03:41:36", type: "SystemHook", service_id: nil, push_events: true, issues_events: false, merge_requests_events: false, tag_push_events: false, group_id: nil, note_events: false, enable_ssl_verification: false, wiki_page_events: false, token: "", pipeline_events: false, confidential_issues_events: false, repository_update_events: false, job_events: false>
irb(main):003:0> SystemHook.find(126).destroy
  SystemHook Load (0.7ms)  SELECT  "web_hooks".* FROM "web_hooks" WHERE "web_hooks"."type" IN ('SystemHook') AND "web_hooks"."id" = 126 LIMIT 1  [["id", 126]]
   (0.3ms)  BEGIN
  WebHookLog Load (275.6ms)  SELECT "web_hook_logs".* FROM "web_hook_logs" WHERE "web_hook_logs"."web_hook_id" = 126  [["web_hook_id", 126]]
  SQL (1.3ms)  DELETE FROM "web_hook_logs" WHERE "web_hook_logs"."id" = 1495988  [["id", 1495988]]
  SQL (0.3ms)  DELETE FROM "web_hook_logs" WHERE "web_hook_logs"."id" = 1537182  [["id", 1537182]]
  SQL (0.3ms)  DELETE FROM "web_hook_logs" WHERE "web_hook_logs"."id" = 1537211  [["id", 1537211]]


[snip about a thousand or so of the same requests ]


  SQL (0.3ms)  DELETE FROM "web_hook_logs" WHERE "web_hook_logs"."id" = 1537215  [["id", 1537215]]
  SQL (0.3ms)  DELETE FROM "web_hook_logs" WHERE "web_hook_logs"."id" = 1537222  [["id", 1537222]]
FATAL:  terminating connection due to idle-in-transaction timeout
ERROR:  server conn crashed?
  SQL (0.4ms)  DELETE FROM "web_hooks" WHERE "web_hooks"."type" IN ('SystemHook') AND "web_hooks"."id" = 126  [["id", 126]]
   (0.1ms)  ROLLBACK
ActiveRecord::StatementInvalid: PG::ConnectionBad: PQsocket() can't get socket descriptor: ROLLBACK
    from /opt/gitlab/embedded/lib/ruby/gems/2.3.0/gems/peek-pg-1.3.0/lib/peek/views/pg.rb:17:in `async_exec'
    from /opt/gitlab/embedded/lib/ruby/gems/2.3.0/gems/peek-pg-1.3.0/lib/peek/views/pg.rb:17:in `async_exec'

Manual workaround

See #21940 (comment 215143604)

Possible fixes

We currently use a cron job to remove these old webhook logs here https://gitlab.com/gitlab-org/gitlab-ce/blob/master/app/workers/remove_old_web_hook_logs_worker.rb. The default deletion is 2 days, but perhaps it would be better to define a hard limit the amount of logs we store when large number could accumulate during those 2 days.

Update:

app/workers/remove_old_web_hook_logs_worker.rb has been removed and replace by https://gitlab.com/gitlab-org/gitlab/blob/master/app/workers/prune_web_hook_logs_worker.rb -> delete older than 90 days

Further discussion for Long-term (90-day) storage for web_hook_logs -> #23022 (closed)

Edited by Cynthia "Arty" Ng