Handle Deadline Exceeded error in BranchHooksService
Summary
Zendesk: https://gitlab.zendesk.com/agent/tickets/143200 (internal-only)
Customer reported that when pushing a new branch, Web Hooks were not fired. But on subsequent pushes to the branch, the Web Hook fired as expected.
The issue specifically occurred when pushing a new branch to a copy of the Linux Kernel repository. Further, the master
branch on this repository was nearly 2 years out of date, while the new branch included latest upstream changes. This means there were potentially hundred thousands of new commits on the new branch compared to master.
The root cause is a Deadline Exceeded
error because it takes GitLab longer than the allotted Gitaly timeout to gather information about all of the commits contained in the push. It's the huge number of commits between the branch being pushed and the 2-year-old default branch. I see that Linux averaged about 75,000 commits per year, so the discrepancy between the 2-year-old default branch and the branch with latest changes might be 150,000 commits. It is the act of trying to load 150,000 commits that causes GitLab to exceed the 50 second timeout.
This issue is reproducible in a local installation of GitLab. The error we saw in Sidekiq was in the BranchHooksService
:
{"severity":"WARN","time":"2020-01-15T18:00:05.552Z","error_class":"Gitlab::Git::CommandError","error_message":"4:Deadline Exceeded","context":"Job raised exception","jobstr":"{\"class\":\"PostReceive\",\"args\":[\"project-2\",\"user-1\",\"0000000000000000000000000000000000000000 95e20af9fb9ce572129b930967dcb762a318c588 refs/heads/master\\n\",{}],\"retry\":3,\"queue\":\"post_receive\",\"jid\":\"11551992af652a02f7a39c7b\",\"created_at\":1579111175.4152598,\"correlation_id\":\"ec3ffa02-e911-4679-9573-fce4b8f54f6d\",\"enqueued_at\":1579111175.4198825}","class":"PostReceive","args":["project-2","user-1","0000000000000000000000000000000000000000 95e20af9fb9ce572129b930967dcb762a318c588 refs/heads/master\n",{}],"retry":3,"queue":"post_receive","jid":"11551992af652a02f7a39c7b","created_at":1579111175.4152598,"correlation_id":"ec3ffa02-e911-4679-9573-fce4b8f54f6d","enqueued_at":1579111175.4198825,"error_backtrace":["lib/gitlab/git/wraps_gitaly_errors.rb:13:in `rescue in wrapped_gitaly_errors'","lib/gitlab/git/wraps_gitaly_errors.rb:6:in `wrapped_gitaly_errors'","lib/gitlab/git/commit.rb:111:in `between'","app/models/repository.rb:168:in `commits_between'","app/services/git/branch_hooks_service.rb:29:in `block in commits'","lib/gitlab/utils/strong_memoize.rb:30:in `strong_memoize'","app/services/git/branch_hooks_service.rb:20:in `commits'","app/services/git/base_hooks_service.rb:33:in `limited_commits'","app/services/git/branch_hooks_service.rb:86:in `enqueue_process_commit_messages'","app/services/git/branch_hooks_service.rb:77:in `branch_change_hooks'","ee/app/services/ee/git/branch_hooks_service.rb:12:in `branch_change_hooks'","app/services/git/branch_hooks_service.rb:61:in `execute_branch_hooks'","app/services/git/branch_hooks_service.rb:6:in `execute'","app/services/git/branch_push_service.rb:62:in `execute_related_hooks'","app/services/git/branch_push_service.rb:28:in `execute'","ee/app/services/ee/git/branch_push_service.rb:13:in `execute'","app/services/git/process_ref_changes_service.rb:55:in `block in process_changes'","app/services/git/process_ref_changes_service.rb:46:in `each'","app/services/git/process_ref_changes_service.rb:46:in `process_changes'","app/services/git/process_ref_changes_service.rb:20:in `block in process_changes_by_action'","app/services/git/process_ref_changes_service.rb:19:in `each'","app/services/git/process_ref_changes_service.rb:19:in `process_changes_by_action'","app/services/git/process_ref_changes_service.rb:10:in `execute'","app/workers/post_receive.rb:80:in `process_ref_changes'","app/workers/post_receive.rb:52:in `process_project_changes'","app/workers/post_receive.rb:25:in `perform'","lib/gitlab/sidekiq_daemon/monitor.rb:43:in `within_job'"]}
It should be noted that a similar error occurs when pushing the Linux kernel to a new repository (as the default branch). That error is as follows - note that the error also occurs in the BranchHooksService
, but at a different location:
{"severity":"WARN","time":"2020-01-15T17:35:04.141Z","error_class":"Gitlab::Git::CommandError","error_message":"4:Deadline Exceeded","context":"Job raised exception","jobstr":"{\"class\":\"PostReceive\",\"args\":[\"project-2\",\"user-1\",\"0000000000000000000000000000000000000000 95e20af9fb9ce572129b930967dcb762a318c588 refs/heads/master\\n\",{}],\"retry\":3,\"queue\":\"post_receive\",\"jid\":\"9ea9297c2d653395394051cb\",\"created_at\":1579109669.506045,\"correlation_id\":\"c31688c8-9856-4872-af01-5a9a4a85152d\",\"enqueued_at\":1579109669.5074925}","class":"PostReceive","args":["project-2","user-1","0000000000000000000000000000000000000000 95e20af9fb9ce572129b930967dcb762a318c588 refs/heads/master\n",{}],"retry":3,"queue":"post_receive","jid":"9ea9297c2d653395394051cb","created_at":1579109669.506045,"correlation_id":"c31688c8-9856-4872-af01-5a9a4a85152d","enqueued_at":1579109669.5074925,"error_backtrace":["lib/gitlab/git/wraps_gitaly_errors.rb:13:in `rescue in wrapped_gitaly_errors'","lib/gitlab/git/wraps_gitaly_errors.rb:6:in `wrapped_gitaly_errors'","lib/gitlab/git/repository.rb:497:in `commit_count'","app/models/repository.rb:559:in `block in commit_count_for_ref'","lib/gitlab/repository_cache.rb:24:in `fetch'","app/models/repository.rb:559:in `commit_count_for_ref'","app/services/git/branch_hooks_service.rb:140:in `block in count_commits_in_branch'","lib/gitlab/utils/strong_memoize.rb:30:in `strong_memoize'","app/services/git/branch_hooks_service.rb:139:in `count_commits_in_branch'","app/services/git/branch_hooks_service.rb:39:in `commits_count'","app/services/git/base_hooks_service.rb:103:in `push_data_params'","app/services/git/base_hooks_service.rb:112:in `event_push_data'","app/services/git/base_hooks_service.rb:53:in `create_events'","app/services/git/base_hooks_service.rb:12:in `execute'","app/services/git/branch_hooks_service.rb:8:in `execute'","app/services/git/branch_push_service.rb:62:in `execute_related_hooks'","app/services/git/branch_push_service.rb:28:in `execute'","ee/app/services/ee/git/branch_push_service.rb:13:in `execute'","app/services/git/process_ref_changes_service.rb:55:in `block in process_changes'","app/services/git/process_ref_changes_service.rb:46:in `each'","app/services/git/process_ref_changes_service.rb:46:in `process_changes'","app/services/git/process_ref_changes_service.rb:20:in `block in process_changes_by_action'","app/services/git/process_ref_changes_service.rb:19:in `each'","app/services/git/process_ref_changes_service.rb:19:in `process_changes_by_action'","app/services/git/process_ref_changes_service.rb:10:in `execute'","app/workers/post_receive.rb:80:in `process_ref_changes'","app/workers/post_receive.rb:52:in `process_project_changes'","app/workers/post_receive.rb:25:in `perform'","lib/gitlab/sidekiq_daemon/monitor.rb:43:in `within_job'"]}
Steps to reproduce
- Create a fresh repository on your local instance.
- Clone the Linux Kernel on your workstation
- Checkout a 2-year old branch to use as master -
git checkout d8a5b80 -fb old_master
- Push the old master branch -
git push origin old_master
- Observe the second error as shown above.
- Check out a new branch from the actual master, which will have the latest changes -
git checkout master; git checkout -b new_branch
- Push -
git push origin new_branch
- Observe the first error as shown above.
Possible workarounds
This should only happen when pushing the Linux kernel, or a similarly sized repository, to GitLab for the first time or when pushing a highly diverged branch for the first time. Subsequent pushes to GitLab will produce a small changeset and won't invoke the issue.
- Ensure that changes are pushed often so changesets remain small.
- Keep
master
or the default branch updated so the changesets pushed in newer branches remain small.
Possible fixes
In the short term we could rescue the error in two locations to prevent the error. In this case, the web hook simply won't have the info on the changesets.
- To address the issue when pushing new branches: https://gitlab.com/gitlab-org/gitlab/blob/cc6b787e7b0dc8c3d6868fec01d2e1fb6fe68b32/app/services/git/branch_hooks_service.rb#L29
- To address the issue when pushing the first/default branch: https://gitlab.com/gitlab-org/gitlab/blob/cc6b787e7b0dc8c3d6868fec01d2e1fb6fe68b32/app/services/git/branch_hooks_service.rb#L140
It's worth noting that there may be other locations once we address these areas. Basically, we should be rescuing Deadline Exceeded
everywhere we interact with the repository. It might also be worth implementing a development style guide rule or even a rubocop rule to ensure we rescue anywhere we interact with the repository, and handle it gracefully. There are other Sidekiq workers that are susceptible to Deadline Exceeded
errors (ProjectCacheWorker
, for example)
Long term, we may want to consider implementing a limit on the number of commits we process for any given push, regardless of whether it's a new branch, default branch, etc. For example, we currently limit the number of commits we process to 100 for creating the default branch - https://gitlab.com/gitlab-org/gitlab/blob/cc6b787e7b0dc8c3d6868fec01d2e1fb6fe68b32/app/services/git/branch_hooks_service.rb#L23.