Skip to content

500 error when loading older in-process Merge Requests due to slow ListConflictsFiles response

Summary

When users attempt to view in-process MRs submitted several months ago the page loads a minute, eventually returning a 500.

This appears to be triggered by notify_conflict?, which calls into the Gitaly ListConflictFiles service. That service is exceeding 55 seconds, causing a GRPC::Deadline Exceeded error. The customer was able to work around the problem by temporarily bumping their Gitaly timeout from 55 to 120 seconds.

This is occurring on a moderately sized project using the default merge commit setting. The impacted MRs are roughly 200 commits behind HEAD with a diff of ~1,000,000 lines over 200+ files.

Customer's Gitaly performance is good in general, with the exception of ListConflictFiles. strace shows that the offending process appears to be bottlenecked by time spent in userland rather than file or I/O access.

  pid         actv (ms)     wait (ms)     user (ms)    total (ms)    syscalls     children
  -------    ----------    ----------    ----------    ----------    ---------    ---------
  1380          885.948      4942.070     58367.770     64195.789          227            0
Expand for Gitaly Method stats

METHOD COUNT PERC99 PERC95 STDDEV MAX MIN SCORE
SSHUploadPack 127 305883.41 304052.07 99202.63 331052.3 5.54 38847193.1
ListConflictFiles 19 59379.5 57959.05 25788.72 59734.61 1.97 1128210.5
ListCommitsByOid 7706 11.96 9.71 2.44 18.2 2.01 92163.8
CommitIsAncestor 7635 7.75 7.04 2.13 13.85 1.59 59171.3
FindCommit 9729 4.53 3.79 0.67 18.17 1.41 44072.4
FindCommits 65 364.95 349.31 100.99 366.63 2.45 23721.8
ListLastCommitsForTree 22 483.34 439.83 161.52 494.91 10.23 10633.5
CommitLanguages 1 7153.31 7153.31 0.0 7153.31 7153.31 7153.3
TreeEntry 424 12.5 5.27 1.75 20.47 1.96 5300.0
SSHReceivePack 1 3875.71 3875.71 0.0 3875.71 3875.71 3875.7
GetTreeEntries 176 16.02 9.25 4.04 44.27 2.3 2819.5
CountCommits 20 115.48 15.13 30.49 140.57 2.34 2309.6
PostUploadPack 11 104.09 101.57 9.16 104.73 73.34 1145.0
UserMergeBranch 1 1012.77 1012.77 0.0 1012.77 1012.77 1012.8
GetCommitSignatures 89 9.62 5.21 1.49 9.77 1.62 856.2
LastCommitForPath 35 23.89 19.74 5.56 25.95 3.85 836.2
GetBlobs 29 25.98 14.61 5.73 30.36 3.09 753.4
InfoRefsUploadPack 14 42.63 41.96 4.41 42.79 24.37 596.8
ListTagNamesContainingCommit 4 149.16 147.29 10.93 149.62 123.78 596.6
IsRebaseInProgress 3094 0.17 0.14 0.04 0.89 0.05 526.0
GetNewLFSPointers 2 234.8 234.66 2.61 234.84 231.15 469.6
FindDefaultBranchName 42 5.39 5.3 1.12 5.41 1.59 226.4
FindLocalBranches 1 187.57 187.57 0.0 187.57 187.57 187.6
HasLocalBranches 7 22.09 21.96 8.19 22.12 2.12 154.6
FetchSourceBranch 3 44.5 44.39 6.55 44.53 32.52 133.5
CommitsBetween 7 17.07 16.38 5.29 17.25 4.05 119.5
CommitDelta 25 4.59 3.63 0.74 4.89 1.71 114.8
CommitDiff 9 12.76 11.32 3.13 13.12 2.84 114.8
DiffStats 10 11.44 9.33 2.59 11.97 3.14 114.4
GetLFSPointers 27 2.82 2.42 0.31 2.96 1.62 76.1
FindBranch 11 5.56 5.55 0.45 5.56 4.19 61.2
FindMergeBase 8 7.42 7.24 2.07 7.47 2.2 59.4
RefExists 17 2.93 2.37 0.42 3.07 1.29 49.8
FindAllBranchNames 8 4.48 4.32 0.87 4.52 2.06 35.8
GetBlob 7 3.92 3.76 0.43 3.96 2.7 27.4
FilterShasWithSignatures 2 10.3 10.07 4.12 10.36 4.53 20.6
Cleanup 2 10.21 10.11 1.68 10.23 7.85 20.4
FindAllTagNames 7 2.39 2.34 0.29 2.4 1.52 16.7
RepositoryExists 46 0.3 0.26 0.08 0.33 0.04 13.8
GetInfoAttributes 16 0.77 0.51 0.18 0.83 0.11 12.3
RepositorySize 1 10.7 10.7 0.0 10.7 10.7 10.7
WriteRef 2 2.35 2.34 0.11 2.35 2.2 4.7
ApplyGitattributes 1 3.16 3.16 0.0 3.16 3.16 3.2
ServerInfo 1 1.9 1.9 0.0 1.9 1.9 1.9
DeleteRefs 1 1.79 1.79 0.0 1.79 1.79 1.8
Check 11 0.06 0.06 0.01 0.06 0.03 0.7

Steps to reproduce

  1. Create an MR
  2. Do not merge it
  3. Wait three months
  4. Attempt to load MR - timeout or slow load occurs

What is the current bug behavior?

ListConflictFiles is exceeding 55 seconds, causing the MR page to fail to load.

What is the expected correct behavior?

ListConflictFiles runs quickly and the page loads.

Relevant logs and/or screenshots

production.log

Completed 500 Internal Server Error in 55041ms (ActiveRecord: 4.7ms | Elasticsearch: 0.0ms) 

GRPC::DeadlineExceeded (4:Deadline Exceeded): 

lib/gitlab/gitaly_client/conflict_files_stitcher.rb:15:in `each' 
lib/gitlab/gitaly_client/conflict_files_stitcher.rb:15:in `each' 
lib/gitlab/gitaly_client/conflicts_service.rb:29:in `any?' 
lib/gitlab/gitaly_client/conflicts_service.rb:29:in `conflicts?' 
lib/gitlab/git/repository.rb:938:in `can_be_merged?' 
app/models/repository.rb:499:in `method_missing' 
app/models/merge_request.rb:725:in `check_if_can_be_merged' 
app/controllers/projects/merge_requests_controller.rb:272:in `mark_merge_request_mergeable' 
app/controllers/projects/merge_requests_controller.rb:38:in `show' 
lib/gitlab/i18n.rb:55:in `with_locale' 
lib/gitlab/i18n.rb:61:in `with_user_locale' 
app/controllers/application_controller.rb:423:in `set_locale' 
lib/gitlab/middleware/multipart.rb:103:in `call' 
lib/gitlab/request_profiler/middleware.rb:16:in `call' 
ee/lib/gitlab/jira/middleware.rb:17:in `call' 
lib/gitlab/middleware/go.rb:20:in `call' 
lib/gitlab/etag_caching/middleware.rb:13:in `call' 
lib/gitlab/middleware/correlation_id.rb:16:in `block in call' 
lib/gitlab/correlation_id.rb:15:in `use_id' 
lib/gitlab/middleware/correlation_id.rb:15:in `call' 
lib/gitlab/middleware/read_only/controller.rb:42:in `call' 
lib/gitlab/middleware/read_only.rb:18:in `call' 
lib/gitlab/middleware/basic_health_check.rb:25:in `call' 
lib/gitlab/request_context.rb:20:in `call' 
lib/gitlab/metrics/requests_rack_middleware.rb:29:in `call' 
lib/gitlab/middleware/release_env.rb:13:in `call'

production_json.log

{"method":"GET","path":"/group/project/merge_requests/54","format":"html","controller":"Projects::MergeRequestsController","action":"show","status":500,"error":"GRPC::DeadlineExceeded: 4:Deadline Exceeded","duration":55042.49,"view":0.0,"db":4.71,"time":"2019-03-05T00:53:15.503Z","params":[{"key":"namespace_id","value":"group"},{"key":"project_id","value":"project"},{"key":"id","value":"54"}],"remote_ip":"1.1.1.1","user_id":66,"username":"user","ua":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36","gitaly_calls":2,"correlation_id":"DDKqZaEzOt8"}

Gitaly current (same problem, different instance of error)

2019-04-02_17:04:22.11263 time="2019-04-02T17:04:22Z" level=info msg="finished streaming call with code Canceled" correlation_id=SETNdfG9hAa error="
rpc error: code = Canceled desc = rpc error: code = Canceled desc = context canceled" grpc.code=Canceled grpc.meta.auth_version=v2 grpc.meta.client_
name=gitlab-web grpc.method=ListConflictFiles grpc.request.deadline="2019-04-02T17:04:22Z" grpc.request.fullMethod=/gitaly.ConflictsService/ListConf
lictFiles grpc.request.glProjectPath=group/project grpc.request.glRepository=project-34 grpc.request.repoPath=group/project.git grpc.request.repoStorage
=default grpc.request.topLevelGroup=code grpc.service=gitaly.ConflictsService grpc.start_time="2019-04-02T17:03:27Z" grpc.time_ms=55001.2 peer.addre
ss=@ span.kind=server system=grpc

Output of checks

Issue first reported with GitLab EE v11.8.1, still occurring on v11.10.2

I have not been able to reproduce the error on GitLab.com

Results of GitLab environment info

Expand for output related to GitLab environment info

Requested - will add when received from customer.

Results of GitLab application Check

Expand for output related to the GitLab application check

Requested - will add when received from customer.

Possible fixes

/cc @jacobvosmaer-gitlab @oswaldo

Edited by 🤖 GitLab Bot 🤖