502 and Slowness Loading Large Merge Requests

Summary

Unable to reliably load large merge requests. This is a major issue in our workflow.

Opening a large merge request with lots of commits takes up to 60 seconds to load (if it ever loads) and causes 100% CPU on the server. If large merge request doesn't load, it throws a 502.

Steps to reproduce

I was able to reproduce on gitlab.com! Very hit and miss though.

Try to load this merge request page in several different tabs simultaneously - I got a friendly GitLab branded 502, and the page took 60 seconds to error: https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/9546

or setup a merge request with the following conditions: At least 355 discussions, 1570 commits, and 317 changes and try to load it multiple times in multiple windows simultaneously. Eventually it will trigger the 502.

Example Project

https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/9546

What is the current bug behavior?

Loading the merge request takes upwards of 30 seconds - someitmes up to a full 60 seconds. During that client waiting time (server is processing), the CPU on the server spikes. If the page doesn't load within the 60 seconds, it displays the friendly GitLab branded 502 message ("Whoops, GitLab is taking too much time to respond.")

Some stats I observed on some merge request pages to illustrate the slowness: Initial page load (waiting) - 30 seconds, if it even loads. Content download is 11.8 MB and takes 8 seconds. If it can't load it will throw a 502. JSON file load for the diff if the initial page loads (3.8 KB) - 40 seconds

What is the expected correct behavior?

Merge request should load in a second or two and not display a 502 if multiple people are viewing it.

Relevant logs and/or screenshots

502 error: gitlab502

Loading time: gitlabtiming

I found these log files on our local installation but I'm not sure if they are related:

[2018-01-26T02:20:05.336094 #14337]  INFO -- : worker=0 ready
W, [2018-01-26T02:21:04.505397 #13949]  WARN -- : #<Unicorn::HttpServer:0x00007f0e07cca968>: worker (pid: 13949) exceeds memory limit (578632192.0 bytes > 545250480 bytes)
W, [2018-01-26T02:21:04.505560 #13949]  WARN -- : Unicorn::WorkerKiller send SIGQUIT (pid: 13949) alive: 114 sec (trial 1)
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/daemon.rb:46:in `join': Target thread must not be current thread (ThreadError)
        from /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/daemon.rb:46:in `block in stop'
        from /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/daemon.rb:39:in `synchronize'
        from /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/daemon.rb:39:in `stop'
        from /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/metrics/base_sampler.rb:19:in `rescue in safe_sample'
        from /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/metrics/base_sampler.rb:16:in `safe_sample'
        from /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/metrics/base_sampler.rb:52:in `start_working'
        from /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/daemon.rb:34:in `block (2 levels) in start'
I, [2018-01-26T02:21:05.000249 #506]  INFO -- : reaped #<Process::Status: pid 13949 exit 0> worker=2

and

2018-01-25_20:17:11.87583 2018-01  -25T20:17:11.875Z 2981 TID-ovw7o9zoo INFO: Booting Sidekiq 5.0.4 with redis options {:path=>"/var/opt/gitlab/redis/redis.socket", :namespace=>"resque:gitlab", :id=>"Sidekiq-server-PID-2981", :url=>nil}

2018-01-25_20:01:55.56319 2018-01-25T20:01:55.562Z 8221 TID-ouz0c7h1o WARN: current RSS 1120868 exceeds maximum RSS 1000000

2018-01-25_20:01:55.56338 2018-01-25T20:01:55.563Z 8221 TID-ouz0c7h1o WARN: this thread will shut down PID 8221 - Worker UpdateMergeRequestsWorker - JID-34eb15d90344865a08191412 in 900 seconds

2018-01-25_17:50:55.16244 2018-01-25T17:50:55.162Z 7339 TID-ovpv4xwzk WARN: sending SIGTERM to PID 7339 - Worker UpdateMergeRequestsWorker - JID-bc194be0b204f9fcfc893761

2018-01-25_17:50:55.16266 2018-01-25T17:50:55.162Z 7339 TID-ovpv4xwzk WARN: waiting 30 seconds before sending SIGKILL to PID 7339 - Worker UpdateMergeRequestsWorker - JID-bc194be0b204f9fcfc893761

2018-01-25_17:50:55.16285 2018-01-25T17:50:55.162Z 7339 TID-ovq7h6va8 INFO: Shutting down

2018-01-25_17:50:55.16300 2018-01-25T17:50:55.162Z 7339 TID-ovq57i2os INFO: Scheduler exiting...

2018-01-25_17:50:55.16326 2018-01-25T17:50:55.163Z 7339 TID-ovq7h6va8 INFO: Terminating quiet workers

2018-01-25_17:50:55.16340 2018-01-25T17:50:55.163Z 7339 TID-ovq4hr68c INFO: Scheduler exiting...

2018-01-25_17:50:55.66382 2018-01-25T17:50:55.663Z 7339 TID-ovq7h6va8 INFO: Pausing to allow workers to finish...

2018-01-25_17:50:58.16515 2018-01-25T17:50:58.165Z 7339 TID-ovq7h6va8 INFO: Bye!

Output of checks

This bug happens on GitLab.com

Results of GitLab environment info

Not applicable because it happens on Gitlab.com too.

Results of GitLab application Check

Not applicable because it happens on Gitlab.com too.

Possible fixes

Honestly I have no idea. Can we pre-render merge requests before they are requested?

Possibly related? https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/2421