No limits on size of diffs returned by the /projects/:id/repository/compare API endpoint

Summary

The /projects/:id/repository/compare API endpoint will attempt to return arbitrarily large diffs. When the comparison is particularly large, the request exceeds the Gitaly timeout and a 500 error is returned. Although the Gitaly request is hitting its deadline, we found that most of the time is spent waiting for Rails to render the diffs.

The customer reporting this issue received a 21 MB response once they raised their Gitaly timeout to more than 180 seconds.

Steps to reproduce

  1. Create a new project and add a large file such as a 1,000,000 line csv
  2. Create a new branch big-diff and make a schema change to the csv like adding quotes around a column
  3. Commit the change
  4. Execute curl -H "PRIVATE-TOKEN: <TOKEN>" "https://gitlab.com/api/v4/projects/<PROJ_NUM>/repository/compare?from=master&to=other" (replace other with the name of the branch created in step 2). The response will be {"message":"500 Internal Server Error"}
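Steps 1 and 2 can be scripted. The following is a minimal sketch, not the exact file used in the example project: the function name write_sample_csv, the three-column layout, and the id-/name- values are all made up for illustration. It writes a CSV with a chosen number of rows, optionally quoting the first column so that every line differs from the unquoted version.

```python
def write_sample_csv(path, rows, quote_first_column=False):
    """Write `rows` lines of three-column CSV data to `path`.

    The column layout is hypothetical. With quote_first_column=True the
    first field of every row is wrapped in double quotes, so every line
    differs from the unquoted version of the same file.
    """
    with open(path, "w", newline="") as handle:
        for i in range(rows):
            first = f'"id-{i}"' if quote_first_column else f"id-{i}"
            handle.write(f"{first},name-{i},{i * 2}\n")
```

Calling write_sample_csv("big.csv", 1_000_000) and committing the result on master, then calling it again with quote_first_column=True on the new branch and committing, should produce a comparison in which every line of the file has changed.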

Example Project

https://gitlab.com/wchandler/big-diff

What is the current bug behavior?

The compare endpoint will attempt to render an unbounded number of diffs, resulting in the request being killed.

What is the expected correct behavior?

The diffs to render should be capped at a maximum size, as we currently do in the UI and the /projects/:id/merge_requests/:merge_request_iid/changes API endpoint.
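To make the expected behavior concrete, here is a rough sketch of a cap applied while diffs are being streamed. It is written in Python purely for illustration and is not how GitLab implements its limits: FileDiff, limit_diffs, max_files, and max_bytes are hypothetical names, and the real thresholds would come from the existing UI and merge request limits. The point is that iteration stops as soon as a limit is hit, so the remaining diffs are never read or rendered.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class FileDiff:
    """Hypothetical stand-in for one file's diff in a comparison."""
    path: str
    patch: str


def limit_diffs(
    diffs: Iterable[FileDiff], max_files: int, max_bytes: int
) -> Tuple[List[FileDiff], bool]:
    """Return (kept_diffs, overflow).

    Stops consuming `diffs` as soon as adding the next diff would exceed
    either limit; overflow is True when at least one diff was left out.
    """
    kept: List[FileDiff] = []
    used = 0
    for diff in diffs:
        size = len(diff.patch.encode("utf-8"))
        if len(kept) >= max_files or used + size > max_bytes:
            return kept, True
        kept.append(diff)
        used += size
    return kept, False
```

A response built this way could include the overflow flag so that API clients know the comparison was truncated rather than complete.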

Relevant logs and/or screenshots

/var/log/gitlab/gitlab-rails/production.log

GRPC::DeadlineExceeded (4:Deadline Exceeded):
  /opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/grpc-1.19.0-x86_64-linux/src/ruby/lib/grpc/generic/active_call.rb:31:in `check_status'
  /opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/grpc-1.19.0-x86_64-linux/src/ruby/lib/grpc/generic/active_call.rb:181:in `attach_status_results_and_complete_call'
  /opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/grpc-1.19.0-x86_64-linux/src/ruby/lib/grpc/generic/active_call.rb:170:in `receive_and_check_status'
  /opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/grpc-1.19.0-x86_64-linux/src/ruby/lib/grpc/generic/active_call.rb:338:in `each_remote_read_then_finish'
  /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/gitaly_client/diff_stitcher.rb:15:in `each'
  /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/gitaly_client/diff_stitcher.rb:15:in `each'
  /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/git/diff_collection.rb:116:in `each_gitaly_patch'
  /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/metrics/instrumentation.rb:161:in `block in each_gitaly_patch'
  /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/metrics/method_call.rb:36:in `measure'
  /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/metrics/instrumentation.rb:161:in `each_gitaly_patch'
  /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/git/diff_collection.rb:50:in `each'
  /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/metrics/instrumentation.rb:161:in `block in each'
  /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/metrics/method_call.rb:36:in `measure'
  /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/metrics/instrumentation.rb:161:in `each'
  /opt/gitlab/embedded/service/gitlab-rails/lib/api/entities.rb:1175:in `to_a'
  /opt/gitlab/embedded/service/gitlab-rails/lib/api/entities.rb:1175:in `block in <class:Compare>'

/var/log/gitlab/api_json.log

{
  "time": "2019-12-16T14:39:55.026Z",
  "severity": "INFO",
  "duration": 30216.38,
  "db": 2.95,
  "view": 30213.43,
  "status": 500,
  "method": "GET",
  "path": "/api/v4/projects/280/repository/compare",
  "params": [
    {
      "key": "from",
      "value": "master"
    },
    {
      "key": "to",
      "value": "other"
    }
  ],
  "host": "www.wchandler-gitlab.com",
  "remote_ip": "75.118.3.149, 75.118.3.149",
  "ua": "curl/7.54.0",
  "route": "/api/:version/projects/:id/repository/compare",
  "user_id": 1,
  "username": "root",
  "queue_duration": 33.64,
  "gitaly_calls": 3,
  "gitaly_duration": 8.95,
  "rugged_calls": 2,
  "rugged_duration_ms": 3.61,
  "correlation_id": "xJn6p54KhE7"
}
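The timing fields in this entry can be compared directly. The sketch below copies the relevant values from the entry above and computes each component as a share of the total duration. It assumes that gitaly_duration is reported in the same millisecond unit as duration, which the log itself does not state; time_breakdown is a helper name invented here.

```python
import json

# Timing fields copied from the api_json.log entry above.
ENTRY = json.loads(
    '{"duration": 30216.38, "db": 2.95, "view": 30213.43,'
    ' "gitaly_calls": 3, "gitaly_duration": 8.95}'
)


def time_breakdown(entry):
    """Return each timing field as a fraction of the total request duration."""
    total = entry["duration"]
    return {key: entry[key] / total for key in ("view", "db", "gitaly_duration")}


breakdown = time_breakdown(ENTRY)
for name, share in breakdown.items():
    print(f"{name}: {share:.4%}")
```

Under that assumption, the view component accounts for almost the entire request, which is consistent with the time being spent in Rails rather than in the database or in Gitaly call overhead.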

/var/log/gitlab/gitaly/current

{
  "correlation_id": "xJn6p54KhE7",
  "error": "rpc error: code = DeadlineExceeded desc = rpc error: code = Unavailable desc = CommitDiff: send: rpc error: code = DeadlineExceeded desc = context deadline exceeded",
  "grpc.code": "DeadlineExceeded",
  "grpc.meta.auth_version": "v2",
  "grpc.meta.client_name": "gitlab-web",
  "grpc.method": "CommitDiff",
  "grpc.request.deadline": "2019-12-16T14:39:54Z",
  "grpc.request.fullMethod": "/gitaly.DiffService/CommitDiff",
  "grpc.request.glProjectPath": "root/slow-api-render",
  "grpc.request.glRepository": "project-280",
  "grpc.request.repoPath": "@hashed/7f/0a/7f0a22117f8fe0172cf9209ff622b64a51aaeda21d58b5b62685a93dbe2dad25.git",
  "grpc.request.repoStorage": "default",
  "grpc.request.topLevelGroup": "@hashed",
  "grpc.service": "gitaly.DiffService",
  "grpc.start_time": "2019-12-16T14:39:24Z",
  "grpc.time_ms": 30000.348,
  "level": "warning",
  "msg": "finished streaming call with code DeadlineExceeded",
  "peer.address": "@",
  "pid": 10110,
  "span.kind": "server",
  "system": "grpc",
  "time": "2019-12-16T14:39:54Z"
}
{
  "args": [
    "/opt/gitlab/embedded/bin/git",
    "--git-dir",
    "/var/opt/gitlab/git-data/repositories/@hashed/7f/0a/7f0a22117f8fe0172cf9209ff622b64a51aaeda21d58b5b62685a93dbe2dad25.git",
    "diff",
    "--patch",
    "--raw",
    "--abbrev=40",
    "--full-index",
    "--find-renames=30%",
    "8e3de42bb3d7eba64bb15e393a0a1b2daa8228cc",
    "3bbf2048acafd17b6b7cc4d39aaf55e921ccd117"
  ],
  "command.exitCode": 0,
  "command.inblock": 0,
  "command.maxrss": 420020,
  "command.oublock": 0,
  "command.real_time_ms": 30000.153572,
  "command.system_time_ms": 236,
  "command.user_time_ms": 1032,
  "correlation_id": "xJn6p54KhE7",
  "grpc.meta.auth_version": "v2",
  "grpc.meta.client_name": "gitlab-web",
  "grpc.method": "CommitDiff",
  "grpc.request.deadline": "2019-12-16T14:39:54Z",
  "grpc.request.fullMethod": "/gitaly.DiffService/CommitDiff",
  "grpc.request.glProjectPath": "root/slow-api-render",
  "grpc.request.glRepository": "project-280",
  "grpc.request.repoPath": "@hashed/7f/0a/7f0a22117f8fe0172cf9209ff622b64a51aaeda21d58b5b62685a93dbe2dad25.git",
  "grpc.request.repoStorage": "default",
  "grpc.request.topLevelGroup": "@hashed",
  "grpc.service": "gitaly.DiffService",
  "grpc.start_time": "2019-12-16T14:39:24Z",
  "level": "debug",
  "msg": "spawn complete",
  "path": "/opt/gitlab/embedded/bin/git",
  "peer.address": "@",
  "pid": 25057,
  "span.kind": "server",
  "system": "grpc",
  "time": "2019-12-16T14:39:54Z"
}
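Two numbers can be read off these Gitaly entries: the deadline window for the CommitDiff call, and how much of the git process's wall-clock time was actual CPU work. The sketch below uses values copied from the entries above; seconds_between and cpu_fraction are helper names made up for this example.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def seconds_between(start, end):
    """Seconds elapsed between two UTC timestamps in the log's format."""
    delta = datetime.strptime(end, TIMESTAMP_FORMAT) - datetime.strptime(
        start, TIMESTAMP_FORMAT
    )
    return delta.total_seconds()


def cpu_fraction(user_ms, system_ms, real_ms):
    """Share of wall-clock time the process spent on the CPU."""
    return (user_ms + system_ms) / real_ms


# grpc.start_time and grpc.request.deadline from the entries above.
window = seconds_between("2019-12-16T14:39:24Z", "2019-12-16T14:39:54Z")
# command.user_time_ms, command.system_time_ms, command.real_time_ms.
busy = cpu_fraction(1032, 236, 30000.153572)

print(f"deadline window: {window:.0f} s, git CPU share: {busy:.1%}")
```

The git process was on the CPU for only a few percent of the 30-second window, which fits the observation in the summary that the time is mostly spent waiting on the consumer of the diff stream rather than on git producing it.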

Output of checks

This bug happens on GitLab.com

Results of GitLab environment info


System information
System: Ubuntu 18.04
Proxy: no
Current User: git
Using RVM: no
Ruby Version: 2.6.3p62
Gem Version: 2.7.9
Bundler Version: 1.17.3
Rake Version: 12.3.3
Redis Version: 3.2.12
Git Version: 2.22.2
Sidekiq Version: 5.2.7
Go Version: unknown

GitLab information
Version: 12.5.4-ee
Revision: 2a57951c0ee
Directory: /opt/gitlab/embedded/service/gitlab-rails
DB Adapter: PostgreSQL
DB Version: 10.9
Elasticsearch: no
Geo: no
Using LDAP: no
Using Omniauth: yes
Omniauth Providers:

GitLab Shell
Version: 10.2.0
Repository storage paths:
  • default: /var/opt/gitlab/git-data/repositories
GitLab Shell path: /opt/gitlab/embedded/service/gitlab-shell
Git: /opt/gitlab/embedded/bin/git

Results of GitLab application Check


Checking GitLab subtasks ...

Checking GitLab Shell ...

GitLab Shell: ...
GitLab Shell version >= 10.2.0 ? ... OK (10.2.0)
Running /opt/gitlab/embedded/service/gitlab-shell/bin/check
Internal API available: OK
Redis available via internal API: OK
gitlab-shell self-check successful

Checking GitLab Shell ... Finished

Checking Gitaly ...

Gitaly: ... default ... OK

Checking Gitaly ... Finished

Checking Sidekiq ...

Sidekiq: ...
Running? ... yes
Number of Sidekiq processes ... 1

Checking Sidekiq ... Finished

Checking Incoming Email ...

Incoming Email: ... Reply by email is disabled in config/gitlab.yml

Checking Incoming Email ... Finished

Checking LDAP ...

LDAP: ... LDAP is disabled in config/gitlab.yml

Checking LDAP ... Finished

Checking GitLab App ...

Git configured correctly? ... yes
Database config exists? ... yes
All migrations up? ... yes
Database contains orphaned GroupMembers? ... no
GitLab config exists? ... yes
GitLab config up to date? ... yes
Log directory writable? ... yes
Tmp directory writable? ... yes
Uploads directory exists? ... yes
Uploads directory has correct permissions? ... yes
Uploads directory tmp has correct permissions? ... skipped (no tmp uploads folder yet)
Init script exists? ... skipped (omnibus-gitlab has no init script)
Init script up-to-date? ... skipped (omnibus-gitlab has no init script)
Projects have namespace: ... 1/1 ... yes
Redis version >= 2.8.0? ... yes
Ruby version >= 2.5.3 ? ... yes (2.6.3)
Git version >= 2.22.0 ? ... yes (2.22.2)
Git user has default SSH configuration? ... yes
Active users: ... 1
Is authorized keys file accessible? ... yes
Elasticsearch version 5.6 - 6.x? ... skipped (elasticsearch is disabled)

Checking GitLab App ... Finished

Checking GitLab subtasks ... Finished

/cc @oswaldo

Edited by Will Chandler (ex-GitLab)