Push to LFS fails with `i/o timeout` when running in AWS configured for HA

Summary

When using LFS in an HA configuration on AWS, pushing LFS objects fails with an "i/o timeout" error. Pushing a repo without LFS works fine. Adding a small file (~100 bytes) that is tracked by LFS results in an "i/o timeout" error after many retries, and the push never completes.

Log files from git with debug printing turned on and GitLab are attached.

Steps to reproduce

I attempted to follow the HA configuration instructions as closely as possible. I'm running GitLab in a VPC in AWS with a PostgreSQL RDS instance, a Redis ElastiCache instance, and NFS shares provided by a SoftNAS instance. Installation was done via Omnibus. The "primary" GitLab instance, used to manage updates and migrations, is in a private subnet. Replica instances are in different private subnets behind an Application Load Balancer, which performs SSL termination and forwards 443 -> 80, with a DNS entry mapping to the "external_url" setting. In normal use, only the replicas receive traffic in this setup.

Changes to gitlab.rb on the "master" (primary) instance:

# Setup HA role
roles ['application_role']

# Setup LFS
gitlab_rails['lfs_enabled'] = true
gitlab_rails['lfs_storage_path'] = "/mnt/lfs-objects"

# Setup Postgres
gitlab_rails['db_adapter'] = "postgresql"
gitlab_rails['db_encoding'] = "unicode"
gitlab_rails['db_username'] = "gitlab"
gitlab_rails['db_password'] = "XXX"
gitlab_rails['db_host'] = "XXX"
gitlab_rails['db_port'] = 5432

# Setup Redis
gitlab_rails['redis_host'] = "XXX"
gitlab_rails['redis_port'] = 6379
gitlab_rails['redis_database'] = 0

# Setup nginx to expect SSL termination to happen at the load balancer
nginx['enable'] = true
nginx['redirect_http_to_https'] = false
nginx['listen_port'] = 80
nginx['listen_https'] = false
nginx['proxy_set_headers'] = {
  "X-Forwarded-Proto" => "https",
  "X-Forwarded-Ssl" => "on"
}

Changes to gitlab.rb on the "replica" instances:

# Setup HA role
roles ['application_role']

# Setup LFS
gitlab_rails['lfs_enabled'] = true
gitlab_rails['lfs_storage_path'] = "/mnt/lfs-objects"

# Setup Postgres
gitlab_rails['db_adapter'] = "postgresql"
gitlab_rails['db_encoding'] = "unicode"
gitlab_rails['db_username'] = "gitlab"
gitlab_rails['db_password'] = "XXX"
gitlab_rails['db_host'] = "XXX"
gitlab_rails['db_port'] = 5432

# Setup Redis
gitlab_rails['redis_host'] = "XXX"
gitlab_rails['redis_port'] = 6379
gitlab_rails['redis_database'] = 0

# Setup nginx to expect SSL termination to happen at the load balancer
nginx['enable'] = true
nginx['redirect_http_to_https'] = false
nginx['listen_port'] = 80
nginx['listen_https'] = false
nginx['proxy_set_headers'] = {
  "X-Forwarded-Proto" => "https",
  "X-Forwarded-Ssl" => "on"
}

## Replica only
gitlab_rails['auto_migrate'] = false

# Add secrets from master instance
gitlab_shell['secret_token'] = 'XXX'
gitlab_rails['otp_key_base'] = 'XXX'
gitlab_rails['secret_key_base'] = 'XXX'
gitlab_rails['db_key_base'] = 'XXX'
gitlab_workhorse['secret_token'] = 'XXX'

It is not clear whether the problem lies in my configuration, in GitLab itself, or in information missing from the docs.
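One thing that could be ruled out on each node is whether the LFS storage directory accepts writes at all. The sketch below is a hypothetical helper, not part of GitLab: it writes a small probe file into a directory, reads it back, and deletes it. It assumes that /mnt/lfs-objects is one of the NFS shares and that the script is run as the same user GitLab runs as (git, per the environment info). The name `storage_round_trip_ok?` is made up for illustration.

```ruby
require "fileutils"
require "securerandom"

# Hypothetical helper (not a GitLab API): write a small probe file into `dir`,
# read it back, and remove it again. Returns true only if the full round trip
# succeeds, and false on any filesystem error (missing mount, permissions, ...).
def storage_round_trip_ok?(dir)
  probe = File.join(dir, ".lfs-probe-#{SecureRandom.hex(8)}")
  payload = SecureRandom.hex(50) # 100 characters, roughly the size of the file in the report
  File.write(probe, payload)
  File.read(probe) == payload
rescue SystemCallError, IOError
  false
ensure
  # Clean up the probe file; rm_f ignores a file that was never created.
  FileUtils.rm_f(probe) if probe
end
```

Calling `storage_round_trip_ok?("/mnt/lfs-objects")` on the master and on each replica would show whether every node can write to the configured `lfs_storage_path`. A `true` result does not prove the share is healthy under load, only that a basic write and read succeed.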

What is the current bug behavior?

git push origin master fails when LFS files are present with an 'i/o timeout' error
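Since the error is a timeout rather than an HTTP error, it may be worth checking plain TCP reachability of the host the client is sent to for LFS transfers, from the same machine that runs the push. The following is a minimal sketch under that assumption; `tcp_connect_check` is a hypothetical helper, and the host and port to test have to be taken from the scrubbed URLs in the client debug output.

```ruby
require "socket"

# Hypothetical helper: try to open a TCP connection to host:port and report
# whether it succeeded and how long the attempt took. `timeout` is the connect
# timeout in seconds.
def tcp_connect_check(host, port, timeout: 5)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  # Socket.tcp closes the socket automatically when the block returns.
  Socket.tcp(host, port, connect_timeout: timeout) { |_sock| }
  { ok: true, seconds: Process.clock_gettime(Process::CLOCK_MONOTONIC) - started }
rescue SystemCallError, SocketError, IOError => e
  # Covers refused connections, timeouts, and name resolution failures.
  { ok: false, error: e.class.name,
    seconds: Process.clock_gettime(Process::CLOCK_MONOTONIC) - started }
end
```

A result with `ok: false` and `seconds` close to the timeout would suggest that packets to that host and port are being dropped somewhere on the path, while a quick success would point the investigation elsewhere.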

What is the expected correct behavior?

git push origin master completes successfully and LFS files are stored in GitLab

Relevant logs and/or screenshots

production.txt - output from the GitLab log
output.txt - output from the git client with debug printing turned on. Note: URLs have been scrubbed.

production.log

output.txt

Results of GitLab environment info


System information
System: Ubuntu 16.04
Proxy: no
Current User: git
Using RVM: no
Ruby Version: 2.3.6p384
Gem Version: 2.6.13
Bundler Version: 1.13.7
Rake Version: 12.3.0
Redis Version: 3.2.11
Git Version: 2.14.3
Sidekiq Version: 5.0.5
Go Version: unknown

GitLab information
Version: 10.4.3-ee
Revision: c65e2ba
Directory: /opt/gitlab/embedded/service/gitlab-rails
DB Adapter: postgresql
DB Version: 9.6.5
URL: https://XXXXX
HTTP Clone URL: https://XXXXXX/some-group/some-project.git
SSH Clone URL: git@XXXXXX:some-group/some-project.git
Elasticsearch: no
Geo: no
Using LDAP: no
Using Omniauth: no

GitLab Shell
Version: 5.11.0
Repository storage paths:
  • default: /var/opt/gitlab/git-data/repositories
Hooks: /opt/gitlab/embedded/service/gitlab-shell/hooks
Git: /opt/gitlab/embedded/bin/git

Results of GitLab application Check


sudo gitlab-rake gitlab:check SANITIZE=true

Checking GitLab Shell ...

GitLab Shell version >= 5.11.0 ? ... OK (5.11.0)
Repo base directory exists? default... yes
Repo storage directories are symlinks? default... no
Repo paths owned by git:root, or git:git? default... yes
Repo paths access is drwxrws---? default... yes
hooks directories in repos are links: ...
2/1 ... ok
2/2 ... repository is empty
2/3 ... ok
3/4 ... ok
2/5 ... repository is empty
2/6 ... repository is empty
2/7 ... repository is empty
2/8 ... ok
2/9 ... repository is empty
2/10 ... repository is empty
2/11 ... ok
2/12 ... ok
2/14 ... ok
2/15 ... ok
5/16 ... ok
Running /opt/gitlab/embedded/service/gitlab-shell/bin/check
Check GitLab API access: OK
Redis available via internal API: OK

Access to /var/opt/gitlab/.ssh/authorized_keys: OK
gitlab-shell self-check successful

Checking GitLab Shell ... Finished

Checking Sidekiq ...

Running? ... yes
Number of Sidekiq processes ... 1

Checking Sidekiq ... Finished

Reply by email is disabled in config/gitlab.yml

Checking LDAP ...

LDAP is disabled in config/gitlab.yml

Checking LDAP ... Finished

Checking GitLab ...

Git configured correctly? ... yes
Database config exists? ... yes
All migrations up? ... yes
Database contains orphaned GroupMembers? ... no
GitLab config exists? ... yes
GitLab config up to date? ... yes
Log directory writable? ... yes
Tmp directory writable? ... yes
Uploads directory exists? ... yes
Uploads directory has correct permissions? ... yes
Uploads directory tmp has correct permissions? ... skipped (no tmp uploads folder yet)
Init script exists? ... skipped (omnibus-gitlab has no init script)
Init script up-to-date? ... skipped (omnibus-gitlab has no init script)
Projects have namespace: ...
2/1 ... yes
2/2 ... yes
2/3 ... yes
3/4 ... yes
2/5 ... yes
2/6 ... yes
2/7 ... yes
2/8 ... yes
2/9 ... yes
2/10 ... yes
2/11 ... yes
2/12 ... yes
2/14 ... yes
2/15 ... yes
5/16 ... yes
Redis version >= 2.8.0? ... yes
Ruby version >= 2.3.5 ? ... yes (2.3.6)
Git version >= 2.7.3 ? ... yes (2.14.3)
Git user has default SSH configuration? ... yes
Active users: ... 5
Elasticsearch version 5.1 - 5.5? ... skipped (elasticsearch is disabled)

Checking GitLab ... Finished

(we will only investigate if the tests are passing)

Possible fixes

(If you can, link to the line of code that might be responsible for the problem)
