Importing large exports via rake task causes Postgres connection timeout

Summary

When importing a large project (in this case, 20 GB) into GitLab via the Rake task, the import fails because the PostgreSQL connection times out.

Steps to reproduce

  1. Create an extremely large project export; in our case, about 20 GB
  2. Import into a GitLab instance with (relatively) slow disk I/O; in our case, copying the file took about 5 minutes
  3. After the archive is copied into /var/opt/gitlab/gitlab-rails/uploads/, the import will fail
  4. To add insult to injury, subsequent import attempts will re-copy the file, rather than reusing the existing file, guaranteeing further failures as well as disk space issues.

Example Project

The project that we're using is not only too large to feasibly share, it is sensitive/proprietary data that can't be made public. Theoretically, any sufficiently large repository will do.

What is the current bug behavior?

The import fails with a pgsql error (PG::UnableToSend: no connection to the server) once the archive has been copied.

What is the expected correct behavior?

The import succeeds as expected.

Relevant logs and/or screenshots

user@gitlab:/tmp$ sudo gitlab-rake "gitlab:import_export:import[devops, groupname, projectname, /tmp/2021-08-04_16-34-501_groupname_projectname_export.tar.gz]"
**************************************************
⛔️ WARNING: Sidekiq testing API enabled, but this is not the test environment.  Your jobs will not go to Redis.
**************************************************
I, [2021-08-05T08:27:06.252928 #3096049]  INFO -- : Importing GitLab export: /tmp/2021-08-04_16-34-501_groupname_projectname_export.tar.gz into GitLab groupname/projectname as Devops
E, [2021-08-05T08:32:35.303914 #3096049] ERROR -- : Exception: PG::UnableToSend: no connection to the server

Output of checks

Results of GitLab environment info

System information
System:         Ubuntu 20.04
Proxy:          no
Current User:   git
Using RVM:      no
Ruby Version:   2.7.2p137
Gem Version:    3.1.4
Bundler Version:2.1.4
Rake Version:   13.0.3
Redis Version:  6.0.14
Git Version:    2.31.1
Sidekiq Version:5.2.9
Go Version:     unknown

GitLab information
Version:        13.11.7-ee
Revision:       1c6dc95d18b
Directory:      /opt/gitlab/embedded/service/gitlab-rails
DB Adapter:     PostgreSQL
DB Version:     12.6
URL:            https://[redacted]
HTTP Clone URL: https://[redacted]/some-group/some-project.git
SSH Clone URL:  git@[redacted]:some-group/some-project.git
Elasticsearch:  no
Geo:            yes
Geo node:       Primary
Using LDAP:     no
Using Omniauth: yes
Omniauth Providers: saml

GitLab Shell
Version:        13.17.0
Repository storage paths:
- default:      /var/opt/gitlab/git-data/repositories
GitLab Shell path:              /opt/gitlab/embedded/service/gitlab-shell
Git:            /opt/gitlab/embedded/bin/git

Results of GitLab application Check

❯ sudo gitlab-rake gitlab:check SANITIZE=true
Checking GitLab subtasks ...

Checking GitLab Shell ...

GitLab Shell: ... GitLab Shell version >= 13.17.0 ? ... OK (13.17.0)
Running /opt/gitlab/embedded/service/gitlab-shell/bin/check
Internal API available: OK
Redis available via internal API: OK
gitlab-shell self-check successful

Checking GitLab Shell ... Finished

Checking Gitaly ...

Gitaly: ... default ... OK

Checking Gitaly ... Finished

Checking Sidekiq ...

Sidekiq: ... Running? ... yes
Number of Sidekiq processes (cluster/worker) ... 1/1

Checking Sidekiq ... Finished

Checking Incoming Email ...

Incoming Email: ... Reply by email is disabled in config/gitlab.yml

Checking Incoming Email ... Finished

Checking LDAP ...

LDAP: ... LDAP is disabled in config/gitlab.yml

Checking LDAP ... Finished

Checking GitLab App ...

Git configured correctly? ... yes Database config exists? ... yes All migrations up? ... yes Database contains orphaned GroupMembers? ... no GitLab config exists? ... yes GitLab config up to date? ... yes Log directory writable? ... yes Tmp directory writable? ... yes Uploads directory exists? ... yes Uploads directory has correct permissions? ... yes Uploads directory tmp has correct permissions? ... yes Init script exists? ... skipped (omnibus-gitlab has no init script) Init script up-to-date? ... skipped (omnibus-gitlab has no init script) Projects have namespace: ... 2/1 ... yes 5/2 ... yes 7/3 ... yes 7/6 ... yes 7/8 ... yes 7/11 ... yes 9/12 ... yes 9/13 ... yes 9/14 ... yes 9/15 ... yes 9/16 ... yes 9/17 ... yes 9/18 ... yes 9/19 ... yes 9/20 ... yes 9/21 ... yes 11/22 ... yes 12/35 ... yes 14/37 ... yes 12/39 ... yes 14/41 ... yes 61/43 ... yes 62/44 ... yes 62/45 ... yes 9/46 ... yes 62/47 ... yes 70/48 ... yes 70/49 ... yes 70/50 ... yes 70/51 ... yes 70/52 ... yes 70/53 ... yes 70/54 ... yes 70/55 ... yes 70/56 ... yes 70/57 ... yes 70/58 ... yes 70/59 ... yes 70/60 ... yes 70/61 ... yes 70/62 ... yes 70/63 ... yes 70/64 ... yes 70/65 ... yes 59/68 ... yes 59/69 ... yes 61/70 ... yes 61/71 ... yes 112/73 ... yes 59/75 ... yes 78/77 ... yes 12/78 ... yes 59/79 ... yes 59/81 ... yes 121/82 ... yes 121/83 ... yes 59/86 ... yes 59/88 ... yes 9/90 ... yes 70/91 ... yes 70/92 ... yes 179/93 ... yes 14/94 ... yes 12/96 ... yes 12/97 ... yes 14/99 ... yes 12/100 ... yes 70/101 ... yes 59/102 ... yes 110/103 ... yes 133/104 ... yes 148/105 ... yes 146/106 ... yes 112/107 ... yes 2/108 ... yes 120/110 ... yes 120/111 ... yes 6/179 ... yes 6/180 ... yes 6/181 ... yes 120/191 ... yes 120/195 ... yes 148/197 ... yes 120/198 ... yes 167/199 ... yes 167/200 ... yes 167/202 ... yes 171/203 ... yes 61/207 ... yes 112/209 ... yes 179/210 ... yes 6/211 ... yes 6/215 ... yes 6/216 ... yes 6/218 ... yes 6/219 ... yes 225/221 ... yes 228/222 ... yes 59/224 ... yes 167/225 ... 
yes 179/226 ... yes 244/227 ... yes 244/228 ... yes 244/229 ... yes 244/230 ... yes 244/231 ... yes 242/232 ... yes 242/233 ... yes 242/234 ... yes 242/235 ... yes 242/236 ... yes 246/237 ... yes 246/238 ... yes 246/239 ... yes 246/240 ... yes 246/241 ... yes 245/242 ... yes 245/243 ... yes 245/244 ... yes 245/245 ... yes 245/246 ... yes 243/247 ... yes 243/248 ... yes 243/249 ... yes 243/250 ... yes 243/251 ... yes 247/253 ... yes 196/254 ... yes 171/255 ... yes 167/256 ... yes 171/257 ... yes 59/258 ... yes 59/259 ... yes 258/300 ... yes Redis version >= 5.0.0? ... yes Ruby version >= 2.7.2 ? ... yes (2.7.2) Git version >= 2.31.0 ? ... yes (2.31.1) Git user has default SSH configuration? ... yes Active users: ... 209 Is authorized keys file accessible? ... skipped (authorized keys not enabled) GitLab configured to store new projects in hashed storage? ... yes All projects are in hashed storage? ... yes Elasticsearch version 7.x (6.4 - 6.x deprecated to be removed in 13.8)? ... skipped (elasticsearch is disabled)

Checking GitLab App ... Finished

Checking Geo ...

GitLab Geo is available ...
GitLab Geo is enabled ... yes
This machine's Geo node name matches a database record ... yes, found a primary node named "gitlab-ee-van"
HTTP/HTTPS repository cloning is enabled ... yes
Machine clock is synchronized ... Exception: Timeout::Error
Git user has default SSH configuration? ... yes
OpenSSH configured to use AuthorizedKeysCommand ... skipped
  Reason:
  Cannot access OpenSSH configuration file
  Try fixing it:
  This is expected if you are using SELinux. You may want to check configuration manually
  For more information see:
  doc/administration/operations/fast_ssh_key_lookup.md
GitLab configured to disable writing to authorized_keys file ... yes
GitLab configured to store new projects in hashed storage? ... yes
All projects are in hashed storage? ... yes

Checking Geo ... Finished

Checking GitLab subtasks ... Finished

Possible fixes

In my case, I worked around the issue by increasing the postgres idle_in_transaction_session_timeout for the gitlab user to five minutes:

ALTER USER gitlab SET idle_in_transaction_session_timeout TO 300000;

This could theoretically be solved by one of the following:

  1. Increase the timeout for the current pgsql session, e.g. SET idle_in_transaction_session_timeout TO '3000'; (a value without units is in milliseconds), although it is difficult to know what to set it to;
  2. Don't open the connection/start the transaction until after the file is copied; or
  3. Reconnect if the pgsql connection gets disconnected.
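
To illustrate options 2 and 3, here is a minimal Ruby sketch. It is not GitLab's actual import code: `ConnectionLost`, `FakeConnection`, `with_reconnect` and `import_archive` are all hypothetical names, with `ConnectionLost` standing in for an error like PG::UnableToSend. The idea is to finish the slow file copy before touching the database, and to reconnect and retry if the connection has been dropped anyway. Note that a dropped connection rolls back any open transaction, so the retried block would have to restart its work from the beginning.

```ruby
require "fileutils"

# Hypothetical stand-in for a dropped-connection error such as PG::UnableToSend.
class ConnectionLost < StandardError; end

# Hypothetical stand-in for a database connection, used only to make the
# sketch runnable. It fails a fixed number of times before succeeding.
class FakeConnection
  attr_reader :reconnects

  def initialize(failures: 0)
    @failures = failures
    @reconnects = 0
  end

  def execute(sql)
    if @failures > 0
      @failures -= 1
      raise ConnectionLost, "no connection to the server"
    end
    "ok: #{sql}"
  end

  def reconnect!
    @reconnects += 1
  end
end

# Option 3: run the block, reconnecting and retrying when the connection
# is lost, up to max_attempts attempts in total.
def with_reconnect(conn, max_attempts: 3)
  attempts = 0
  begin
    attempts += 1
    yield conn
  rescue ConnectionLost
    raise if attempts >= max_attempts
    conn.reconnect!
    retry
  end
end

# Option 2: do the slow copy first, and only then talk to the database,
# so no transaction sits idle while the archive is being copied.
def import_archive(conn, source, destination)
  FileUtils.cp(source, destination)
  with_reconnect(conn) { |c| c.execute("SELECT 1") }
end
```

With this ordering, the idle-in-transaction timeout no longer depends on how long the copy takes, and the retry only covers the case where the connection was closed for some other reason.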