Importing large exports via rake task causes Postgres connection timeout
Summary
When importing large (in this case, 20 GB) projects into GitLab via the Rake task, the import fails due to a pgsql database connection timeout.
Steps to reproduce
- Create an extremely large project export; in our case, about 20 GB
- Import into a GitLab instance with (relatively) slow disk I/O; in our case, copying the file took about 5 minutes
- After the archive is copied into /var/opt/gitlab/gitlab-rails/uploads/, the import will fail
- To add insult to injury, subsequent import attempts will re-copy the file rather than reusing the existing file, guaranteeing further failures as well as disk space issues.
Example Project
The project that we're using is not only too large to feasibly share, it is sensitive/proprietary data that can't be made public. Theoretically, any sufficiently large repository will do.
What is the current bug behavior?
The import fails due to a pgsql error.
What is the expected correct behavior?
The import succeeds as expected.
Relevant logs and/or screenshots
user@gitlab:/tmp$ sudo gitlab-rake "gitlab:import_export:import[devops, groupname, projectname, /tmp/2021-08-04_16-34-501_groupname_projectname_export.tar.gz]"
**************************************************
⛔️ WARNING: Sidekiq testing API enabled, but this is not the test environment. Your jobs will not go to Redis.
**************************************************
I, [2021-08-05T08:27:06.252928 #3096049] INFO -- : Importing GitLab export: /tmp/2021-08-04_16-34-501_groupname_projectname_export.tar.gz into GitLab groupname/projectname as Devops
E, [2021-08-05T08:32:35.303914 #3096049] ERROR -- : Exception: PG::UnableToSend: no connection to the server
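For reference, the gap between the two log lines above can be worked out directly. This is only a small sketch using the timestamps copied from the log; it says nothing about where inside the import the time was spent.

```python
from datetime import datetime

# Timestamps copied verbatim from the INFO and ERROR log lines above.
started = datetime.fromisoformat("2021-08-05T08:27:06.252928")
failed = datetime.fromisoformat("2021-08-05T08:32:35.303914")

# Time between "Importing GitLab export" and the PG::UnableToSend error.
elapsed_seconds = (failed - started).total_seconds()
print(f"{elapsed_seconds:.1f} s between import start and the connection error")
```

The result is roughly five and a half minutes, which is in line with the "about 5 minutes" copy time mentioned in the steps to reproduce.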
Output of checks
Results of GitLab environment info
System information
System: Ubuntu 20.04
Proxy: no
Current User: git
Using RVM: no
Ruby Version: 2.7.2p137
Gem Version: 3.1.4
Bundler Version:2.1.4
Rake Version: 13.0.3
Redis Version: 6.0.14
Git Version: 2.31.1
Sidekiq Version:5.2.9
Go Version: unknown
GitLab information
Version: 13.11.7-ee
Revision: 1c6dc95d18b
Directory: /opt/gitlab/embedded/service/gitlab-rails
DB Adapter: PostgreSQL
DB Version: 12.6
URL: https://[redacted]
HTTP Clone URL: https://[redacted]/some-group/some-project.git
SSH Clone URL: git@[redacted]:some-group/some-project.git
Elasticsearch: no
Geo: yes
Geo node: Primary
Using LDAP: no
Using Omniauth: yes
Omniauth Providers: saml
GitLab Shell Version: 13.17.0
Repository storage paths:
- default: /var/opt/gitlab/git-data/repositories
GitLab Shell path: /opt/gitlab/embedded/service/gitlab-shell
Git: /opt/gitlab/embedded/bin/git
Results of GitLab application Check
❯ sudo gitlab-rake gitlab:check SANITIZE=true
Checking GitLab subtasks ...
Checking GitLab Shell ...
GitLab Shell: ... GitLab Shell version >= 13.17.0 ? ... OK (13.17.0)
Running /opt/gitlab/embedded/service/gitlab-shell/bin/check
Internal API available: OK
Redis available via internal API: OK
gitlab-shell self-check successful
Checking GitLab Shell ... Finished
Checking Gitaly ...
Gitaly: ... default ... OK
Checking Gitaly ... Finished
Checking Sidekiq ...
Sidekiq: ... Running? ... yes
Number of Sidekiq processes (cluster/worker) ... 1/1
Checking Sidekiq ... Finished
Checking Incoming Email ...
Incoming Email: ... Reply by email is disabled in config/gitlab.yml
Checking Incoming Email ... Finished
Checking LDAP ...
LDAP: ... LDAP is disabled in config/gitlab.yml
Checking LDAP ... Finished
Checking GitLab App ...
Git configured correctly? ... yes
Database config exists? ... yes
All migrations up? ... yes
Database contains orphaned GroupMembers? ... no
GitLab config exists? ... yes
GitLab config up to date? ... yes
Log directory writable? ... yes
Tmp directory writable? ... yes
Uploads directory exists? ... yes
Uploads directory has correct permissions? ... yes
Uploads directory tmp has correct permissions? ... yes
Init script exists? ... skipped (omnibus-gitlab has no init script)
Init script up-to-date? ... skipped (omnibus-gitlab has no init script)
Projects have namespace: ...
2/1 ... yes 5/2 ... yes 7/3 ... yes 7/6 ... yes 7/8 ... yes 7/11 ... yes 9/12 ... yes 9/13 ... yes 9/14 ... yes 9/15 ... yes 9/16 ... yes 9/17 ... yes 9/18 ... yes 9/19 ... yes 9/20 ... yes 9/21 ... yes 11/22 ... yes 12/35 ... yes 14/37 ... yes 12/39 ... yes 14/41 ... yes 61/43 ... yes 62/44 ... yes 62/45 ... yes 9/46 ... yes 62/47 ... yes 70/48 ... yes 70/49 ... yes 70/50 ... yes 70/51 ... yes 70/52 ... yes 70/53 ... yes 70/54 ... yes 70/55 ... yes 70/56 ... yes 70/57 ... yes 70/58 ... yes 70/59 ... yes 70/60 ... yes 70/61 ... yes 70/62 ... yes 70/63 ... yes 70/64 ... yes 70/65 ... yes 59/68 ... yes 59/69 ... yes 61/70 ... yes 61/71 ... yes 112/73 ... yes 59/75 ... yes 78/77 ... yes 12/78 ... yes 59/79 ... yes 59/81 ... yes 121/82 ... yes 121/83 ... yes 59/86 ... yes 59/88 ... yes 9/90 ... yes 70/91 ... yes 70/92 ... yes 179/93 ... yes 14/94 ... yes 12/96 ... yes 12/97 ... yes 14/99 ... yes 12/100 ... yes 70/101 ... yes 59/102 ... yes 110/103 ... yes 133/104 ... yes 148/105 ... yes 146/106 ... yes 112/107 ... yes 2/108 ... yes 120/110 ... yes 120/111 ... yes 6/179 ... yes 6/180 ... yes 6/181 ... yes 120/191 ... yes 120/195 ... yes 148/197 ... yes 120/198 ... yes 167/199 ... yes 167/200 ... yes 167/202 ... yes 171/203 ... yes 61/207 ... yes 112/209 ... yes 179/210 ... yes 6/211 ... yes 6/215 ... yes 6/216 ... yes 6/218 ... yes 6/219 ... yes 225/221 ... yes 228/222 ... yes 59/224 ... yes 167/225 ... yes 179/226 ... yes 244/227 ... yes 244/228 ... yes 244/229 ... yes 244/230 ... yes 244/231 ... yes 242/232 ... yes 242/233 ... yes 242/234 ... yes 242/235 ... yes 242/236 ... yes 246/237 ... yes 246/238 ... yes 246/239 ... yes 246/240 ... yes 246/241 ... yes 245/242 ... yes 245/243 ... yes 245/244 ... yes 245/245 ... yes 245/246 ... yes 243/247 ... yes 243/248 ... yes 243/249 ... yes 243/250 ... yes 243/251 ... yes 247/253 ... yes 196/254 ... yes 171/255 ... yes 167/256 ... yes 171/257 ... yes 59/258 ... yes 59/259 ... yes 258/300 ... yes
Redis version >= 5.0.0? ... yes
Ruby version >= 2.7.2 ? ... yes (2.7.2)
Git version >= 2.31.0 ? ... yes (2.31.1)
Git user has default SSH configuration? ... yes
Active users: ... 209
Is authorized keys file accessible? ... skipped (authorized keys not enabled)
GitLab configured to store new projects in hashed storage? ... yes
All projects are in hashed storage? ... yes
Elasticsearch version 7.x (6.4 - 6.x deprecated to be removed in 13.8)? ... skipped (elasticsearch is disabled)
Checking GitLab App ... Finished
Checking Geo ...
GitLab Geo is available ...
GitLab Geo is enabled ... yes
This machine's Geo node name matches a database record ... yes, found a primary node named "gitlab-ee-van"
HTTP/HTTPS repository cloning is enabled ... yes
Machine clock is synchronized ... Exception: Timeout::Error
Git user has default SSH configuration? ... yes
OpenSSH configured to use AuthorizedKeysCommand ... skipped
Reason:
Cannot access OpenSSH configuration file
Try fixing it:
This is expected if you are using SELinux. You may want to check configuration manually
For more information see:
doc/administration/operations/fast_ssh_key_lookup.md
GitLab configured to disable writing to authorized_keys file ... yes
GitLab configured to store new projects in hashed storage? ... yes
All projects are in hashed storage? ... yes
Checking Geo ... Finished
Checking GitLab subtasks ... Finished
Possible fixes
In my case, I worked around the issue by increasing the Postgres idle_in_transaction_session_timeout for the gitlab user to five minutes:
ALTER USER gitlab SET idle_in_transaction_session_timeout TO 300000;
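As a sanity check on the units: Postgres reads a bare integer for this setting as milliseconds, so five minutes is 300000. A minimal sketch of that conversion follows; the helper names `timeout_ms` and `alter_user_statement` are made up for illustration and only build the statement text, they do not talk to a database.

```python
def timeout_ms(minutes: float) -> int:
    """Convert minutes to milliseconds, the unit Postgres assumes for a bare integer here."""
    return int(minutes * 60 * 1000)


def alter_user_statement(role: str, minutes: float) -> str:
    """Build the per-role statement used in the workaround (hypothetical helper)."""
    return (
        f"ALTER USER {role} SET idle_in_transaction_session_timeout "
        f"TO {timeout_ms(minutes)};"
    )


# Role name and duration taken from the workaround above.
statement = alter_user_statement("gitlab", 5)
print(statement)
```

Note that a per-role setting like this only applies to sessions started after the change, so an import that is already running would not pick it up.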
This could theoretically be solved by one of the following:
- Increase the given timeout for the current pgsql session (SET idle_in_transaction_session_timeout TO '3000';), although it is difficult to know what to set it to;
- Don't open the connection/start the transaction until after the file is copied; or
- Reconnect if the pgsql connection gets disconnected.
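To make the last two options concrete, here is a rough Python sketch of the ordering I have in mind. It is not GitLab's import code: `FakeConnection`, `ConnectionLost`, `import_archive` and `execute_with_reconnect` are hypothetical stand-ins, and comparing file sizes to decide whether an existing copy can be reused is an assumption made for the example.

```python
import shutil
import tempfile
from pathlib import Path


class ConnectionLost(Exception):
    """Stand-in for a driver error such as PG::UnableToSend."""


class FakeConnection:
    """Hypothetical stand-in for a database connection, not a real driver API."""

    def __init__(self, fail_first_n=0):
        self.fail_remaining = fail_first_n
        self.executed = []

    def execute(self, sql):
        if self.fail_remaining > 0:
            self.fail_remaining -= 1
            raise ConnectionLost("no connection to the server")
        self.executed.append(sql)


def import_archive(src, uploads_dir, connect):
    """Copy the archive first (reusing an existing copy), then open the connection."""
    dest = uploads_dir / src.name
    copied = False
    # Assumption: an existing file of the same size is treated as reusable.
    if not (dest.exists() and dest.stat().st_size == src.stat().st_size):
        shutil.copyfile(src, dest)  # slow step, no connection is held open
        copied = True
    conn = connect()  # opened only after the copy has finished
    conn.execute(f"-- import {dest.name}")
    return dest, copied


def execute_with_reconnect(connect, sql, attempts=3):
    """Run sql, opening a fresh connection if the previous one was lost."""
    for attempt in range(1, attempts + 1):
        conn = connect()
        try:
            conn.execute(sql)
            return conn
        except ConnectionLost:
            if attempt == attempts:
                raise
    raise ValueError("attempts must be at least 1")


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "export.tar.gz"
    src.write_bytes(b"x" * 1024)
    uploads = Path(tmp) / "uploads"
    uploads.mkdir()
    file_present_at_connect = []

    def connect():
        # Record whether the copy was already in place when connecting.
        file_present_at_connect.append((uploads / src.name).exists())
        return FakeConnection()

    _, first_copied = import_archive(src, uploads, connect)
    _, second_copied = import_archive(src, uploads, connect)  # reuses the copy

flaky_calls = {"count": 0}


def flaky_connect():
    # The first connection drops on its first statement; later ones work.
    flaky_calls["count"] += 1
    return FakeConnection(fail_first_n=1 if flaky_calls["count"] == 1 else 0)


conn = execute_with_reconnect(flaky_connect, "SELECT 1")
```

One caveat on the reconnect option: when the connection drops, any open transaction is rolled back on the server, so a real fix would have to restart the whole transaction rather than retry a single statement as this sketch does.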