PostgreSQL configuration mismatch

I had an issue migrating from version 9 to version 10 of the PostgreSQL bundled with gitlab-ce. It wasn't the first time; moving from 8 to 9 was also painful. In the end it was not a gitlab-ce issue at all.

Problem to solve

When trying to migrate from version 9 to 10, the migration script wrote the following error to the log file:

-----------------------------------------------------------------
  pg_upgrade run on Wed Jul 31 08:22:35 2019
-----------------------------------------------------------------

Performing Consistency Checks
-----------------------------
Checking cluster versions                                   ok

*failure*
Consult the last few lines of "pg_upgrade_server.log" for
the probable cause of the failure.

connection to database failed: could not connect to server: No such file or directory
        Is the server running locally and accepting
        connections on Unix domain socket "/data/gitlab/postgresql/data/.s.PGSQL.50432"?
could not connect to source postmaster started with the command:
"/opt/gitlab/embedded/postgresql/9.6/bin/pg_ctl" -w -l "pg_upgrade_server.log" -D "/data/gitlab/postgresql/data" -o "-p 50432 -b  -c listen_addresses='' -c unix_socket_permissions=0700 -c unix_socket_directories='/data/gitlab/postgresql/data'" start


-----------------------------------------------------------------
  pg_upgrade run on Wed Jul 31 08:22:35 2019
-----------------------------------------------------------------

command: "/opt/gitlab/embedded/postgresql/9.6/bin/pg_ctl" -w -l "pg_upgrade_server.log" -D "/data/gitlab/postgresql/data" -o "-p 50432 -b  -c listen_addresses='' -c unix_socket_permissions=0700 -c unix_socket_directories='/data/gitlab/postgresql/data'" start >> "pg_upgrade_server.log" 2>&1
waiting for server to start....FATAL:  data directory "/data/gitlab/postgresql/data" has group or world access
DETAIL:  Permissions should be u=rwx (0700).
 stopped waiting
pg_ctl: could not start server
Examine the log output.
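
The FATAL line in the log above says the data directory must be accessible to its owner only (0700). A minimal sketch of checking and correcting that, assuming GNU stat is available; the helper names pgdata_mode_ok and fix_pgdata_mode are hypothetical and not part of gitlab-ce or PostgreSQL:

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the directory mode is exactly 0700,
# which is what the FATAL message in the log asks for.
pgdata_mode_ok() {
    mode=$(stat -c '%a' "$1") || return 2   # GNU stat; prints e.g. 700 or 755
    if [ "$mode" = "700" ]; then
        echo "ok: $1 is 0700"
    else
        echo "fix needed: $1 is $mode, expected 700"
        return 1
    fi
}

# Hypothetical helper: restrict the directory to its owner only.
fix_pgdata_mode() {
    chmod 0700 "$1"
}
```

For example, run as root with the path taken from the log: pgdata_mode_ok /data/gitlab/postgresql/data || fix_pgdata_mode /data/gitlab/postgresql/data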

After fixing the permissions on the data directory, the script returned the following output:

Database contains orphaned GroupMembers? ... Exception: PG::UndefinedTable: ERROR:  relation "members" does not exist
LINE 8:                WHERE a.attrelid = '"members"'::regclass
                                          ^
:               SELECT a.attname, format_type(a.atttypid, a.atttypmod),
                     pg_get_expr(d.adbin, d.adrelid), a.attnotnull, a.atttypid, a.atttypmod,
                     c.collname, col_description(a.attrelid, a.attnum) AS comment
                FROM pg_attribute a
                LEFT JOIN pg_attrdef d ON a.attrelid = d.adrelid AND a.attnum = d.adnum
                LEFT JOIN pg_type t ON a.atttypid = t.oid
                LEFT JOIN pg_collation c ON a.attcollation = c.oid AND a.attcollation <> t.typcollation
               WHERE a.attrelid = '"members"'::regclass
                 AND a.attnum > 0 AND NOT a.attisdropped
               ORDER BY a.attnum

GitLab config exists? ... yes
GitLab config up to date? ... yes
Log directory writable? ... yes
Tmp directory writable? ... yes
Uploads directory exists? ... yes
Uploads directory has correct permissions? ... yes
Uploads directory tmp has correct permissions? ... yes
Init script exists? ... skipped (omnibus-gitlab has no init script)
Init script up-to-date? ... skipped (omnibus-gitlab has no init script)
Projects have namespace: ... Exception: PG::UndefinedTable: ERROR:  relation "projects" does not exist
LINE 8:                WHERE a.attrelid = '"projects"'::regclass
                                          ^
:               SELECT a.attname, format_type(a.atttypid, a.atttypmod),
                     pg_get_expr(d.adbin, d.adrelid), a.attnotnull, a.atttypid, a.atttypmod,
                     c.collname, col_description(a.attrelid, a.attnum) AS comment
                FROM pg_attribute a
                LEFT JOIN pg_attrdef d ON a.attrelid = d.adrelid AND a.attnum = d.adnum
                LEFT JOIN pg_type t ON a.atttypid = t.oid
                LEFT JOIN pg_collation c ON a.attcollation = c.oid AND a.attcollation <> t.typcollation
               WHERE a.attrelid = '"projects"'::regclass
                 AND a.attnum > 0 AND NOT a.attisdropped
               ORDER BY a.attnum

Redis version >= 2.8.0? ... yes
Ruby version >= 2.5.3 ? ... yes (2.6.3)
Git version >= 2.21.0 ? ... yes (2.21.0)
Git user has default SSH configuration? ... yes
Active users: ... Exception: PG::UndefinedTable: ERROR:  relation "users" does not exist
LINE 8:                WHERE a.attrelid = '"users"'::regclass
                                          ^
:               SELECT a.attname, format_type(a.atttypid, a.atttypmod),
                     pg_get_expr(d.adbin, d.adrelid), a.attnotnull, a.atttypid, a.atttypmod,
                     c.collname, col_description(a.attrelid, a.attnum) AS comment
                FROM pg_attribute a
                LEFT JOIN pg_attrdef d ON a.attrelid = d.adrelid AND a.attnum = d.adnum
                LEFT JOIN pg_type t ON a.atttypid = t.oid
                LEFT JOIN pg_collation c ON a.attcollation = c.oid AND a.attcollation <> t.typcollation
               WHERE a.attrelid = '"users"'::regclass
                 AND a.attnum > 0 AND NOT a.attisdropped
               ORDER BY a.attnum

All of this was very suspicious, because it happened on only 1 of the 5 gitlab-ce instances I administer.

First I found the issue https://gitlab.com/gitlab-org/gitlab-ce/issues/28275, but after careful reading I found that Go is not needed for the distributed gitlab-ce package.

After a bit more investigation and trial and error, I finally found the difference between my 4 well-behaved gitlab-ce instances and the affected one. On the affected one I keep PostgreSQL in a non-standard location, and I changed /etc/gitlab/gitlab.rb a long time ago.

Finally I found the following issue:

https://gitlab.com/gitlab-org/omnibus-gitlab/issues/4129 and #1963 (closed), especially the comment #1963 (comment 67961397)

Further details

After double-checking my configuration in /etc/gitlab/gitlab.rb, I finally realised that I had a very old PostgreSQL configuration that set only postgresql['dir'] instead of postgresql['dir'], postgresql['data_dir'] and postgresql['home']. This caused the configuration script to create the PostgreSQL instance in such a way that one top-level instance directory contained another, lower-level PostgreSQL instance (postgresql/data as the first level and postgresql/data/data as the second level). Even though that configuration worked correctly for a long time, the migration script has a problem with this kind of structure, as described above.
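
To check another instance for the same situation, a sketch along these lines may help. It assumes the settings appear in gitlab.rb as plain, uncommented lines, and the function names missing_pg_settings and has_nested_cluster are hypothetical. The second check relies on the fact that every PostgreSQL data directory contains a PG_VERSION file.

```shell
#!/bin/sh
# Print each of the three settings that has no uncommented line in the file.
# The file path is an argument so the check can be tried on a copy first.
missing_pg_settings() {
    for key in dir data_dir home; do
        if ! grep -v '^[[:space:]]*#' "$1" | grep -Fq "postgresql['$key']"; then
            echo "postgresql['$key']"
        fi
    done
}

# Succeed if DIR/data looks like a PostgreSQL cluster nested inside DIR.
# PG_VERSION is a file every PostgreSQL data directory contains.
has_nested_cluster() {
    [ -f "$1/data/PG_VERSION" ]
}
```

For example: missing_pg_settings /etc/gitlab/gitlab.rb lists the settings that are not set, and has_nested_cluster /data/gitlab/postgresql/data (the path from the log) succeeds if a second-level data/data cluster exists.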

Solution

First create a backup of your database, then stop the gitlab instance (consider displaying the deploy page). Move the existing PostgreSQL directory to another location and correct the PostgreSQL part of /etc/gitlab/gitlab.rb to match your layout, setting at least postgresql['dir'], postgresql['data_dir'] and postgresql['home']. Create the new postgresql directory and give it the same permissions and ownership as the old one. Then copy, preserving permissions, the data directory that sat inside the data directory (data/data), *cluster.sh and .profile.

Run gitlab-ctl reconfigure, and the issue is solved.
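
The steps above can be sketched as a script. This is only an outline under several assumptions: the paths in OLD_DIR, MOVED_DIR and NEW_DIR are hypothetical, the inner data/data directory is assumed to become the new data directory, *cluster.sh and .profile are assumed to sit at the top of the old directory, and the final gitlab-ctl start is my addition. It prints the commands by default (DRY_RUN=1) rather than running them.

```shell
#!/bin/sh
# Dry-run by default: each command is printed, not executed.
DRY_RUN="${DRY_RUN:-1}"
OLD_DIR="${OLD_DIR:-/data/gitlab/postgresql}"   # hypothetical old location
MOVED_DIR="${MOVED_DIR:-${OLD_DIR}.old}"        # hypothetical parking place
NEW_DIR="${NEW_DIR:-$OLD_DIR}"                  # hypothetical new location

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

relocate_postgresql() {
    # Backup first; newer GitLab releases use "gitlab-backup create" instead.
    run gitlab-rake gitlab:backup:create
    run gitlab-ctl stop
    run mv "$OLD_DIR" "$MOVED_DIR"
    echo "# now set postgresql['dir'], ['data_dir'], ['home'] in /etc/gitlab/gitlab.rb"
    run mkdir -p "$NEW_DIR"
    # Same ownership and mode as the old directory (GNU chown/chmod).
    run chown --reference="$MOVED_DIR" "$NEW_DIR"
    run chmod --reference="$MOVED_DIR" "$NEW_DIR"
    # cp -a preserves mode, ownership and timestamps.
    run cp -a "$MOVED_DIR/data/data" "$NEW_DIR/data"
    run cp -a "$MOVED_DIR"/*cluster.sh "$MOVED_DIR/.profile" "$NEW_DIR/"
    run gitlab-ctl reconfigure
    run gitlab-ctl start   # not in the original steps; services were stopped
}

relocate_postgresql
```

Review the printed commands against your own layout before running anything with DRY_RUN=0.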

Additional issue

For some unknown reason, after changing the PostgreSQL configuration, the gitlab-ctl reconfigure script tries to create the user and group gitlab-psql even though they already exist on the system, which causes an error. The solution is to write down the uid and gid, delete the entries in /etc/passwd and /etc/group, re-run reconfigure, and then correct the uid and gid.
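
A small sketch for writing down the uid and gid before removing the entries. The function name record_ids is hypothetical; it reads the standard colon-separated format, in which field 3 is the UID in /etc/passwd and the GID in /etc/group.

```shell
#!/bin/sh
# Print the uid and gid recorded for NAME in passwd- and group-format files.
# Field 3 is the UID in /etc/passwd and the GID in /etc/group.
record_ids() {
    name="$1"
    passwd_file="${2:-/etc/passwd}"
    group_file="${3:-/etc/group}"
    uid=$(awk -F: -v n="$name" '$1 == n { print $3 }' "$passwd_file")
    gid=$(awk -F: -v n="$name" '$1 == n { print $3 }' "$group_file")
    echo "uid=$uid gid=$gid"
}
```

Run record_ids gitlab-psql before deleting the entries. After reconfigure has recreated the account, usermod -u and groupmod -g are one standard way to set the recorded values back; that is my assumption about how to do the correction, not something the steps above prescribe.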