Redis cache read error prevents CAS users from remaining signed in

Problem

CAS users are signed out every few requests because token validation intermittently fails to retrieve the token from the cache. This completely disrupts GitLab access, as users have to sign in again every few page navigations.

Steps to reproduce

  1. Update an omnibus installation to 12.6
  2. Configure CAS authentication
  3. Users are frequently signed out, sometimes immediately after signing in

Cause

On every request we verify that the user's token hasn't expired in ApplicationController#validate_user_service_ticket!, but Rails.cache frequently fails to retrieve the token and returns nil instead.

This happens silently because Rails.cache rescues from a Redis::ConnectionError (Connection lost (EPIPE)) error, and succeeds on future attempts. Before 12.6 the nil read may have been happening elsewhere instead.
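The silent failure can be illustrated with a minimal sketch. It assumes nothing about the real Rails.cache or redis-rb internals: FakeConnectionError, FlakyBackend and SilentCache are hypothetical stand-ins that only show why a rescued connection error looks the same as a missing token to the caller.

```ruby
# Hypothetical stand-ins; none of these classes are GitLab, Rails or redis-rb code.
class FakeConnectionError < StandardError; end

# A backend whose first read fails, as a stale socket would, then recovers.
class FlakyBackend
  def initialize(data)
    @data = data
    @broken = true
  end

  def get(key)
    if @broken
      @broken = false # assume the client reconnects after the failure
      raise FakeConnectionError, "Connection lost (EPIPE)"
    end
    @data[key]
  end
end

# A cache that swallows backend errors and reports them as a miss.
class SilentCache
  def initialize(backend)
    @backend = backend
  end

  def read(key)
    @backend.get(key)
  rescue FakeConnectionError
    nil # the caller cannot tell this apart from "key not found"
  end
end

cache = SilentCache.new(FlakyBackend.new("ticket" => "ST-123"))
first_read = cache.read("ticket")  # error swallowed, looks like a miss
second_read = cache.read("ticket") # connection recovered
puts first_read.inspect
puts second_read.inspect
```

Because the first read returns nil rather than raising, a caller that treats nil as "token expired" would sign the user out even though the token is still stored.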

Sequence of events

  1. Unicorn forks a process
  2. An unrelated request attempts to use a Redis socket previously used in the parent process. This results in Redis::InheritedError (Tried to use a connection from a child process without reconnecting. You need to reconnect to Redis after forking or set :inherit_socket to true.), which is silently caught.
  3. CAS token verification then tries to use that same Redis connection and hits Redis::ConnectionError (Connection lost (EPIPE)), which is also caught silently, so the cache read returns nil.
  4. Future calls to Rails.cache succeed until unicorn forks again or a different process triggers Redis::InheritedError by trying to use the same connection. Other connections in the pool could also be in a broken state, resulting in more frequent failures.

Before 12.6 I suspect another unrelated cache access was being made between steps 2 and 3, so the nil read happened elsewhere. If so, a fix for CAS token verification could cause the nil read to resurface elsewhere.
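The sequence above can be modelled with a toy simulation. This is only a sketch under stated assumptions: FakeConnection, LoggingCache and both error classes are hypothetical, the process IDs are made-up numbers passed in by hand instead of calling fork, and the state machine mirrors the steps as described here rather than actual redis-rb behaviour.

```ruby
# Toy model only: hypothetical stand-ins, not redis-rb or GitLab code.
class FakeInheritedError < StandardError; end
class FakeConnectionError < StandardError; end

# A connection that remembers which process opened it.
class FakeConnection
  def initialize(owner_pid)
    @owner_pid = owner_pid
    @broken = false
  end

  # `pid` is passed in explicitly so the example needs no real fork.
  def get(key, pid)
    if pid != @owner_pid
      @owner_pid = pid
      @broken = true # assumed: the socket is left unusable for the next call
      raise FakeInheritedError, "used a connection inherited from the parent"
    end
    if @broken
      @broken = false # assumed: the client reconnects after this failure
      raise FakeConnectionError, "Connection lost (EPIPE)"
    end
    "value-for-#{key}"
  end
end

# A cache that rescues both errors and records what happened.
class LoggingCache
  attr_reader :events

  def initialize(connection)
    @connection = connection
    @events = []
  end

  def read(key, pid)
    value = @connection.get(key, pid)
    @events << :ok
    value
  rescue FakeInheritedError
    @events << :inherited_error
    nil
  rescue FakeConnectionError
    @events << :connection_error
    nil
  end
end

parent_pid = 100 # made-up process IDs
child_pid = 101
cache = LoggingCache.new(FakeConnection.new(parent_pid)) # step 1: opened before the fork

unrelated = cache.read("unrelated", child_pid)  # step 2: inherited socket
cas_token = cache.read("cas-ticket", child_pid) # step 3: EPIPE, nil read
later = cache.read("cas-ticket", child_pid)     # step 4: works again
puts cache.events.inspect
```

In this model, whichever read happens to come second after the fork absorbs the nil, which is consistent with the suspicion that the failure simply landed on a different cache access before 12.6.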

Temporary workarounds

  • Duplicating the Rails.cache.read lookup in Gitlab::Auth::OAuth::Session.valid?
  • Adding reconnect_attempts: 1 to /var/opt/gitlab/gitlab-rails/etc/resque.yml

See #121670 (comment 268402475)
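The first workaround amounts to reading the cache a second time when the first read comes back nil. A minimal sketch of that idea follows; read_with_retry and FirstReadFailsCache are hypothetical names for illustration, and the actual body of Gitlab::Auth::OAuth::Session.valid? is not shown in this issue, so this is not its implementation.

```ruby
# Hypothetical cache whose first read is lost, as after a broken connection.
class FirstReadFailsCache
  attr_reader :reads

  def initialize(data)
    @data = data
    @reads = 0
  end

  def read(key)
    @reads += 1
    return nil if @reads == 1 # simulate the silently rescued error
    @data[key]
  end
end

# Hypothetical helper: read once, and read again only if the result is nil.
def read_with_retry(cache, key)
  value = cache.read(key)
  value.nil? ? cache.read(key) : value
end

cache = FirstReadFailsCache.new("cas-ticket" => "ST-123")
ticket = read_with_retry(cache, "cas-ticket")

# A key that is really absent still comes back nil, at the cost of two reads.
missing = read_with_retry(FirstReadFailsCache.new({}), "cas-ticket")
puts ticket.inspect
puts missing.inspect
```

The retry only hides a single failed read on one connection. If several pooled connections are broken at once, as noted in the sequence of events, a second read could fail too, which is why this is listed as a temporary workaround.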

Original description

We are running a GitLab CE Omnibus installation and just upgraded from 12.4.0-ce to 12.6.0-ce. Following the upgrade, users cannot stay logged into GitLab for more than a few minutes. It keeps randomly logging users out with no discernible pattern. Often (but not always) the failure will occur while viewing an issue, with an error such as "Something went wrong while fetching comments. Please try again." appearing at the top of the page.

We have tried multiple times to restart GitLab, but this has not resolved the issue.

Steps to reproduce

No clear pattern to reproduce. It occurs completely at random while navigating around GitLab.

What is the current bug behavior?

Users are being logged out randomly and frequently.

What is the expected correct behavior?

Users should not be logged out.

Relevant logs and/or screenshots

==> /var/log/gitlab/gitlab-rails/production.log <==
Started GET "/rowanonline/sites/cascade/today/issues/10/realtime_changes" for 10.240.192.3 at 2019-12-23 17:20:33 -0500
Started GET "/rowanonline/sites/cascade/today/issues/10" for 10.240.192.3 at 2019-12-23 17:20:34 -0500
Processing by Projects::IssuesController#show as HTML
  Parameters: {"namespace_id"=>"rowanonline/sites/cascade", "project_id"=>"today", "id"=>"10"}
Redirected to https://####/users/sign_in
Filter chain halted as :project rendered or redirected
Completed 302 Found in 33ms (ActiveRecord: 0.1ms | Elasticsearch: 0.0ms)
Started GET "/users/sign_in" for 10.240.192.3 at 2019-12-23 17:20:34 -0500
Processing by SessionsController#new as HTML

Note: Relevant domains replaced with "####" as we'd prefer to keep the URL private.

Output of checks

Results of GitLab environment info

System information
System:		CentOS 7.6.1810
Current User:	git
Using RVM:	no
Ruby Version:	2.6.3p62
Gem Version:	2.7.9
Bundler Version:1.17.3
Rake Version:	12.3.3
Redis Version:	3.2.12
Git Version:	2.24.1
Sidekiq Version:5.2.7
Go Version:	unknown

GitLab information
Version:	12.6.0-ee
Revision:	fc376e40baf
Directory:	/opt/gitlab/embedded/service/gitlab-rails
DB Adapter:	PostgreSQL
DB Version:	10.9
URL:		https://####
HTTP Clone URL:	https://####/some-group/some-project.git
SSH Clone URL:	git@####:some-group/some-project.git
Using LDAP:	no
Using Omniauth:	yes
Omniauth Providers: cas3

GitLab Shell
Version:	10.3.0
Repository storage paths:
- default: 	/var/opt/gitlab/git-data/repositories
GitLab Shell path:		/opt/gitlab/embedded/service/gitlab-shell
Git:		/opt/gitlab/embedded/bin/git

Results of GitLab application Check

Checking GitLab subtasks ...

Checking GitLab Shell ...

GitLab Shell: ... GitLab Shell version >= 10.3.0 ? ... OK (10.3.0)
Running /opt/gitlab/embedded/service/gitlab-shell/bin/check
Internal API available: OK
Redis available via internal API: OK
gitlab-shell self-check successful

Checking GitLab Shell ... Finished

Checking Gitaly ...

Gitaly: ... default ... OK

Checking Gitaly ... Finished

Checking Sidekiq ...

Sidekiq: ... Running? ... yes
Number of Sidekiq processes ... 1

Checking Sidekiq ... Finished

Checking Incoming Email ...

Incoming Email: ... Reply by email is disabled in config/gitlab.yml

Checking Incoming Email ... Finished

Checking LDAP ...

LDAP: ... LDAP is disabled in config/gitlab.yml

Checking LDAP ... Finished

Checking GitLab App ...

Git configured correctly? ... yes
Database config exists? ... yes
All migrations up? ... yes
Database contains orphaned GroupMembers? ... no
GitLab config exists? ... yes
GitLab config up to date? ... yes
Log directory writable? ... yes
Tmp directory writable? ... yes
Uploads directory exists? ... yes
Uploads directory has correct permissions? ... yes
Uploads directory tmp has correct permissions? ... yes
Init script exists? ... skipped (omnibus-gitlab has no init script)
Init script up-to-date? ... skipped (omnibus-gitlab has no init script)
Projects have namespace: ... 
5/4 ... yes
5/8 ... yes
5/10 ... yes
5/11 ... yes
5/13 ... yes
5/14 ... yes
5/20 ... yes
41/21 ... yes
5/22 ... yes
5/31 ... yes
42/37 ... yes
5/38 ... yes
10/39 ... yes
14/41 ... yes
14/42 ... yes
14/43 ... yes
10/44 ... yes
42/45 ... yes
41/46 ... yes
10/53 ... yes
10/54 ... yes
10/55 ... yes
10/56 ... yes
5/57 ... yes
7/63 ... yes
7/64 ... yes
9/65 ... yes
42/67 ... yes
10/68 ... yes
9/71 ... yes
9/72 ... yes
9/73 ... yes
48/75 ... yes
48/76 ... yes
48/77 ... yes
39/78 ... yes
5/81 ... yes
48/84 ... yes
16/86 ... yes
6/88 ... yes
9/89 ... yes
9/90 ... yes
9/91 ... yes
10/93 ... yes
5/94 ... yes
5/95 ... yes
9/96 ... yes
9/97 ... yes
9/98 ... yes
42/99 ... yes
9/100 ... yes
9/101 ... yes
5/105 ... yes
5/106 ... yes
5/107 ... yes
6/108 ... yes
10/109 ... yes
42/110 ... yes
6/111 ... yes
16/112 ... yes
10/113 ... yes
6/114 ... yes
41/115 ... yes
10/116 ... yes
42/117 ... yes
10/118 ... yes
7/119 ... yes
10/120 ... yes
41/121 ... yes
7/122 ... yes
10/123 ... yes
42/124 ... yes
42/126 ... yes
9/127 ... yes
9/128 ... yes
6/130 ... yes
9/131 ... yes
43/132 ... yes
8/133 ... yes
5/134 ... yes
16/135 ... yes
5/136 ... yes
5/137 ... yes
42/140 ... yes
59/143 ... yes
10/144 ... yes
5/145 ... yes
43/149 ... yes
10/151 ... yes
5/154 ... yes
10/156 ... yes
9/157 ... yes
9/158 ... yes
42/159 ... yes
6/160 ... yes
10/161 ... yes
7/163 ... yes
41/164 ... yes
6/165 ... yes
42/166 ... yes
5/167 ... yes
7/174 ... yes
5/177 ... yes
5/178 ... yes
41/193 ... yes
10/194 ... yes
9/195 ... yes
5/196 ... yes
10/197 ... yes
10/198 ... yes
9/199 ... yes
6/200 ... yes
9/201 ... yes
42/203 ... yes
6/210 ... yes
5/211 ... yes
27/215 ... yes
6/216 ... yes
27/217 ... yes
58/218 ... yes
58/219 ... yes
6/220 ... yes
4/221 ... yes
4/222 ... yes
5/224 ... yes
59/225 ... yes
10/226 ... yes
6/228 ... yes
6/230 ... yes
6/232 ... yes
60/233 ... yes
35/237 ... yes
35/238 ... yes
48/240 ... yes
Redis version >= 2.8.0? ... yes
Ruby version >= 2.5.3 ? ... yes (2.6.3)
Git version >= 2.22.0 ? ... yes (2.24.1)
Git user has default SSH configuration? ... yes
Active users: ... 12
Is authorized keys file accessible? ... yes

Checking GitLab App ... Finished


Checking GitLab subtasks ... Finished
Edited by James Edwards-Jones