Redis cache read error prevents CAS users from remaining signed in
Problem
CAS users are signed out every few requests because token validation intermittently fails to retrieve the token from the cache. This completely disrupts access to GitLab, as users have to sign in again every few page navigations.
Steps to reproduce
- Update an omnibus installation to 12.6
- Configure CAS authentication
- Users are frequently signed out, sometimes immediately after signing in
Cause
On every request, ApplicationController#validate_user_service_ticket! verifies that the user's token hasn't expired, but Rails.cache frequently fails to retrieve the token and returns nil instead.
This happens silently because Rails.cache rescues the Redis::ConnectionError (Connection lost (EPIPE)) error, and subsequent attempts succeed. Before 12.6, the nil read may have been happening elsewhere instead.
Sequence of events
1. Unicorn forks a process.
2. An unrelated request attempts to use a Redis socket previously used in the parent process. This results in Redis::InheritedError (Tried to use a connection from a child process without reconnecting. You need to reconnect to Redis after forking or set :inherit_socket to true.), which is silently caught.
3. CAS token verification tries to use that same Redis connection, resulting in Redis::ConnectionError (Connection lost (EPIPE)), which is silently caught, resulting in a nil cache read.
4. Future calls to Rails.cache succeed until Unicorn forks again or a different process triggers Redis::InheritedError by trying to use the same connection. Other connections in the pool could also be in a broken state, resulting in more frequent failures.
Before 12.6, I suspect another unrelated cache access was being made between steps 2 and 3, so the nil read happened elsewhere. If so, fixing CAS token verification alone could simply move the nil read somewhere else.
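Steps 1 and 2 of the sequence hinge on a connection noticing that it is being used from a different process than the one that opened it. The Ruby sketch below models that check under stated assumptions: PidCheckedConnection and InheritedConnectionError are hypothetical, the PID source is injected so the example runs without actually forking, and only the behaviour implied by the Redis::InheritedError message is reproduced (fail unless :inherit_socket is set, succeed after reconnecting). It does not model how the connection later produces EPIPE in step 3.

```ruby
# Hypothetical stand-in for Redis::InheritedError.
class InheritedConnectionError < StandardError; end

# Hypothetical connection that remembers which process opened it.
class PidCheckedConnection
  def initialize(pid_source: -> { Process.pid }, inherit_socket: false)
    @pid_source = pid_source        # injectable so the example is deterministic
    @inherit_socket = inherit_socket
    @owner_pid = @pid_source.call   # PID at "connect" time
  end

  def call(command)
    if !@inherit_socket && @pid_source.call != @owner_pid
      raise InheritedConnectionError,
            "Tried to use a connection from a child process without reconnecting."
    end
    "OK #{command}"
  end

  # Re-open the connection in the current process (here: just adopt its PID).
  def reconnect!
    @owner_pid = @pid_source.call
  end
end

current_pid = 100
conn = PidCheckedConnection.new(pid_source: -> { current_pid })
current_pid = 101 # pretend Unicorn has forked and this code now runs in the worker

begin
  conn.call("GET")
rescue InheritedConnectionError => e
  puts "worker: #{e.message}" # a cache layer that rescues this hides it entirely
end

conn.reconnect!
puts conn.call("GET") # → OK GET
```

Reconnecting in each worker after the fork (or setting :inherit_socket, which the error message offers as the alternative) is what avoids this state; the sketch just shows why a rescued error in step 2 leaves no visible trace.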
Temporary workarounds
- Duplicating the Rails.cache.read lookup in Gitlab::Auth::OAuth::Session.valid?
- Adding reconnect_attempts: 1 to /var/opt/gitlab/gitlab-rails/etc/resque.yml
See #121670 (comment 268402475)
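The first workaround amounts to reading the key a second time when the first read comes back nil. Here is a minimal Ruby sketch, assuming a cache object that responds to read(key) the way Rails.cache does; FirstReadMissesCache and read_with_retry are hypothetical names, not the change that was actually tested.

```ruby
# Hypothetical cache double: the first read returns nil (as if a
# Redis::ConnectionError had been swallowed), later reads return the value.
class FirstReadMissesCache
  attr_reader :reads

  def initialize(value)
    @value = value
    @reads = 0
  end

  def read(_key)
    @reads += 1
    @reads == 1 ? nil : @value
  end
end

# Sketch of the workaround: repeat the lookup before concluding the
# ticket is missing. `attempts` is a hypothetical parameter.
def read_with_retry(cache, key, attempts: 2)
  attempts.times do
    value = cache.read(key)
    return value unless value.nil?
  end
  nil
end

cache = FirstReadMissesCache.new("ST-1")
ticket = read_with_retry(cache, "cas:ticket")
puts ticket # → ST-1
```

The retry costs an extra cache round trip whenever a ticket really is missing, and it only helps if the second read lands on a healthy connection. The reconnect_attempts setting presumably retries at the Redis client level instead, so it would cover every cache read rather than this one call site.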
Original description
We are running a GitLab CE Omnibus installation and just upgraded from 12.4.0-ce to 12.6.0-ce. Following the upgrade, users cannot stay logged in to GitLab for more than a few minutes. GitLab keeps logging users out at random, with no discernible pattern. Often (but not always) the failure occurs while viewing an issue, with an error such as "Something went wrong while fetching comments. Please try again." appearing at the top of the page.
We have tried multiple times to restart GitLab, but this has not resolved the issue.
Steps to reproduce
No clear pattern to reproduce. It occurs completely at random while navigating around GitLab.
What is the current bug behavior?
Users are being logged out randomly and frequently.
What is the expected correct behavior?
Users should not be logged out.
Relevant logs and/or screenshots
==> /var/log/gitlab/gitlab-rails/production.log <==
Started GET "/rowanonline/sites/cascade/today/issues/10/realtime_changes" for 10.240.192.3 at 2019-12-23 17:20:33 -0500
Started GET "/rowanonline/sites/cascade/today/issues/10" for 10.240.192.3 at 2019-12-23 17:20:34 -0500
Processing by Projects::IssuesController#show as HTML
Parameters: {"namespace_id"=>"rowanonline/sites/cascade", "project_id"=>"today", "id"=>"10"}
Redirected to https://####/users/sign_in
Filter chain halted as :project rendered or redirected
Completed 302 Found in 33ms (ActiveRecord: 0.1ms | Elasticsearch: 0.0ms)
Started GET "/users/sign_in" for 10.240.192.3 at 2019-12-23 17:20:34 -0500
Processing by SessionsController#new as HTML
Note: Relevant domains replaced with "####" as we'd prefer to keep the URL private.
Output of checks
Results of GitLab environment info
System information
System: CentOS 7.6.1810
Current User: git
Using RVM: no
Ruby Version: 2.6.3p62
Gem Version: 2.7.9
Bundler Version: 1.17.3
Rake Version: 12.3.3
Redis Version: 3.2.12
Git Version: 2.24.1
Sidekiq Version: 5.2.7
Go Version: unknown

GitLab information
Version: 12.6.0-ee
Revision: fc376e40baf
Directory: /opt/gitlab/embedded/service/gitlab-rails
DB Adapter: PostgreSQL
DB Version: 10.9
URL: https://####
HTTP Clone URL: https://####/some-group/some-project.git
SSH Clone URL: git@####:some-group/some-project.git
Using LDAP: no
Using Omniauth: yes
Omniauth Providers: cas3

GitLab Shell
Version: 10.3.0
Repository storage paths:
- default: /var/opt/gitlab/git-data/repositories
GitLab Shell path: /opt/gitlab/embedded/service/gitlab-shell
Git: /opt/gitlab/embedded/bin/git
Note: Relevant domains replaced with "####" as we'd prefer to keep the URL private.
Results of GitLab application Check
Checking GitLab subtasks ...

Checking GitLab Shell ...
GitLab Shell: ... GitLab Shell version >= 10.3.0 ? ... OK (10.3.0)
Running /opt/gitlab/embedded/service/gitlab-shell/bin/check
Internal API available: OK
Redis available via internal API: OK
gitlab-shell self-check successful
Checking GitLab Shell ... Finished

Checking Gitaly ...
Gitaly: ... default ... OK
Checking Gitaly ... Finished

Checking Sidekiq ...
Sidekiq: ... Running? ... yes
Number of Sidekiq processes ... 1
Checking Sidekiq ... Finished

Checking Incoming Email ...
Incoming Email: ... Reply by email is disabled in config/gitlab.yml
Checking Incoming Email ... Finished

Checking LDAP ...
LDAP: ... LDAP is disabled in config/gitlab.yml
Checking LDAP ... Finished

Checking GitLab App ...
Git configured correctly? ... yes
Database config exists? ... yes
All migrations up? ... yes
Database contains orphaned GroupMembers? ... no
GitLab config exists? ... yes
GitLab config up to date? ... yes
Log directory writable? ... yes
Tmp directory writable? ... yes
Uploads directory exists? ... yes
Uploads directory has correct permissions? ... yes
Uploads directory tmp has correct permissions? ... yes
Init script exists? ... skipped (omnibus-gitlab has no init script)
Init script up-to-date? ... skipped (omnibus-gitlab has no init script)
Projects have namespace: ...
5/4 ... yes 5/8 ... yes 5/10 ... yes 5/11 ... yes 5/13 ... yes 5/14 ... yes 5/20 ... yes 41/21 ... yes 5/22 ... yes 5/31 ... yes 42/37 ... yes 5/38 ... yes 10/39 ... yes 14/41 ... yes 14/42 ... yes 14/43 ... yes 10/44 ... yes 42/45 ... yes 41/46 ... yes 10/53 ... yes 10/54 ... yes 10/55 ... yes 10/56 ... yes 5/57 ... yes 7/63 ... yes 7/64 ... yes 9/65 ... yes 42/67 ... yes 10/68 ... yes 9/71 ... yes 9/72 ... yes 9/73 ... yes 48/75 ... yes 48/76 ... yes 48/77 ... yes 39/78 ... yes 5/81 ... yes 48/84 ... yes 16/86 ... yes 6/88 ... yes 9/89 ... yes 9/90 ... yes 9/91 ... yes 10/93 ... yes 5/94 ... yes 5/95 ... yes 9/96 ... yes 9/97 ... yes 9/98 ... yes 42/99 ... yes 9/100 ... yes 9/101 ... yes 5/105 ... yes 5/106 ... yes 5/107 ... yes 6/108 ... yes 10/109 ... yes 42/110 ... yes 6/111 ... yes 16/112 ... yes 10/113 ... yes 6/114 ... yes 41/115 ... yes 10/116 ... yes 42/117 ... yes 10/118 ... yes 7/119 ... yes 10/120 ... yes 41/121 ... yes 7/122 ... yes 10/123 ... yes 42/124 ... yes 42/126 ... yes 9/127 ... yes 9/128 ... yes 6/130 ... yes 9/131 ... yes 43/132 ... yes 8/133 ... yes 5/134 ... yes 16/135 ... yes 5/136 ... yes 5/137 ... yes 42/140 ... yes 59/143 ... yes 10/144 ... yes 5/145 ... yes 43/149 ... yes 10/151 ... yes 5/154 ... yes 10/156 ... yes 9/157 ... yes 9/158 ... yes 42/159 ... yes 6/160 ... yes 10/161 ... yes 7/163 ... yes 41/164 ... yes 6/165 ... yes 42/166 ... yes 5/167 ... yes 7/174 ... yes 5/177 ... yes 5/178 ... yes 41/193 ... yes 10/194 ... yes 9/195 ... yes 5/196 ... yes 10/197 ... yes 10/198 ... yes 9/199 ... yes 6/200 ... yes 9/201 ... yes 42/203 ... yes 6/210 ... yes 5/211 ... yes 27/215 ... yes 6/216 ... yes 27/217 ... yes 58/218 ... yes 58/219 ... yes 6/220 ... yes 4/221 ... yes 4/222 ... yes 5/224 ... yes 59/225 ... yes 10/226 ... yes 6/228 ... yes 6/230 ... yes 6/232 ... yes 60/233 ... yes 35/237 ... yes 35/238 ... yes 48/240 ... yes
Redis version >= 2.8.0? ... yes
Ruby version >= 2.5.3 ? ... yes (2.6.3)
Git version >= 2.22.0 ? ... yes (2.24.1)
Git user has default SSH configuration? ... yes
Active users: ... 12
Is authorized keys file accessible? ... yes
Checking GitLab App ... Finished

Checking GitLab subtasks ... Finished