Deploy tokens issues during 11.2.0 RC1 deployment
README FIRST
This issue was created to identify the causes that led to the problems described below. No individual or group can or will need to take responsibility for the problem. We are all working together to find a way to avoid repeating the same mistakes.
Circumstance description
On 2018-08-02, during the 11.2.0 RC1 deploy, deploy tokens stopped working for all users on GitLab.com. The increased number of errors shown in monitoring was noticed only by the person on call, while the deploy was still ongoing.
The production incident is described in https://gitlab.com/gitlab-com/production/issues/385
As part of the production incident investigation, two errors were shown in Sentry:
- https://gitlab.com/gitlab-org/gitlab-ce/issues/49904
- https://gitlab.com/gitlab-org/gitlab-ee/issues/7080
Impact
Anyone using deploy tokens without an expiry date set was unable to clone a repository (manually or through CI) or pull a registry image on GitLab.com.
Immediate corrective actions
Post-deployment patches were applied:
- https://dev.gitlab.org/gitlab/post-deployment-patches/merge_requests/88
- https://dev.gitlab.org/gitlab/post-deployment-patches/merge_requests/89
Application fixes
Fixes were introduced with:
- https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/20992
- https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/20993
Origin of the problem
It appears that the issue was caused by two unrelated changes.
Use Deploy Tokens to clone LFS repositories
The change was introduced in https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/20729 to resolve https://gitlab.com/gitlab-org/gitlab-ce/issues/46869.
Before this change, repositories that contain LFS objects could not be cloned using Deploy Tokens.
Users table getting updated frequently
The change was introduced in https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/20597 to resolve https://gitlab.com/gitlab-org/gitlab-ce/issues/43312.
The user activity worker ran on a schedule and updated all records at the same time. At the time of the reported problem, it was updating more than 100k rows in the users table just to show the last user activity.
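To make the write pattern concrete, here is a minimal sketch contrasting a scheduled bulk update with a per-request update that writes only when the stored date is stale. It is an illustration under stated assumptions, not GitLab's code: `User`, `UserStore`, `scheduled_bulk_update`, and `record_activity` are hypothetical names, the "table" is an in-memory dictionary, and whether the linked merge request takes exactly this approach is not stated in this issue.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, Optional


@dataclass
class User:
    id: int
    last_activity_on: Optional[date] = None


class UserStore:
    """Hypothetical in-memory stand-in for the users table; counts row writes."""

    def __init__(self, users: Iterable[User]) -> None:
        self.users: Dict[int, User] = {u.id: u for u in users}
        self.writes = 0

    def set_last_activity(self, user_id: int, day: date) -> None:
        self.users[user_id].last_activity_on = day
        self.writes += 1


def scheduled_bulk_update(store: UserStore, pending: Iterable[int], day: date) -> None:
    """Scheduled-worker style: every pending user row is written in one pass."""
    for user_id in pending:
        store.set_last_activity(user_id, day)


def record_activity(store: UserStore, user_id: int, day: date) -> bool:
    """Per-request style: write only if the stored date differs from `day`."""
    user = store.users[user_id]
    if user.last_activity_on == day:
        return False  # already recorded for this day, no write needed
    store.set_last_activity(user_id, day)
    return True


# Three requests from the same user on the same day cause a single write.
today = date(2018, 8, 2)
store = UserStore(User(i) for i in range(5))
for _ in range(3):
    record_activity(store, 1, today)
print(store.writes)  # 1
```

In this sketch the bulk path writes one row per pending user on every run, while the per-request path spreads writes over time and caps them at one per user per day.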
Corrective actions going forward
- GitLab QA to add deploy tokens coverage: gitlab-org/gitlab-qa#308 (moved)