2018-10-26: Registry authentication and HTTPS auth for git operations are failing
We have identified an issue that is causing all Registry and Git HTTPS authentication to fail for users using private access tokens, including all 2FA users. We apologize for the inconvenience and are urgently working on a fix. In the meantime, please switch to Git SSH or password authentication.
Also reported at https://gitlab.com/gitlab-com/support-forum/issues/4066
@filipa, @smcgivern and I have all experienced this too: see https://gitlab.slack.com/archives/C02PF508L/p1540544529013000
Confidential working doc for this incident: https://docs.google.com/document/d/1DDDvFucJHG1yNsBfkCKcYphJoalfKeZgMyjE4e0uowQ/edit
Log
- 09h00Z: Sporadic reports of Git HTTPS failures
- 11h30Z: Incident declared when it becomes clear that this is a wider issue
- 12h00Z: Now apparent that the cause is a security patch that went out last night. The change was related to this issue (confidential): https://gitlab.com/gitlab-org/gitlab-ce/issues/51113
- 12h20Z: Plan of action:
  - The security patch included an irreversible data migration, so we cannot roll the release back without losing data.
  - Development teams are working on a fix. The fix is expected to be pretty simple.
  - Oncall will handle the patch.
Questions
- Neither git https authentication endpoints, nor registry JWT token exchange endpoints show a major increase in 401 or 403 errors. This had a negative effect on the speed at which the problem was diagnosed. Why did we not see a spike? Related: why is the base 401/403 rate for these endpoints so high?
- Was the risk of rolling out an irreversible data migration well understood by all stakeholders?
- Why was this not picked up by GitLab-QA? Does GitLab-QA currently support testing of git https with private access tokens?
- Security patch releases carry a higher risk than normal releases, since there is less opportunity to test the release. How can we improve this?
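To make the first question more concrete, here is a minimal sketch of how a 401/403 spike could be judged against a baseline that is already high. It is illustrative only: the function names, the thresholds, and the per-window lists of status codes are all assumptions and do not describe GitLab's actual monitoring.

```python
from statistics import mean, pstdev

# Status codes counted as authentication failures (assumption for this sketch).
AUTH_FAILURE_STATUSES = {401, 403}


def failure_ratio(status_codes):
    """Return the fraction of responses in one time window that were 401 or 403."""
    if not status_codes:
        return 0.0
    failures = sum(1 for code in status_codes if code in AUTH_FAILURE_STATUSES)
    return failures / len(status_codes)


def is_spike(baseline_ratios, current_ratio, min_sigma=3.0, min_delta=0.05):
    """Flag the current window if its failure ratio is well above the baseline.

    The threshold is the baseline mean plus the larger of `min_sigma`
    standard deviations and a fixed `min_delta`, so a very stable baseline
    does not trigger on tiny fluctuations. Both defaults are hypothetical.
    """
    base = mean(baseline_ratios)
    spread = pstdev(baseline_ratios)
    threshold = base + max(min_sigma * spread, min_delta)
    return current_ratio > threshold
```

With a baseline of roughly 30% failures per window, a move to 33% would not be flagged under these hypothetical defaults, while a move to 45% would. That is one way a high base rate can hide a real but modest increase in failures.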
What went well
- Engineering teams rallied very quickly when assistance was requested. The support provided was first class