2018-10-26: Registry authentication and HTTPS auth for git operations are failing

We have identified an issue that is causing all Registry and Git HTTPS authentication to fail for users of personal access tokens, including all 2FA users. We apologize for the inconvenience and are urgently working on a fix. In the meantime, please switch to Git SSH or password authentication.

Also reported at https://gitlab.com/gitlab-com/support-forum/issues/4066

@filipa, @smcgivern and I have all experienced this too: see https://gitlab.slack.com/archives/C02PF508L/p1540544529013000

Confidential working doc for this incident: https://docs.google.com/document/d/1DDDvFucJHG1yNsBfkCKcYphJoalfKeZgMyjE4e0uowQ/edit


Log

  • 09h00Z: Sporadic reports of Git HTTPS failures
  • 11h30Z: Incident declared when it becomes clear that this is a wider issue
  • 12h00Z: Now apparent that the cause is a security patch that went out last night. The change was related to this issue (confidential): https://gitlab.com/gitlab-org/gitlab-ce/issues/51113
  • 12h20Z: Plan of action:
    • The security patch had an irreversible data migration, so we cannot roll the release back without losing data.
    • Development teams are working on a fix. The fix is expected to be pretty simple.
    • Oncall will handle the patch

Questions

  1. Neither the Git HTTPS authentication endpoints nor the Registry JWT token exchange endpoints showed a major increase in 401 or 403 errors, which slowed down diagnosis of the problem. Why did we not see a spike? Related: why is the baseline 401/403 rate for these endpoints so high?

  2. Was the risk of rolling out an irreversible data migration well understood by all stakeholders?

  3. Why was this not picked up by GitLab-QA? Does GitLab-QA currently support testing of Git HTTPS with personal access tokens?

  4. Security patch releases carry a higher risk than normal releases, since there is less opportunity to test the release. How can we improve this?
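
As a rough illustration of why a high baseline could hide the failures asked about in question 1, the sketch below compares the relative increase in 401 responses on a noisy endpoint with the same number of extra failures on a quiet one. All numbers and names here are hypothetical and are not taken from GitLab.com metrics.

```python
def relative_increase(baseline: int, observed: int) -> float:
    """Return the fractional growth of `observed` over `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (observed - baseline) / baseline


# Hypothetical counts per time window (assumptions for illustration only).
extra_failures = 2_000      # token-authenticated requests that now fail
noisy_baseline = 50_000     # endpoint that already returns many 401s
quiet_baseline = 500        # endpoint that rarely returns 401s

noisy_growth = relative_increase(noisy_baseline, noisy_baseline + extra_failures)
quiet_growth = relative_increase(quiet_baseline, quiet_baseline + extra_failures)

# The same number of new failures is a small bump on the noisy endpoint
# and a large spike on the quiet one.
print(f"noisy endpoint: {noisy_growth:.0%} increase")
print(f"quiet endpoint: {quiet_growth:.0%} increase")
```

With these assumed figures the same 2,000 extra failures would appear as a 4% rise on the noisy endpoint and a 400% rise on the quiet one, so a threshold based on total 401/403 volume may not fire. Tracking failures only for requests that actually presented credentials is one possible way to make such a regression more visible.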

What went well

  1. Engineering teams rallied very quickly when assistance was requested. The support provided was first class 👍
Edited Aug 03, 2020 by 🤖 GitLab Bot 🤖