2022-01-20 Keys generated by ssh-rsa algorithm are not supported by gitlab-sshd
Incident DRI
Current Status
- This was a Near Miss
- Remediation MR in review
Retrospective Summary
During the final rollout of gitlab-sshd to Production, we discovered an issue caused by the OpenSSH 8.8 update. OpenSSH 8.8, released in September 2021, introduced a breaking change by no longer supporting ssh-rsa by default. We did not discover this issue during our testing.
Had the issue reached users, ssh-rsa users would have had to work around it by specifying PubkeyAcceptedAlgorithms +ssh-rsa in their local OpenSSH client configs.
At the time of discovery, the release was in Staging. Had it been released to Production, users with ssh-rsa keys would not have been able to perform git operations over SSH. We do not currently know the size of this user population.
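One rough way to size this population would be to count stored public keys by type. The sketch below is illustrative only: it assumes keys are available as plain OpenSSH public key lines (type, base64 blob, optional comment), and the function names and sample keys are hypothetical. The result would be an upper bound, since only ssh-rsa users on OpenSSH 8.8 or later clients would have been affected.

```python
from collections import Counter


def count_key_types(public_key_lines):
    """Count public keys by their type field (the first whitespace-separated token)."""
    counts = Counter()
    for line in public_key_lines:
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        counts[line.split()[0]] += 1
    return counts


def rsa_share(counts):
    """Return the fraction of keys whose type is ssh-rsa (0.0 if there are no keys)."""
    total = sum(counts.values())
    return counts.get("ssh-rsa", 0) / total if total else 0.0


# Hypothetical sample data; the key blobs are truncated placeholders, not real keys.
sample_keys = [
    "ssh-rsa AAAAB3NzaC1yc2E... alice@example",
    "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5... bob@example",
    "ssh-rsa AAAAB3NzaC1yc2E... carol@example",
    "ecdsa-sha2-nistp256 AAAAE2VjZHNh... dave@example",
]

counts = count_key_types(sample_keys)
print(counts["ssh-rsa"], rsa_share(counts))
```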
The rollout of gitlab-sshd went through a Production Readiness Review due to both the blast radius of the change and the complexities in the actual rollout.
The impact would have been limited to a subset of ssh-rsa users, as the implementation plan was to gradually increase the traffic directed to gitlab-sshd while monitoring closely for any issues.
Timeline
All times UTC+1.
2022-01-20
- 16:42 - additional security review requested by @igor.drozdov prior to release
- 17:12 - @joernchen identifies ssh-rsa was (hard) deprecated in OpenSSH client
2022-01-21
- 08:22 - issue created to track fix gitlab-org/gitlab-shell#543 (closed)
- 11:26 - update posted in the main epic
2022-01-27
- 19:18 - MR created to repair issue
Takeaways
- While users would have been impacted, it would have been a smaller subset using ssh-rsa, and the rollout procedure already called for a gradual rollout. Not all GitLab SaaS users would have been affected. The actual impact is unknown.
- GitHub have already dropped ssh-rsa, a process that has taken ~15 months.
- The issue was not initially discovered in either automated testing or manual tests against Staging. As OpenSSH was only recently updated, client PCs and presumably CI builds were using OpenSSH 8.7, which did not pose a problem. The client OpenSSH install was not part of the modified software.
- The change that was listed as a deprecation in OpenSSH was actually a breaking change for us.
- The security review had been completed. Having a second review was a lucky break.
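The testing gap described in the takeaways could be narrowed by having test jobs record which OpenSSH client they run. A minimal sketch, assuming the version banner has the form printed by `ssh -V` (for example `OpenSSH_8.8p1, ...`); the function names are hypothetical and not part of any GitLab tooling.

```python
import re


def parse_openssh_version(banner):
    """Extract (major, minor) from an `ssh -V` style banner, or None if not found."""
    match = re.search(r"OpenSSH_(\d+)\.(\d+)", banner)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def disables_ssh_rsa_by_default(banner):
    """True if the client is OpenSSH 8.8 or later, where ssh-rsa is off by default."""
    version = parse_openssh_version(banner)
    return version is not None and version >= (8, 8)


# Example banners; a test run exercised only with the first one would not
# reveal the ssh-rsa problem.
for banner in ("OpenSSH_8.7p1, OpenSSL 1.1.1l  24 Aug 2021",
               "OpenSSH_8.8p1, OpenSSL 1.1.1l  24 Aug 2021"):
    print(banner.split(",")[0], disables_ssh_rsa_by_default(banner))
```

A test matrix could use a check like this to confirm that at least one job runs a client at or above 8.8 before a rollout proceeds.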
Corrective Actions
Corrective actions should be put here as soon as an incident is mitigated. Ensure that all corrective actions mentioned in the notes below are included.
Note: In some cases we need to redact information from public view. We only do this in a limited number of documented cases. This might include the summary, timeline or any other bits of information, as laid out in our handbook page. Any of this confidential data will be in a linked issue, only visible internally. By default, all information we can share will be public, in accordance with our transparency value.
Incident Review
- Ensure that the exec summary is completed at the top of the incident issue, the timeline is updated and relevant graphs are included in the summary
- If there are any corrective action items mentioned in the notes on the incident, ensure they are listed in the "Corrective Action" section
- Fill out relevant sections below or link to the meeting review notes that cover these topics
Customer Impact
- Who was impacted by this incident? (i.e. external customers, internal customers)
  - GitLab SaaS customers using ssh-rsa keys
- What was the customer experience during the incident? (i.e. preventing them from doing X, incorrect display of Y, ...)
  - Prevention of command line git push and git pull operations
- How many customers were affected?
  - Unknown
- If a precise customer impact number is unknown, what is the estimated impact (number and ratio of failed requests, amount of traffic drop, ...)?
  - Unknown
What were the root causes?
- Breaking change in the OpenSSH client, labelled as a deprecation
Incident Response Analysis
- How was the incident detected?
  - Manual review
- How could detection time be improved?
  - ...
- How was the root cause diagnosed?
  - ...
- How could time to diagnosis be improved?
  - ...
- How did we reach the point where we knew how to mitigate the impact?
  - ...
- How could time to mitigation be improved?
  - ...
- What went well?
  - ...
Post Incident Analysis
- Did we have other events in the past with the same root cause?
  - ...
- Do we have existing backlog items that would've prevented or greatly reduced the impact of this incident?
  - ...
- Was this incident triggered by a change (deployment of code or change to infrastructure)? If yes, link the issue.
  - ...
What went well?
- ...
Guidelines
Resources
- If the Situation Zoom room was utilised, the recording will be automatically uploaded to the Incident room Google Drive folder (private)