FY22-Q1 Infrastructure Department OKRs
Objective (IACV): Continually improving spend efficiency for GitLab.com (66%)
- Key Result: GitLab Private
- Key Result: Continue Gitaly Cluster rollout within forecasted cost model
- Key Result: Improve price/performance ratio of one of our services by 10%
Objective (Product): SaaS Reliability Improvements (30%)
This quarter we are going to make key improvements to tech debt which has unfortunately accumulated during the last few quarters.
- Key Result: OS upgrade - accomplish migration from 16.04 for DB and Gitaly hosts
- Key Result: PG12 Enablement & DB Sharding
- Key Result: Platform scaling by tackling application and deployment infrastructure scaling
Objective (Team): Improve Infrastructure influence and accessibility (90%)
- Key Result: Career Development Conversations
- Key Result: Culture Amp survey follow up and action plan
- Key Result: Q1 Hiring Target
[Note: KRs which are bold roll through to EVP Eng KRs 10335]
Retrospective
Good
- Given uncertainty at the start of Q1, we scoped the GitLab Private work to reflect areas where we knew we could contribute. We didn't make assumptions about things that didn't exist yet like staffing guesstimates and key decisions on what this offering was going to be. This allowed us to focus on what we knew we could do and we did achieve that.
- Focusing on tech debt, and especially the OS Upgrade work, was useful even though competing priorities left few of the actual upgrades completed. It did, however, result in timely decisions to implement Ubuntu Advantage coverage so that we're never at risk of not receiving critical security patches.
- While we emphasized reliability and tech debt in our Q1 plan, we still targeted smaller efforts for infrafin as well as Team-related efforts. This was successful. While we need to continue focusing on reliability, tech debt, and scaling work, we don't want spend efficiency and Team improvement efforts to fall to zero, so we'll keep following this strategy to make limited progress in these areas going forward.
- We had a very successful initial engagement in adopting error budget dashboards within various stage teams. gitlab-com/gl-infra/mstaff#39 (closed)
- While we didn't reach full completion, we made significant progress towards rollback capability, leading to further effort in Q2 (including a test this week). gitlab-com/gl-infra/mstaff#40 (closed)
- We engaged with the whole team on additional feedback from the Culture Amp survey through issue collaboration, through discussions in every team's meetings, and through 1:1s. We also created a roadmap for action for the year. https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/12482
- We hired 5 great new team members.
Bad
- We ended up blocking ourselves on critical priority work because of other priority work, such as:
  - OS work vs. the DB upgrade
  - High levels of effort for Abuse vs. meeting incident management expectations
- We didn't reach a key goal for the database sharding work (a signed-off architecture), primarily because of the tactical work to improve the current database situation.
- We continue to have systems running Ubuntu 16.04, falling further behind on staying up to date and compliant.
- We continue to struggle with hiring for DBREs, and to a lesser degree with SRE Managers.
Continue
- Focus on reliability, tech debt, and scaling. These need to remain the bulk of our work for the next few quarters.
- Ensure we don't ignore spend efficiency, and continue to work towards a better Team environment. Our infrafin WIP=1 focus effort should continue and we'll stick to the Career Dev and Culture Amp efforts, but defend against scope creep for these.
Try
- Additional efforts in cross-department initiatives such as the UNITED, having efforts like the Scalability team error budget effort lead into the Q2 Verify team Error Budget work.
Edited by Steve Loyd