Verify Scale, Quality and Usability Optimization
Overview
Verify:Continuous Integration was split into two teams in September 2020: Pipeline Authoring and Continuous Integration. The backlog of technical debt, bugs, and UI polish was not split equally, which left the burden on the Verify:CI team even though 6 engineers were removed from it.
In summary, while only 50% of the team is supporting the throughput and direction:
- 75% of bugs remain in Verify:CI
- 90% of Tech Debt remains in Verify:CI
Proposal
Increase the Verify team in the following priority order:
- Add 1 Engineering Manager in FY22 Q3 (to reduce the span of control for the current split BEM)
- Add 1 Quality Teammate to Runner team in FY22 Q3 to mitigate the risk identified by the Ops Quality Working Group
- Add 2 backend engineers by end of FY22 Q3
- Add 1 frontend engineer by end of FY22 Q3
- Add 1 backend engineer by end of FY22 Q3
Impact
By adding team members to Verify, we would be able to adequately support the investment in bugs, technical debt, and scale without having to limit that prioritization to 1H of FY22.
Deliver on performance
Currently, we are unable to adequately address the SCM > Verify Adoption blockers that have been identified in this Opportunity Canvas. Some of the challenges identified include:
- Performance or usability of features
- Fit and finish or completeness of the MVCs
- Filtering or debugging capabilities at scale
To adequately address the first two and begin meeting the identified IACV impact, we need to address the bugs first, and the only way to do this sustainably with our current team is to dedicate them wholly to the effort. With additional headcount, we can begin investing in the third item and the 2H vision for CI, which will allow us to compete against GitHub Actions, CircleCI Orbs, and other complex modeling tools like Octopus Deploy in CD. The more use cases we include, the greater the MAU, IACV, and SpO we can reach.
We are also struggling to support the Runner Rails console on the Runner team, which is dominated by Go engineers. The CI team is responsible for supporting the Rails scope and will own the enablement of the Fullstack teammate who will be building the Premium offering in Runner. Without this additional headcount, the team will be trading off feature work in Pipeline Efficiency and Analysis to support the on-ramp. With the added headcount, these teammates would be able to support the new feature work while the staff and senior engineers work to enable Runner on Rails, allowing us to add tier value across all groups in Verify.
Cross-stage dependencies
There is also an ancillary, potentially unblocking benefit to other teams that may increase IACV in cross-stage opportunities where development teams cited that they were blocked by the Continuous Integration team. A good example is the Create stage, where MR usability efforts depend on bugs in the pipeline being fixed. With this additional support, we can add reviewers and enablers for these other teams.
Managing the Scaling of GitLab
Lastly, we have instituted a Rapid Action for the Verify Scaling effort. We project that the next 1 billion pipelines will be run within the next 9 months. Currently, we trigger roughly 24 customer builds per second, a rate that could grow exponentially. Scaling a system at this rate to support this load is not an easy feat. We will need to invest beyond the Rapid Action to support the DB load. This new team will be instrumental in our effort to support the scale required of GitLab.
For context, we create 60 million builds per month on GitLab.com alone.
We anticipate hitting 2B late this year (though it is possible this will accelerate).
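As a rough consistency check on the figures above, the per-second trigger rate can be converted into a monthly volume. The sketch below assumes a constant rate and a 30-day month. The function name is illustrative and not part of any GitLab tooling.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def builds_per_month(builds_per_second: float, days_in_month: int = 30) -> float:
    """Project a monthly build count from a constant per-second rate.

    Assumes the rate is flat for the whole month, which ignores
    daily and weekly peaks as well as any growth within the month.
    """
    return builds_per_second * SECONDS_PER_DAY * days_in_month


if __name__ == "__main__":
    # 24 builds per second is the rate quoted in this proposal.
    monthly = builds_per_month(24)
    print(f"{monthly:,.0f} builds per month")
```

At a steady 24 builds per second this works out to roughly 62 million builds per month, in line with the 60 million figure quoted for GitLab.com.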