Real Time Infrastructure at GitLab
TL;DR
- Real-time updates are table stakes for GitLab’s mission. Let’s make table stakes easy.
- Slide deck overview
- I would love your feedback; can you help with the action items below?
Context
GitLab’s Big Hairy Audacious Goal over the next 30 years is to become the most popular collaboration tool for knowledge workers in any industry.
Collaboration platforms are shifting to multiplayer by default: users expect to see changes made by others reflected in their own environment. Seeing the latest state without performing a full-page reload is increasingly table stakes (e.g., Google Docs, Figma).
Challenge
Although users increasingly expect real-time updates, there is no platform-level solution that supports individual GitLab teams. Each team has to build its own path towards real-time updates, leading to an inconsistent user experience that is difficult for us to scale and iterate on as a company.
Our current approach to real-time updates (ActionCable in Rails) has known scaling issues. As more teams adopt this approach, we will experience a “tragedy of the commons” in performance degradation and memory utilization.
Proposal
Develop a scalable real-time update framework that will move the needle towards GitLab’s mission of “everyone can contribute”:
- Individual product teams are empowered to focus on their domain expertise.
- Users will be able to see their latest data without performing a full-page reload.
What might this look like?
In the long term, we might have… a real-time sync engine. Instead of users constantly pulling the latest data from GitLab via a full-page reload, GitLab would push changes to users, reinforcing GitLab’s position as their Single Source of Truth. A data synchronization platform unlocks high-leverage opportunities for GitLab to facilitate cross-stage workflows and information.
In the short term, we might have… a green path towards real-time scalability as additional use cases are implemented. Individual teams would have confidence that the solutions they are building can be extended, made more performant, and iterated upon without expensive rewrites.
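As a rough illustration of the pull-versus-push distinction described above, the following minimal Python sketch uses an in-memory publish/subscribe hub. Everything here is an assumption made for illustration: `UpdateHub`, the `issue:42` topic name, and the shape of the change payload are hypothetical and do not describe GitLab's actual implementation or any ActionCable API.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical names throughout; this is not GitLab's API.
Handler = Callable[[dict], None]


class UpdateHub:
    """In-memory stand-in for a real-time update channel (illustrative only)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        # A client registers interest once instead of reloading repeatedly.
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, change: dict) -> int:
        # The server pushes the change to every subscriber of the topic
        # and reports how many subscribers received it.
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(change)
        return len(handlers)


hub = UpdateHub()
seen_by_alice: List[dict] = []
seen_by_bob: List[dict] = []

# Two users viewing the same (hypothetical) issue subscribe to its topic.
hub.subscribe("issue:42", seen_by_alice.append)
hub.subscribe("issue:42", seen_by_bob.append)

# A single change is pushed to both viewers without either reloading.
delivered = hub.publish("issue:42", {"field": "assignee", "value": "carol"})
```

In this sketch each viewer receives the change as soon as it is published, which is the behavior a shared framework would aim to provide once for all teams rather than having each team rebuild it.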
Where are we now?
Plan initially developed real-time updates for the issue sidebar. They are currently looking at Real-time boards lists (#16020).
Code review is currently rebuilding their backend to use GraphQL with the aim of leveraging real-time for various MR widgets:
- Real-time merge widget (&8639 - closed)
- Real-time merge request approval widget (&9316 - closed)
- ⚡️ Real-time merge request updates (&1812)
Verify is currently focused on increasing pipeline execution speed. The lack of real-time updates is a known user pain point, but we currently recommend that users either reload the page or poll our APIs instead. The value of real-time updates would be magnified with faster pipeline execution.
How might we iterate?
The real-time use cases for Plan, Create, and Verify are ordered both chronologically and by increasing data load (i.e., Verify has much more information to update than Plan). The Application Performance team could deliver incremental value towards a real-time framework by supporting Plan -> Create -> Verify in sequential order. The Application Performance team could work with...
- Plan: to genericize and harden the foundations of our current websocket infrastructure.
- Create: to formalize system boundaries and interfaces for a real-time update framework.
- Verify: to scale our real-time infrastructure as improved pipelines roll out.
Concerns and Considerations
To ensure a real-time framework remains useful and applicable, we will iterate using the principle of adjacent value opportunities. This is in explicit opposition to the saying, “if you build it, they will come”. As we evolve towards a real-time framework, we will look for opportunities to deliver incremental value. This principle ensures we have tangible milestones to aim for, and mitigates the risk of building a brittle framework.
Action Items
- I would love your feedback. Please review the attached slide deck.
- Can you think of similar / adjacent opportunities we could work towards / unblock?
- As we iteratively build towards real-time by default, what are your key concerns?