Add a dependency on a dedicated event bus / queue / log
Description
Despite our best efforts, GitLab is a system of interacting software components. Once a runner or an HA setup is involved, these components are spread across multiple servers. With Geo, they are spread across datacentres.
They use a variety of means to communicate, including:
- Ad-hoc HTTP RPCs
- Files
- gRPC
- PostgreSQL database
- Redis database
Some of the communication is "event-log-like", but implemented on top of a store that is not an event log. Prominent examples:
- CI API long polling
- Geo event log
- "Realtime" HTTP API for the frontend
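To make "event-log-like" concrete, here is a minimal in-memory sketch of the access pattern these examples share: producers append to an ordered log, and each consumer reads forward from an offset it tracks itself. The `EventLog` class and the event payloads are hypothetical illustrations, not existing GitLab code or the API of any particular product.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple


@dataclass
class EventLog:
    """Append-only in-memory log; a hypothetical stand-in for a real event bus."""

    _events: List[Any] = field(default_factory=list)

    def append(self, event: Any) -> int:
        """Store an event and return its offset (its position in the log)."""
        self._events.append(event)
        return len(self._events) - 1

    def read_from(self, offset: int, limit: int = 100) -> Tuple[List[Any], int]:
        """Return up to `limit` events starting at `offset`, plus the next offset."""
        batch = self._events[offset:offset + limit]
        return batch, offset + len(batch)


log = EventLog()
# Hypothetical event payloads, for illustration only.
log.append({"type": "repository_updated", "project_id": 1})
log.append({"type": "repository_updated", "project_id": 2})

# Two consumers read the same log independently, each from its own offset.
geo_events, geo_offset = log.read_from(0)
frontend_events, frontend_offset = log.read_from(1)
```

Events are never removed by a reader, so a slow or newly added consumer can catch up without affecting the others. That property is awkward to build on top of a general-purpose database table or a Redis key.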
We're also likely to add more cases like these, as the cloud native migration increasingly takes away our access to files and as we add more ambitious features to GitLab:
- GitLab Pages
- CI live traces (https://gitlab.com/gitlab-org/gitlab-ee/issues/4607#note_64734243)
- Realtime issue editing (https://gitlab.com/gitlab-org/gitlab-ce/issues/44654)
- Logging service for applications
For lack of a dedicated event log, these features are implemented using less suitable primitives, because those are what we have available at present. In particular, we're looking at staging trace chunks to the database via Redis because we don't currently have a better way to aggregate them.
Proposal
Without blocking any other work, I suggest we evaluate a few solutions on the basis of:
- How good a job they'd do at replicating the above functionality
- Whether we can package them sensibly in Omnibus (sorry, Kafka)
If we identify a reasonable candidate, we should do the necessary work to integrate it into our distribution, and pick one of the examples above to reimplement in event-log terms to sit alongside it (I suggest CI API long polling). Then, the next time we have event-log-like data, we will have an existing, working system to use immediately.
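As a rough illustration of what long polling could look like in event-log terms, the sketch below has a consumer block on the log until an event appears at its offset or a timeout elapses. It is an in-process toy built on `threading.Condition`, under the assumption that a real candidate would offer a comparable blocking read. `BlockingEventLog`, `poll` and the job payload are hypothetical names, not part of the CI API.

```python
import threading
from typing import Any, List, Tuple


class BlockingEventLog:
    """Hypothetical append-only log whose readers can wait for new events."""

    def __init__(self) -> None:
        self._events: List[Any] = []
        self._cond = threading.Condition()

    def append(self, event: Any) -> int:
        """Append an event, wake any waiting readers, and return its offset."""
        with self._cond:
            self._events.append(event)
            self._cond.notify_all()
            return len(self._events) - 1

    def poll(self, offset: int, timeout: float) -> Tuple[List[Any], int]:
        """Block until an event exists at `offset` or `timeout` seconds pass.

        Returns the events from `offset` onwards (possibly none) and the
        offset the caller should pass to its next poll.
        """
        with self._cond:
            self._cond.wait_for(lambda: len(self._events) > offset, timeout=timeout)
            batch = self._events[offset:]
            return batch, offset + len(batch)


jobs = BlockingEventLog()

# Simulate a job being queued shortly after the consumer starts waiting.
threading.Timer(0.05, jobs.append, args=({"job_id": 42},)).start()

# The consumer blocks here instead of re-querying a database in a loop.
batch, next_offset = jobs.poll(0, timeout=2.0)

# With nothing new in the log, the poll returns empty once the timeout expires.
empty, same_offset = jobs.poll(next_offset, timeout=0.05)
```

The consumer holds only an offset between requests, so an empty poll costs a wait rather than a query, which is the kind of load this proposal aims to take off the database and Redis.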
The benefits here are mostly long-term. It will increase the velocity of new features that rely on data with this structure. Features ported to this new dependency will have improved reliability and performance characteristics, while the load placed on the database and Redis will be reduced, making maintenance of those components on GitLab.com easier.