Add a dependency on a dedicated event bus / queue / log

Description

Despite our best efforts, GitLab is a system of interacting software components. Once a runner or HA is involved, these components are spread across multiple servers. With Geo, they are spread across datacentres.

They use a variety of means to communicate, including:

  • Ad-hoc HTTP RPCs
  • Files
  • gRPC
  • PostgreSQL database
  • Redis database

Some of the communication is "event-log-like", but implemented on top of a store that is not an event log. Prominent examples:

  • CI API long polling
  • Geo event log
  • "Realtime" HTTP API for the frontend
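
To make "event-log-like" concrete: the pattern amounts to an append-only sequence that each consumer reads from its own offset. The following is a minimal in-memory sketch under that assumption. `EventLog`, `append`, and `read_from` are hypothetical names for illustration, not GitLab code and not the API of any candidate product.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple


@dataclass
class EventLog:
    """Hypothetical in-memory append-only log (illustration only)."""

    _events: List[Any] = field(default_factory=list)

    def append(self, event: Any) -> int:
        """Append an event and return its offset (its position in the log)."""
        self._events.append(event)
        return len(self._events) - 1

    def read_from(self, offset: int, limit: int = 100) -> Tuple[List[Any], int]:
        """Return up to `limit` events starting at `offset`, plus the next offset to read."""
        if offset < 0:
            raise ValueError("offset must be non-negative")
        batch = self._events[offset:offset + limit]
        return batch, offset + len(batch)
```

Each consumer only has to remember the next offset it wants, so several independent readers (for example a Geo secondary and a frontend connection) could follow the same log without coordinating with each other.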

We're also looking to add more examples, as we increasingly lose access to files as a result of the cloud native migration and want to add more ambitious features to GitLab:

  • GitLab Pages
  • CI live traces (https://gitlab.com/gitlab-org/gitlab-ee/issues/4607#note_64734243)
  • Realtime issue editing (https://gitlab.com/gitlab-org/gitlab-ce/issues/44654)
  • Logging service for applications

Today, these features are implemented using less-suitable primitives, because those are what we have available. In particular, we're looking at staging trace chunks to the database via Redis because we currently have no better way to aggregate them.

Proposal

Without blocking any other work, I suggest we evaluate a few solutions on the basis of:

  • How good a job they'd do at replicating the above functionality
  • Whether we can package them sensibly in Omnibus (sorry, Kafka)

If we identify a reasonable candidate, we should do the necessary work to integrate it into our distribution and pick one of the above to reimplement in event-log terms to sit alongside it (I suggest CI API long polling). Then, next time we have event-log-like data, we will have an existing, working system to use immediately.
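
As a rough illustration of what "long polling in event-log terms" could look like, here is a small sketch in which a consumer blocks until events exist past its offset or a timeout expires. It is an in-process stand-in only: `BlockingEventLog` and `poll` are hypothetical names, and a real candidate would provide this over the network with its own API.

```python
import threading
from typing import Any, List, Tuple


class BlockingEventLog:
    """Hypothetical append-only log whose readers can wait for new events."""

    def __init__(self) -> None:
        self._events: List[Any] = []
        self._cond = threading.Condition()

    def append(self, event: Any) -> int:
        """Append an event, wake any waiting readers, and return the event's offset."""
        with self._cond:
            self._events.append(event)
            self._cond.notify_all()
            return len(self._events) - 1

    def poll(self, offset: int, timeout: float) -> Tuple[List[Any], int]:
        """Wait up to `timeout` seconds for events at or after `offset`.

        Returns the new events (possibly an empty list on timeout) and the
        next offset the caller should ask for.
        """
        with self._cond:
            self._cond.wait_for(lambda: len(self._events) > offset, timeout=timeout)
            batch = self._events[offset:]
            return batch, offset + len(batch)
```

In this sketch a runner-like client would call `poll` in a loop with the offset returned by the previous call; an empty batch simply means the timeout elapsed with nothing new, which corresponds to the "no work yet" response of a long-polling request.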

The benefits here are mostly long-term. A dedicated event log will increase the velocity of new features that rely on data with this structure. Features ported to this new dependency will have improved reliability and performance characteristics, while the load placed on the database and Redis will be reduced, making maintenance of those components on GitLab.com easier.

Links / references

Edited Sep 28, 2025 by 🤖 GitLab Bot 🤖