Move to event stream processing instead of general DB polling
In line with our recent conversations about real-time features, I think we should consider changing how we process data transformations and data flow, moving to an event sourcing model.
The general idea is that we have events and subscribers to those events. For example, when a commit is pushed we would create an event and deliver its payload to all subscribed listeners. This would reduce our Git access to a single call (when reading the commits), and subscribed processors could then: parse the commit message for issues to close, index it in Elasticsearch, update merge requests in real time, trigger a build in the CI pipeline, evict the Gitaly cache, etc.
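To make the publish/subscribe idea concrete, here is a minimal in-memory sketch. Everything in it is hypothetical and for illustration only: the `EventBus` class, the `commit.pushed` topic name, and the two processors are assumptions, not existing GitLab code. The point it shows is that the commit is read once and the same payload is fanned out to every subscriber, so no processor needs to go back to Git or the database.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class Event:
    """A published event: a topic name plus a self-contained payload (hypothetical shape)."""
    topic: str
    payload: Dict[str, Any]


Handler = Callable[[Event], None]


class EventBus:
    """Minimal in-memory publish/subscribe dispatcher (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, event: Event) -> None:
        # Deliver the same payload to every listener of this topic;
        # topics with no subscribers are simply a no-op.
        for handler in self._subscribers[event.topic]:
            handler(event)


closed_issues: List[int] = []
indexed_shas: List[str] = []


def close_referenced_issues(event: Event) -> None:
    # Hypothetical processor: collect "Closes #N" / "Fixes #N" references from the message.
    for match in re.finditer(r"(?:[Cc]loses|[Ff]ixes) #(\d+)", event.payload["message"]):
        closed_issues.append(int(match.group(1)))


def index_commit(event: Event) -> None:
    # Hypothetical stand-in for "index in Elasticsearch": just record the SHA.
    indexed_shas.append(event.payload["sha"])


bus = EventBus()
bus.subscribe("commit.pushed", close_referenced_issues)
bus.subscribe("commit.pushed", index_commit)

# The commit is read from Git once; its data travels inside the event payload.
bus.publish(Event("commit.pushed", {"sha": "abc123", "message": "Fix login. Closes #12, fixes #15"}))
```

Adding another consumer (cache eviction, CI trigger, real-time merge request updates) would only mean one more `subscribe` call, without touching the code that reads the commit.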
There are many other cases where this pattern makes a lot of sense, for example updating the title of an issue in real time for all the users who are looking at it, or loading new comments as they are being written, all without impacting the database, and scaling consumers without having to scale the underlying database.
Following this pattern, instead of many users polling many parts of the application, we would invert the data flow: we would push this data into the event stream, which in turn would push it to clients. Clients could use long polling (or WebSockets) and act immediately on the delivered payload.
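A rough sketch of the inverted flow, assuming an append-only event log that clients read from a cursor. The `EventLog` class, its `append`/`poll` methods, and the `issue.title_changed` event type are hypothetical; in production this role would be played by Redis or a dedicated long-polling/WebSocket service rather than an in-process object. Each client blocks until an event newer than its cursor exists (or a timeout expires), instead of repeatedly querying the database.

```python
import threading
from typing import Any, List, Tuple


class EventLog:
    """Append-only in-memory event log read by cursor (hypothetical sketch)."""

    def __init__(self) -> None:
        self._events: List[Any] = []
        self._cond = threading.Condition()

    def append(self, payload: Any) -> int:
        with self._cond:
            self._events.append(payload)
            self._cond.notify_all()  # wake every client blocked in poll()
            return len(self._events)

    def poll(self, cursor: int, timeout: float) -> Tuple[List[Any], int]:
        """Long poll: return events after `cursor`, waiting up to `timeout` seconds for new ones."""
        with self._cond:
            self._cond.wait_for(lambda: len(self._events) > cursor, timeout=timeout)
            new_events = self._events[cursor:]
            return new_events, cursor + len(new_events)


log = EventLog()
received: List[Any] = []


def client() -> None:
    # The client holds a cursor and blocks until something newer than it arrives.
    events, _next_cursor = log.poll(cursor=0, timeout=5.0)
    received.extend(events)


t = threading.Thread(target=client)
t.start()
log.append({"type": "issue.title_changed", "issue_id": 1, "title": "New title"})
t.join()
```

Because every client only waits on the log and reuses the payload it receives, adding clients adds waiters on the stream, not queries against the database.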
This would also decouple event processing from the database model, since the payload travels inside the event. That makes it simpler to introduce changes in the application and reduces the complexity of model migrations: they could happen sooner, because we can keep the event format backward compatible.
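One common way to keep the event format compatible while the database model changes is to version each event and "upcast" older versions on read. The sketch below assumes a hypothetical issue-title event whose v1 carried a `title` field and whose v2 uses `new_title` plus `previous_title`. These schemas and function names are illustrative assumptions, not an existing format. Consumers only ever see the current version, so they never depend on the database columns.

```python
from typing import Any, Dict

CURRENT_VERSION = 2


def upcast(event: Dict[str, Any]) -> Dict[str, Any]:
    """Convert older (hypothetical) issue-title events to the current schema."""
    version = event.get("version", 1)
    if version > CURRENT_VERSION:
        raise ValueError(f"unknown event version {version}")
    if version == 1:
        # v1 carried a single "title"; v2 renamed it and added "previous_title".
        event = {
            "version": 2,
            "issue_id": event["issue_id"],
            "new_title": event["title"],
            "previous_title": None,
        }
    return event


def render_notification(event: Dict[str, Any]) -> str:
    # Consumers work against the current schema only, regardless of the producer's version.
    e = upcast(event)
    return f"Issue #{e['issue_id']} renamed to {e['new_title']!r}"
```

With this approach, producers can switch to emitting v2 events while older v1 events (for example, ones still sitting in the stream) continue to be processed correctly.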
We could use Redis to build this, but we would need a dedicated piece of infrastructure to handle the long-polling/WebSocket part at scale. As a result, we would remove most of this database querying, fetching only what is necessary, and we could potentially serve orders of magnitude more clients, since they would not be hitting deep into the application at all.
Quoting Martin Fowler:
"Event Sourcing also raises some possibilities for your overall architecture, particularly if you are looking for something that is very scalable. There is a fair amount of interest in 'event-driven architecture' these days. This term covers a fair range of ideas, but most of [it] centers around systems communicating through event messages. Such systems can operate in a very loosely coupled parallel style which provides excellent horizontal scalability and resilience to systems failure."
cc/ @yorickpeterse @DouweM @smcgivern @stanhu @andrewn
Note: this reasoning comes from learning about Kafka Streams and from talking with @andrewn about scaling Gitter at the summit.