Contributions wanted
Please suggest possible solutions and describe them in the most detailed way that you can.
Some ideas that have been discussed before, in no particular order:
Websockets
This would remove the need for polling, but exchange it for long-lived connections. It's unclear whether this would be any better. Websockets would also require us to replace Unicorn with e.g. Puma, as Unicorn is not suitable for this. We don't want to run an extra process just for websockets as this complicates deployments, managing infrastructure, etc. Puma is something we have looked into in the past, but we're not sure yet: https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/1899 and https://gitlab.com/gitlab-org/gitlab-ce/issues/3592
Polling
This is what we currently use, and it doesn't scale. We could merge different polling endpoints/calls into a single one, but this will only work if:
This new endpoint is faster than the sum of the current ones
We can guarantee this endpoint stays fast, even when adding more data
Since item 2 somewhat violates the laws of physics (you can't add something new without it taking more time) I don't see this working out very well.
Workhorse
I don't remember the exact details, but I think somebody suggested polling workhorse and hooking up some kind of pub/sub to workhorse. We'd still have to poll, but at least we won't be hitting the Rails application. Whether workhorse can handle such load remains to be seen.
Currently for !8357 (closed) we are limiting scope to logged in users only.
In theory this can greatly reduce the number of requests made by visitors when we hit #1 on Hacker News or get a Reddit hug of death.
Should we do this for all current realtime polling to reduce load as well?
Since first and foremost we ship a product that is installed locally, I am assuming that for the vast majority of companies all users are logged in (unless the company is open source and has public URLs/projects)
@yorickpeterse Talking about Workhorse: since Go has great websocket/concurrency support, could we just use Go for the websockets and move from polling to pub/sub?
I am sure there is a great Postgres ORM in Go that could be used. The hard part will be reusing existing Rails logic, unless we can call the methods via FFI.
For issue titles, it's very easy to know when the thing we're polling has been updated, and this happens infrequently. We could cache the title on an issue page with a relatively short TTL (as people don't spend that long on these pages, and we shouldn't poll for background tabs anyway), and invalidate it when the title is updated. That way we only need to hit the DB when the title changes, or when an issue's title is not in the cache.
I don't have a good suggestion to scale this to system notes, though.
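To make the caching idea above concrete, here is a minimal sketch using the go-redis client. The key name, the 2-minute TTL, and the loadAndRenderTitle helper are illustrative assumptions; in the Rails app this would more naturally live in Rails.cache, but the shape is the same:

```go
package titlecache

import (
	"context"
	"fmt"
	"time"

	"github.com/go-redis/redis/v8"
)

var rdb = redis.NewClient(&redis.Options{Addr: "localhost:6379"})

// issueTitleHTML returns the rendered title from cache, only falling back to
// the database / Markdown pipeline (loadAndRenderTitle, a stub here) on a miss.
func issueTitleHTML(ctx context.Context, issueID int) (string, error) {
	key := fmt.Sprintf("issue:%d:title_html", issueID) // illustrative key name
	html, err := rdb.Get(ctx, key).Result()
	if err == nil {
		return html, nil // cache hit: no DB work at all
	}
	if err != redis.Nil {
		return "", err
	}
	html = loadAndRenderTitle(issueID)
	// Short TTL, as suggested above: people don't stay on these pages for long.
	return html, rdb.Set(ctx, key, html, 2*time.Minute).Err()
}

// invalidateIssueTitle is called from the title update path, so the next poll
// re-renders immediately instead of waiting for the TTL to lapse.
func invalidateIssueTitle(ctx context.Context, issueID int) error {
	return rdb.Del(ctx, fmt.Sprintf("issue:%d:title_html", issueID)).Err()
}

func loadAndRenderTitle(issueID int) string { return "rendered title" } // stub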
Maybe we could add a pub/sub system with the help of Redis based on Websockets that would be terminated by Workhorse and updated (notified) by Rails?
We currently think about something similar for GitLab Runner, we don't yet use Websockets (only long-polling connections), but this will use pub implemented in Rails, with sub implemented in Workhorse.
@yorickpeterse Ok, that makes sense. And yeah, I was just trying to shrink the scope for the current solution. What I meant to say, I guess, is that maybe we should only do pub/sub, even for logged-in users, if we decide to go that route.
My vote is for pub/sub too; long polling is fine as long as we have something that is actually handling this long polling properly.
What we need to run away from is sending a new request every X seconds that hits deeply into the database; we need to isolate this kind of resource from direct client access.
@adamniedzielski From a frontend perspective MessageBus seems really nice:
MessageBus.diagnostics(): Returns a log that may be used for diagnostics on the status of message bus
MessageBus.pause(): Pause all MessageBus activity
MessageBus.resume(): Resume MessageBus activity
MessageBus.stop(): Stop all MessageBus activity
MessageBus.start(): Must be called to startup the MessageBus poller
MessageBus.status(): Return status (started, paused, stopped)
From a backend perspective it sounds promising:
Messaging reliability is far more important than WebSockets. MessageBus is backed by a reliable pub/sub channel. Messages are globally sequenced. Messages are locally sequenced to a channel. This means that at any point you can "catch up" with old messages (capped). API-wise it means that when a client subscribes it has the option to tell the server what position the channel is at:

    // subscribe to the chat channel at position 7
    MessageBus.subscribe('/chat', function(msg){ alert(msg); }, 7);

Due to the reliable underpinnings of MessageBus it is immune to a class of issues that affect pure WebSocket implementations. This underpinning makes it trivial to write very efficient cross-process caches, amongst many other uses. Reliable messaging is a well understood concept. You can use Erlang, RabbitMQ, ZeroMQ, Redis, PostgreSQL or even MySQL to implement reliable messaging. With reliable messaging implemented, multiple transport mechanisms can be implemented with ease. This "unlocks" the ability to do long-polling, long-polling with chunked encoding, EventSource, polling, forever iframes etc. in your framework.
I want to stress the concept of detaching a client request from a database query. Whatever we do, we need to make sure that as we get more clients, we don't get more queries being executed.
@adamniedzielski: I wanted to verify that according to your plan in the description, you want to deploy some code outside of any regular GitLab release right? So we'll probably deploy a few feature branches as PoCs to production of GitLab.com, and when we're comfortable, then we will start rolling the chosen technical design into specific features as part of regular GitLab releases right? Is this the plan?
@adamniedzielski: Do we have any specific metrics / plan of evaluating if a test passes? Will we just look at some graphs and make a judgement call?
@adamniedzielski: Do we have a proposed feature yet to use as the test PoC? I would vote for the system note or issue title, or I guess whatever is easiest to implement. No strong opinion here. Just wanted to see what we were testing against as an example feature.
@adamniedzielski: Could we timebox at least the portion of this plan to decide on the first PoC solution? Or at least have an estimate of when this would be solved? Real-time features of GitLab are a strong focus and priority for GitLab right now and a major push for UX. At the same time, I understand that getting real-time correct technically is super important so that we don't incur any more tech debt and set ourselves up for future success. So if we need more time, we should use it, but we just need to set expectations for everyone accordingly, especially as we are also pushing for 9.0 in the coming months and I anticipate that's also another area of big engineering focus.
@yorickpeterse: You are strongly against polling right? (https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/8357#note_20943988). I assume you dislike the current solution in GitLab. Is it fair to say that you don't want any development of any new features that involve getting updates in real-time until we have established a GitLab standard for this? I want to have some consensus here and set some expectations accordingly so that all of Product and UX are aware so that we don't repeat this discussion in multiple issues until we have a final solution. Are we putting a hard stop to all real-time features (such as !8357 (closed)), or are we evaluating on an issue by issue basis?
I wanted to verify that according to your plan in the description, you want to deploy some code outside of any regular GitLab release right? So we'll probably deploy a few feature branches as PoCs to production of GitLab.com, and when we're comfortable, then we will start rolling the chosen technical design into specific features as part of regular GitLab releases right? Is this the plan?
We can not deploy to GitLab.com without making it a proper release. It's also really unwise to deploy any change that introduces more polling, even if it's just for testing purposes. Dev and staging are playgrounds, GitLab.com is where serious internet business takes place and should be treated carefully.
In the future we want to support canary deployments to allow faster deploy/feedback cycles, but it will be at least several months before we are able to do so.
You are strongly against polling right?
I assume you dislike the current solution in GitLab.
Yes.
Is it fair to say that you don't want any development of any new features that involve getting updates in real-time until we have established a GitLab standard for this?
Yes. We can't implement anything that's real-time until we have a solution for sending updates in a way that scales.
Are we putting a hard stop to all real-time features (such as !8357 (closed)), or are we evaluating on an issue by issue basis?
Until we have a solution this should be a hard stop. Not doing so will lead to development/reviewing time being wasted.
Thanks @yorickpeterse for the clarifications and for helping me understand the constraint space to solve this problem. That's extremely helpful.
So it sounds like we have to create these PoCs in regular releases, and use feature flags / patch releases to make any adjustments on the fly. So in the most aggressive case we could deploy multiple different solutions in a single release to test them out. But we can't go much faster than a release. So if we don't have something ready for 8.16, we'll be blocked for all our real-time features for another release.
I'll be sure to mention this product impact in the meeting. Thanks!
So it sounds like we have to create these PoCs in regular releases, and use feature flags / patch releases to make any adjustments on the fly
Perhaps I was not entirely clear. Changes like this (the ones where we're fairly certain they can have a huge infrastructure impact) should not be merged in/released at all until a proper solution has been found, not even when using feature flags or something similar. Doing so falls in the category of "works fine in dev, ops problem now".
The only way we can stop ourselves from shooting ourselves in the foot is by not shipping loaded guns.
Some data on what people are using (based on what I could quickly find using Google):
Facebook: seems to mostly use long polling, probably using dedicated application servers of some kind for this (instead of cramming everything into something like Unicorn)
Gmail: seems to use polling, not sure if it's regular polling or long polling
Twitter: seems to be using polling, unsure if it's regular or long polling
GitHub: not entirely sure. Editing an issue in one tab triggers XHR requests in another tab (viewing the same issue), but this seems to happen instantaneously. Looking at the console for a while I don't see any periodic XHR requests, suggesting it's either websockets or something else.
Rails 5 seems to support websockets ^1 via ActionCable. Considering Rails has a history of implementing things in a way that doesn't really scale, I'm not sure if we want to use this, or something more low level that we have control over.
@yorickpeterse GitHub is using WebSockets. When you open an issue page with dev tools, in the network history tab you can see an initial GET HTTP request to live.github.com (one column says WebSocket); it contains the header Connection: keep-alive, Upgrade. This switches communication to another protocol, which the web browser (in my case Firefox) doesn't show. It also contains the Sec-WebSocket-Key and Sec-WebSocket-Extensions headers. It seems that GitHub uses WebSockets without its "protocols".
They open a WebSocket even on closed issues.
Maybe it is out of the scope of this discussion and I should mention it in another issue, but I use Nchan to distribute "realtime" messages over websockets. It is a pub/sub server which can handle all the suggested serving technologies. Per channel you can configure the set of methods that can publish and another subset of technologies that can subscribe. It is a small program developed as an nginx module, but it can also run standalone. It can use Redis to sync with other instances to create a cluster. It supports dynamic channels, scaling, and authentication.
Nchan itself doesn't answer the evaluation question, but it enables using, through one interface, whichever method is preferable for each situation. For example, logged-in users could be served by websockets and others by polling; users with Developer (and higher) rights could use websockets on the pipeline page, others only long polling, etc...
Changes like this (the ones where we're fairly certain they can have a huge infrastructure impact) should not be merged in/released at all until a proper solution has been found, not even when using feature flags or something similar.
We have to evaluate the solutions in some way to find a proper solution. So what I'm thinking about is:
implement a proof of concept
the feature is turned off by default
do not include it in our release blog post, do not use it for marketing
release it and deploy to GitLab.com
turn it on on GitLab.com
measure performance
turn it off if it causes trouble
Do you have any other way to find a proper solution? Do you think that we can find it without testing it out on GitLab.com?
@victorwu I created this issue to turn our Slack conversation into a meaningful list of possibilities. The plan in the issue description is vague right now and not approved by anybody.
I don't think that we have any specific metrics in mind right now, but we will have to pick them.
Everybody is welcome to contribute, both with solutions and with methods for evaluating them.
We are working on a form of long polling for GitLab Runners that is purely Redis-based, to see how well it behaves. This will give us some meaningful performance data, given that Runners are the biggest resource hog right now.
The basic plan is to have Pub on the Rails side and Sub on the Workhorse side, using Redis as storage. The idea is to use one Redis connection per Workhorse and register many build requests to listen on this connection.
Long polling is our first choice, as it doesn't require any changes on the Runner side; ideally a WebSocket connection would be the next iteration, as it would reduce the processing requirements even further. The long-polling connection is expected to be held open for ~50s for now.
Long polling will allow us to reduce the number of requests by 16x, and when there are no builds to hand out, requests will only complete on a Redis key change. The build register requests would not touch PostgreSQL.
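A rough sketch of what the Workhorse-side "sub" half could look like under those assumptions: one shared Redis subscription per process, with many pending Runner requests registered against it. The Watcher type, the channel name, and the go-redis usage are illustrative, not the actual Workhorse code:

```go
package longpoll

import (
	"context"
	"sync"

	"github.com/go-redis/redis/v8"
)

// Watcher multiplexes many pending long-poll requests over a single Redis
// subscription, so the number of Redis connections stays constant per process.
type Watcher struct {
	mu      sync.Mutex
	waiting map[string][]chan struct{} // notification key -> pending requests
}

func NewWatcher(ctx context.Context, rdb *redis.Client, channel string) *Watcher {
	w := &Watcher{waiting: map[string][]chan struct{}{}}
	go func() {
		// One subscription per Workhorse process; Rails PUBLISHes the
		// notification key (e.g. the Runner's build queue) when builds appear.
		sub := rdb.Subscribe(ctx, channel)
		for msg := range sub.Channel() {
			w.notify(msg.Payload)
		}
	}()
	return w
}

// Wait registers the calling request for a key and blocks until either a
// notification arrives or the caller's context (carrying the ~50s deadline)
// expires, in which case the request is answered with "no new builds".
func (w *Watcher) Wait(ctx context.Context, key string) bool {
	ch := make(chan struct{}, 1)
	w.mu.Lock()
	w.waiting[key] = append(w.waiting[key], ch)
	w.mu.Unlock()

	select {
	case <-ch:
		return true // change detected, proxy the request to Rails
	case <-ctx.Done():
		return false // timed out, respond without touching PostgreSQL
	}
}

func (w *Watcher) notify(key string) {
	w.mu.Lock()
	chans := w.waiting[key]
	delete(w.waiting, key)
	w.mu.Unlock()
	for _, ch := range chans {
		ch <- struct{}{}
	}
}
```

The point of the shared map is that adding more pending Runner requests adds goroutines, not Redis connections or database queries.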
@mardukecz Thanks for the info! Nchan looks quite interesting.
@adamniedzielski Ah, I overlooked that. Regarding testing, we should first test changes on dev/staging in some shape or form. Once we're more confident with that we could deploy it in an optional form to GitLab.com, but I'd prefer to only do so if we know it's not going to cause massive problems. It would be less problematic if we could enable such a feature only for a specific host, instead of all; reducing the potential impact it could have.
We could use Pusher! OK, that's not a real suggestion, but something open source with a similar architecture, like socket.io. This is probably effectively the same as @ayufan's suggestion where we have a separate process handle the realtime websockets connection, and the main Rails app has a regular HTTP non-persistent connection to it.
One thing I would like to understand before we build a solution to fix all the things ™ is where we are using polling besides the runners, which endpoints we are hitting, and how we are monitoring these endpoints. Do we have any data at all?
@pcarranza Besides runners we poll already on issue pages to load new comments as they are posted. Outside of that I don't think we have any other polling going on, but perhaps the frontend folks know more.
@markpundsack The advantage of @ayufan's suggestion is that it uses Workhorse which already knows how to connect to the main Rails app. I'm not sure about introducing a new component to handle the websockets or a new technology (NodeJS for socket.io) to our stack. I think that we standardised on Go for stuff that requires better performance / concurrency support, so making it in Go makes sense and we already have Workhorse written in Go, sitting in front of the Rails app.
@yorickpeterse @pcarranza We currently have polling for the notification that says the build is running or passed or failed. That little popup that comes when you accept notifications.
@pcarranza we're expecting to add much more realtime features (both to existing endpoints and new) to GitLab over this year. A solid solution that would allow us to bring that to more and more of GitLab's features is now a blocker for a number of features.
Job et al. Again: I'm not against the feature or the tech. I'm OK with whatever we decide (long poll or websocket) as long as we architect it to scale, and we take all the polling we currently have and make it go through the same pipe. That will allow us to set up GitLab production sanely, because we will only need to focus on one access pattern and not many.
That said, what I briefly read from Kamil makes sense, and I think we already discussed something like this some time ago. So that could be a path forward.
I think if @ayufan's solution is in-progress, and no one has any major objections to that, then we should go with that for now.
That does mean that we will need to block the issue titles stuff (and anything beyond that) from shipping until the pipelines element has shipped and been tested in production. IMO that's a reasonable trade-off, but if we have disagreement then we need to talk about it more. @adamniedzielski wdyt?
@smcgivern Yes, the solution that @ayufan proposed looks really promising.
The only thing that I would like to avoid is that you have to modify Workhorse each time you want to add a new realtime feature. I'm not sure if that's possible or not with the proposed solution.
I was also thinking about how we want it to work a little bit more. Let's take the issue title as an example. The issue title is a Markdown field that may contain references to other issues, potentially confidential issues. Because of that, the issue title HTML is one value for a guest user and a different value for a team member. And we will have more cases like that - content that is dependent on your access level.
If the issue title changes and we want to deliver the new content to all subscribers we have to generate it for every subscriber, because it's potentially different for every subscriber. I don't think that it's a good idea.
Again, I don't know what the established pattern here is. What I was thinking about is notifying that a change happened, without sending the changed value itself. When a subscriber receives the notification about the change, it is responsible for fetching the value.
What I was thinking about is notifying that a change happened, without sending the changed value itself. When a subscriber receives the notification about the change, it is responsible for fetching the value.
I'd be fine with that for things like titles, descriptions, and comments where:
The changes themselves are rare.
We need to show different things to different users.
We still have the problem of a bunch of clients potentially requesting a new rendered title at once, though that shouldn't be too expensive if it's a one-off.
Per discussion with @stanhu, we should work on implementing #25051 (closed) as the first feature to leverage the new non-polling solution, once it has been figured out. Currently we are aiming to ship that in 8.16.
@selfup already worked on #25051 (closed) (!8357 (closed)), so might be able to provide some background. But engineering can take over and get whoever else are the right resources to work on this.
Lots of good discussion here. Let me summarize our requirements. We want a solution that has these characteristics:
Performant: We need to be able to handle real-time updates for thousands of connections
Extensible: Support for new real-time updates shouldn't require additional changes in Workhorse etc.
Secure: Users not only have to have permission to see the changes, but the data (e.g. Markdown) that is returned to them may be specific to them. For example, if we have 1000 users subscribed to an issue, we can't broadcast the update as-is to all 1000 users.
The idea here is that Workhorse handles the long polling and uses Redis publish/subscribe channels to notify it when it should retrieve the updated data. Also note that we don't put any data in Redis to avoid leaking information through Workhorse.
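The "no data in Redis" point can be illustrated with a tiny sketch (the channel naming and the go-redis client are assumptions): Rails would publish only an empty message, and each subscriber re-fetches through the normal permission-checked endpoint.

```go
package notify

import (
	"context"

	"github.com/go-redis/redis/v8"
)

// notifyChanged publishes an empty message on the resource's channel. Only the
// fact that something changed leaves Rails; each subscriber then re-fetches
// through the normal, permission-checked endpoint, so no user-visible data is
// ever stored in Redis or broadcast through it.
func notifyChanged(ctx context.Context, rdb *redis.Client, resource string) error {
	return rdb.Publish(ctx, "change:"+resource, "").Err()
}
```

The Rails side would call the equivalent of notifyChanged(ctx, rdb, "issue:10") after an update, and Workhorse would hold the subscriptions for the connected clients.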
For me, the focus needs to be on how we generate the change notifications in the first place, rather than how we ship them from server to browser. There are lots of good options there, but noticing when our data changes, generating a diff and sending that to all appropriate clients is actually really hard.
For issue titles specifically, we can put something together that works, but it needs to take account of changes that invalidate a subscriber, as well as transitive changes. Without careful thought, we're going to end up with something that only barely works, with piles of special-casing code to account for edge cases.
Specific examples for issue titles:
Issue is marked confidential before having its title changed
The issues.title field doesn't change, but happens to reference a user / issue / MR / project / other GitLab Markdown referenceable which is removed
We don't actually deal with that second case at all right now, and we could ignore it for live issue titles, but transitive changes are going to hurt us at some point if we're not careful.
Can we be lazy and just notify clients that the page changed, leaving it to them to reload? Rather than pre-calculating some minimal set of data to send to each client, just be wasteful and trigger 1000 refreshes when there are 1000 clients?
Being able to send minimal web page diffs to clients that refresh sounds like a separate problem to solve to me.
Edit: Or to put it differently, if we have an issue that is open in 1000 browser tabs, the system needs to be able to handle that either way. Why create an alternative 'page building system' to our Rails controllers + cache?
Regarding the delivery mechanism, MessageBus looks like a nice approach. If it works as advertised. :) It's impressive they built this in a way that it even works with Unicorn.
It looks like they hide a Thin server in each Unicorn worker that holds the long polling requests open. One thing we should ask ourselves is how this behaves when Unicorn workers are short-lived.
@nick.thomas I was also thinking about possible transitive changes that may affect issue title (issue title is just an example here) and I agree that it's a really hard problem. I am not aware of any generic solution / pattern that would allow us to express these dependencies in an organised way. However, I consider real time updates to be a progressive enhancement. If we fail to notify subscribers that the issue title changed in some specific edge cases then it's not a big deal. There is a way to get the correct data (refresh the page) and we're just showing the stale data, but it doesn't break anything. There is just no enhancement in this specific edge case.
@victorwu is that also the way the product team thinks about real-time updates - progressive enhancement?
Can we be lazy and just notify clients that the page changed, leaving it to them to reload? Rather than pre-calculating some minimal set of data to send to each client, just be wasteful and trigger 1000 refreshes when there are 1000 clients?
I agree that we should use the existing endpoints in the Rails app and trigger 1000 requests to the Rails app when there are 1000 clients. The returned data is affected by your permission level (think: issue title with a reference to a confidential issue), so I think that we cannot avoid doing this one request per client.
There is one possible improvement if we decide to use the proposal involving Workhorse. Instead of notifying the frontend that a change happened and relying on the frontend to fetch JSON, we can do this request in Workhorse and say to the frontend: a change happened and here is the JSON for you. This is already included in @stanhu's diagram.
There is a possible alternative: to use ActionCable for that. The biggest issue is that it requires Rails 5 (I think @connorshea was working on that, but I don't know the status). ActionCable is Rails, so it's the no-brainer solution. To make ActionCable scalable, there is a solution/hack described in this blog post: https://evilmartians.com/chronicles/anycable-actioncable-on-steroids.
AnyCable uses either a Go based daemon or an Erlang one, to bridge the gap and solve the multiple connections problem that ruby can't handle well.
If we decide upon AnyCable and their Go solution, it could potentially be made part of workhorse to avoid having an additional daemon.
One of the ways to solve that problem is similar to how we plan to do it in GitLab Runner:
99.9% of the time there's no change,
We don't track connected clients, as we always update a value in Redis when a thing changes,
The client will receive a notification when a change gets detected,
We will execute the original request as many times as there are clients watching the data,
In most cases, it is one client watching the change,
Every client can receive a different payload, based on permissions,
It is simple to use: you use the same request, only adding two extra headers or parameters,
The notification keys are short-lived so as not to overload Redis, assuming that a client watching a resource will create a new key if it no longer exists,
The notification handler is implemented in Workhorse; Workhorse watches a single Redis connection for keyspace changes and finishes requests that are interested in these changes,
Rails does a very cheap operation: it bumps the value of the Redis key.
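A sketch of the Rails-side half of that list, written in Go for consistency with the other sketches here; the key derivation follows the sha256(type || id || salt) idea mentioned in the thoughts below, and the 5-minute TTL is an arbitrary assumption:

```go
package notify

import (
	"context"
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"time"

	"github.com/go-redis/redis/v8"
)

// notificationKey derives an unguessable key for a resource, so only clients
// that were handed the key by an authorized GET can watch it
// (sha256(type || id || salt), as described below).
func notificationKey(resourceType string, id int64, salt string) string {
	sum := sha256.Sum256([]byte(fmt.Sprintf("%s|%d|%s", resourceType, id, salt)))
	return "notifications:" + hex.EncodeToString(sum[:])
}

// bump is the cheap operation performed from an after_save hook: overwrite the
// key with a fresh random value and a short TTL. Watchers only care that the
// value changed, not what it is.
func bump(ctx context.Context, rdb *redis.Client, key string) error {
	buf := make([]byte, 16)
	if _, err := rand.Read(buf); err != nil {
		return err
	}
	// Short-lived key: notifications nobody is watching expire on their own.
	return rdb.Set(ctx, key, hex.EncodeToString(buf), 5*time.Minute).Err()
}
```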
Thoughts:
Most of the current solutions are optimized for being client-aware (they need to track a list of listeners). This solution is resource-based: we do not care about the number of connected clients, we care about storing a resource change, basically reacting to the transition from one value to another. We do not care about the particular value that is stored, as long as it is random.
Workhorse does track all notification watchers, and when a Redis key is updated, Workhorse receives the event and delivers an update to any connected client.
There's a minor data leak: by knowing the notification key you could receive a notification that something did change. You would not receive the actual payload; the additional request that is then executed would pass all the normal security checks.
We use existing JSON endpoints that return serialized model data. The only addition is a generic mechanism where we respond with a notification key and notification value when a GET endpoint is requested.
We need to add an after_save callback to bump the notification key when a resource is changed.
The notification key is a hashed identifier with a salt, e.g. sha256(type || id || salt). This way only clients that are authorized would be able to watch notifications.
We use only one request, so we don't need to execute additional requests to subscribe and unsubscribe; all of that is handled by Workhorse internally, as we do not care about the number of connected clients.
It is safe from a security standpoint.
We can run into a lot of Redis keys stored for notifications that are not being watched, but since they are short-lived it is very unlikely that this would make a big performance difference.
A potential problem is that if we had to watch the full keyspace for notifications, it would increase the load for processing key changes in Workhorse. If we can watch a subset of the keyspace, this is not a problem.
The proposal uses long polling and can easily handle connection interruptions and reconnect if needed. We just pass the last received headers to resume from the last point.
We do not implement any incremental updates or differentials. We always return all the data. It can be a little hefty for "lists", but this can be optimized with more clever returning of data.
Unfortunately, this is a custom-made solution, not a boring one.
It is not ActionCable, which may be tempting to use in the future.
It works well with our current architecture: Unicorn and Workhorse.
Final word
This is not a queuing / message bus, it's more like a notification system that is optimized to deliver information about an event happening, without going into the details of what kind of event it is.
It may seem like a crazy idea, but I'm happy to discuss further :)
@ayufan This sounds like the best thing ever. Especially the 12th point in the Thoughts section
This will help a lot for both backend and ~Frontend for realtime pipelines with pagination. Unless we have to diff because so many things change (stages for example as well as pipeline statuses). But I don't think people stay on this page that long.
@ayufan I like what I see. Let me see if I fully understand. How is "watch on notification value" going to work with Redis? Are you using Redis PUBLISH/SUBSCRIBE channels, or just polling? If the former, I think there are a number of elements that we need to support this:
A key (notifications:issue:10) that associates an issue to the last state of the issue (random-value)
A Redis PUBLISH/SUBSCRIBE channel name (e.g. change:issue:10) on which the client listens for notifications of changes
Couple of questions:
Why do we need the PUT request for the change notification step? If you are concerned about expiring keys, could we just use the same notification key throughout and refresh the TTL each time a GET request arrives with indication for updates?
Can we make the second GET request simpler and avoid the race condition at the same time? For example:
a. First, send a Redis SUBSCRIBE change:issue:10 for updates.
b. Next, verify the value of notifications:issue:10 matches the expected random-value; if they differ, proxy the request immediately.
Ah, I just read https://redis.io/topics/notifications in more detail. It looks like Redis allows you to watch for values using special PUBLISH/SUBSCRIBE channel names.
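For reference, watching those special channels is a small amount of code; a sketch with the go-redis client (the key pattern is illustrative, and in production the notify-keyspace-events flag would normally be set in redis.conf rather than at runtime):

```go
package notify

import (
	"context"
	"fmt"

	"github.com/go-redis/redis/v8"
)

// watchKeyspace listens for SET events on notification keys via the special
// __keyspace@<db>__ channels described at https://redis.io/topics/notifications.
func watchKeyspace(ctx context.Context, rdb *redis.Client) error {
	// Keyspace events are off by default; "K$" enables keyspace channels
	// for string commands.
	if err := rdb.ConfigSet(ctx, "notify-keyspace-events", "K$").Err(); err != nil {
		return err
	}
	sub := rdb.PSubscribe(ctx, "__keyspace@0__:notifications:*") // illustrative pattern
	for msg := range sub.Channel() {
		// msg.Channel carries the key, msg.Payload the event name (e.g. "set").
		fmt.Printf("key %s changed: %s\n", msg.Channel, msg.Payload)
	}
	return nil
}
```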
Let me see if I fully understand. How is "watch on notification value" going to work with Redis? Are you using Redis PUBLISH/SUBSCRIBE channels, or just polling?
I think that we have two options:
Use https://redis.io/topics/pubsub and have a separate queue where we would push all notifications about updates, which would be read by all Workhorse watchers,
Why do we need the PUT request for the change notification step? If you are concerned about expiring keys, could we just use the same notification key throughout and refresh the TTL each time a GET request arrives with an indication for updates?
The PUT is just an indication of someone updating the resource. Since this is resource-based, not client-based, the notification is generated for a resource change. If we started to update it on GET, all other clients that are watching for the resource change would receive a notification.
Detailed flow should probably look like this:
A GET happens,
Workhorse checks the notification value,
If it is not the same (this will also happen for expired keys) we proxy the request to Rails:
Rails gets the existing notification key; if it does not exist, it creates one with a TTL,
Rails fetches the resource,
Rails returns the serialized resource with the key and the value.
If it is the same, we wait (long poll) for a notification to be fired:
If it times out, we return NO DATA,
If it fires, we proxy the request to Rails as described in point 3.
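Sketched from the Workhorse side, that flow might look roughly like this; the header names and channel naming are assumptions rather than a settled interface, and the ~50s timeout mirrors the Runner plan mentioned earlier:

```go
package notify

import (
	"net/http"
	"net/http/httputil"
	"time"

	"github.com/go-redis/redis/v8"
)

const pollTimeout = 50 * time.Second // long wait before answering "no data"

// handle implements the flow above: compare the client's last-seen value with
// the one in Redis, proxy immediately on mismatch (or missing/expired key),
// otherwise park the request until the key changes or the timeout fires.
func handle(rdb *redis.Client, rails *httputil.ReverseProxy) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		ctx := r.Context()
		key := r.Header.Get("X-Notification-Key") // assumed header names
		lastSeen := r.Header.Get("X-Notification-Value")

		current, err := rdb.Get(ctx, key).Result()
		if err == redis.Nil || current != lastSeen {
			rails.ServeHTTP(w, r) // value differs (or key expired): go to Rails
			return
		}

		// Subscribe first, then re-check, to avoid the race where the key
		// changes between the GET and the SUBSCRIBE.
		sub := rdb.Subscribe(ctx, "change:"+key)
		defer sub.Close()
		if v, _ := rdb.Get(ctx, key).Result(); v != lastSeen {
			rails.ServeHTTP(w, r)
			return
		}

		select {
		case <-sub.Channel():
			rails.ServeHTTP(w, r) // notification fired: proxy as in point 3
		case <-time.After(pollTimeout):
			w.WriteHeader(http.StatusNoContent) // "NO DATA"
		case <-ctx.Done():
			// client went away
		}
	})
}
```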
Can we make the second GET request simpler and avoid the race condition at the same time? For example
You have the same idea as me :) Subscribe first and then check the value. I just didn't want to complicate the diagram, only to indicate that there's a race condition here, so it doesn't go unnoticed.
Another option: Faye, which Gitter uses in production. Some of its strengths:
Client and server implementations in many languages, including JavaScript, Go, Ruby, Swift, etc
Works in a clustered environment (without the need for sticky sessions etc)
Connection-oriented: expensive authentication/authorisation operations only occur once for the duration of the connection.
Supports websockets and fallback to XHR long polling
Supports browser-server and server-server
Redis is the only dependency
Gitter runs avg ~20k-30k concurrent realtime connections through Faye
Gitter's current implementation still has a lot of capacity to scale vertically, but horizontal scaling is possible (with some work) - possibly when we get to ~100k concurrent connections...
Used by Gitter, Shopify, myspace, others
Easy to extend (plugins) on the client and the server
Graceful recovery after disconnection (eg, travelling through a tunnel on a train....)
FYI, I don't believe we'd be able to run Faye on our existing Rails app with Unicorn, which isn't designed to have lots of open requests. The author suggests running a separate server process for Faye under Rainbows: https://groups.google.com/d/msg/faye-users/QhyDk1Z1jV0/BGghzX9uLn4J That by itself gives me pause.
Longer term: implement event sourcing for publishing/subscribing to events: gitlab-org/gitlab-ce#26894
The smarter polling system seems like something we can do right away with not too much effort. I think we can change our current notes polling to use this, which would relieve a significant amount of DB load. We can also consider using this to support the title/description updates.
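For illustration, the core of such an ETag check at the proxy layer could be as small as the sketch below; the Redis key layout and the middleware placement are assumptions, not the design from the linked issue:

```go
package etagpoll

import (
	"net/http"

	"github.com/go-redis/redis/v8"
)

// middleware short-circuits polling requests whose If-None-Match header still
// matches the ETag stored in Redis, so unchanged polls never reach Rails or
// PostgreSQL. The key layout ("etag:" + request path) is an illustrative assumption.
func middleware(rdb *redis.Client, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		etag, err := rdb.Get(r.Context(), "etag:"+r.URL.Path).Result()
		if err == nil && etag != "" && r.Header.Get("If-None-Match") == etag {
			w.Header().Set("ETag", etag)
			w.WriteHeader(http.StatusNotModified) // 304: nothing changed since last poll
			return
		}
		// Cache miss or stale ETag: let Rails render and set a fresh ETag.
		next.ServeHTTP(w, r)
	})
}
```

The notes endpoint would invalidate or overwrite the stored ETag whenever a new note is added, so the next poll falls through to Rails exactly once.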
I'm going to close this issue down because I think we've concluded that the long polling approach with ETags is the right next step: https://gitlab.com/gitlab-org/gitlab-ce/issues/26926 Other proposals can be considered later.