Evaluate different solutions for real-time updates
Currently, when we do live updates (comments, build status, ...), we use very simple polling. This does not scale well on GitLab.com, and we want to introduce more real-time updates (https://gitlab.com/gitlab-org/gitlab-ce/issues?scope=all&state=opened&utf8=%E2%9C%93&search=real-time) in the future.
We have to come up with a pattern that is performant enough and stick to it across the whole application.
Plan
- Create a list of possible solutions with pros and cons
- Pick the one that sounds reasonable
- Create a PoC and deploy it to GitLab.com behind a feature flag so we can quickly turn it off
- Go back to step 2 if step 3 causes trouble
Possible solutions
Websockets
This would remove the need for polling, but exchange it for long-lived connections. It's unclear if this will be any better. Websockets also require us to replace Unicorn with e.g. Puma, as Unicorn is not suitable for this. We don't want to run an extra process just for Websockets, as this complicates deployments, managing infrastructure, etc. Puma is something we have looked into in the past, but we're not sure yet: !1899 (closed) and #3592 (closed)
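The trade-off above can be put into rough numbers: polling costs a steady stream of requests, while Websockets cost one open connection per client. A back-of-the-envelope sketch, where the client count and polling interval are made-up illustrative values and not GitLab.com measurements:

```python
def polling_requests_per_second(clients: int, interval_seconds: float) -> float:
    """Average request rate when every client polls once per interval."""
    return clients / interval_seconds


def websocket_open_connections(clients: int) -> int:
    """Each client holds exactly one long-lived connection."""
    return clients


# Hypothetical numbers, chosen only for illustration.
clients = 10_000
interval_seconds = 5.0

print(polling_requests_per_second(clients, interval_seconds))  # 2000.0
print(websocket_open_connections(clients))  # 10000
```

Which of the two costs is easier to carry depends on the application server, which is why the Unicorn vs. Puma question matters here.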
Polling
This is what we currently use, and it doesn't scale. We could merge different polling endpoints/calls into a single one, but this will only work if:
- This new endpoint is faster than the sum of the current ones
- We can guarantee this endpoint stays fast, even when adding more data
Since item 2 somewhat violates the laws of physics (you can't add something new without it taking more time) I don't see this working out very well.
Workhorse + Websockets
Maybe we could add a Redis-backed pub/sub system based on Websockets, with connections terminated by Workhorse and updates published (notified) by Rails? We are currently considering something similar for GitLab Runner. It does not use Websockets yet (only long-polling connections), but it will have pub implemented in Rails and sub implemented in Workhorse.
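To make the division of roles concrete, here is a minimal sketch of the pub/sub flow. It assumes an in-memory `Broker` class as a stand-in for Redis; the class, the channel name, and the message are all hypothetical, and no real Rails or Workhorse code is shown.

```python
import queue
import threading
from collections import defaultdict


class Broker:
    """In-memory stand-in for Redis pub/sub (assumption: real Redis would sit here)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._subscribers = defaultdict(list)  # channel name -> list of queues

    def subscribe(self, channel):
        """Subscriber side (the Workhorse role): returns a queue that receives messages."""
        q = queue.Queue()
        with self._lock:
            self._subscribers[channel].append(q)
        return q

    def publish(self, channel, message):
        """Publisher side (the Rails role): fan the message out to all subscribers."""
        with self._lock:
            targets = list(self._subscribers[channel])
        for q in targets:
            q.put(message)
        return len(targets)  # number of subscribers notified


broker = Broker()
# Two browser connections, each held open by the subscriber side.
conn_a = broker.subscribe("issue:1:notes")
conn_b = broker.subscribe("issue:1:notes")
# The publisher side emits once, e.g. after a comment is saved.
delivered = broker.publish("issue:1:notes", "new comment")
print(delivered, conn_a.get(timeout=1), conn_b.get(timeout=1))
```

The point of the split is that the publisher never holds a client connection: it emits one message and returns, while the subscriber side keeps the long-lived connections open.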
Long-polling with MessageBus
See https://github.com/SamSaffron/message_bus
Advantages:
- just Ruby and Redis - we don't introduce a new dependency
- doesn't require a separate process to run
- API is easy to use from the developer perspective
- used in Discourse; some of their customers' installations seem to be big: http://www.discourse.org/faq/customers/
Disadvantages:
- I couldn't find any real numbers about it
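For readers unfamiliar with long-polling, the general technique can be sketched as follows: the client sends the id of the last message it has seen, and the server holds the request open until something newer arrives or a timeout expires. This is an illustrative sketch only, not MessageBus's actual API or implementation; the `Channel` class and its methods are hypothetical.

```python
import threading


class Channel:
    """Hypothetical long-polling channel: keeps a backlog and wakes up waiting pollers."""

    def __init__(self):
        self._cond = threading.Condition()
        self._messages = []  # a message's id is its 1-based position in this list

    def publish(self, data):
        """Append a message, wake all waiting pollers, and return the new message id."""
        with self._cond:
            self._messages.append(data)
            self._cond.notify_all()
            return len(self._messages)

    def poll(self, last_id, timeout):
        """Return (new_last_id, messages after last_id), blocking up to `timeout` seconds."""
        with self._cond:
            self._cond.wait_for(lambda: len(self._messages) > last_id, timeout=timeout)
            return len(self._messages), self._messages[last_id:]


channel = Channel()
result = {}


def client():
    # The request is held open until a message arrives or the timeout expires.
    result["response"] = channel.poll(last_id=0, timeout=5)


t = threading.Thread(target=client)
t.start()
channel.publish("build passed")
t.join()
print(result["response"])  # (1, ['build passed'])
```

Because the client passes `last_id`, a message published between two polls is not lost: the next poll returns it immediately from the backlog.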
Nchan - long-polling, Websockets and EventSource (SSE)
This may be off topic, and I have mentioned it in another issue, but I use Websockets with Nchan to distribute "real-time" messages. Nchan is a pub/sub server that can handle all of the delivery technologies suggested here. Each channel can be configured with one set of methods that may publish and another set that may subscribe. It is a small program developed as an nginx module, but it can also run standalone. It can use Redis to sync with other instances and form a cluster. It supports dynamic channels, scaling, and authentication. Nchan itself doesn't answer the evaluation, but it lets us use, through one interface, whichever method is preferable in each situation. For example, logged-in users could be served over Websockets and others by polling; on the pipeline page, users with Developer (and higher) rights could use Websockets and others only long-polling, etc.
Contributions wanted
Please suggest possible solutions and describe them in the most detailed way that you can.