Skip to content

Replace polling of notes with Action Cable

Heinrich Lee Yu requested to merge actioncable-notes into master

What does this MR do and why?

Use Action Cable to subscribe to note updates instead of polling.

Context

I was looking at our team's error budgets and the notes polling controller / action had the most violations. I found out that we lowered the target duration from 5s to 0.5s because this is endpoint had one of the highest traffic share. (8% of GitLab.com traffic gitlab-com/gl-infra/scalability#2108 (closed))

The violations are very few (0.04% of the requests to this endpoint) but it's still enough to make it the top contributor to our error budgets. From random spot checks of these slow polls, these seem to be from MR diff notes where I think generating the diff can take some time. I did not investigate the slowness further.

This MR does not aim to fix the slowness but rather reduce the total requests to the endpoint by removing the no-op poll requests so that we could perhaps set the urgency of this endpoint back to low. From a user perspective, these aren't really urgent or high priority requests since it's a background update.

We have a new Vue based notes widget that we're currently building but that's focused on work items and I think it would still take time for that to replace our legacy notes implementations for MRs / epics, so I thought of this hybrid approach where we use the same endpoint that we used for polling but we only hit that when we get an Action Cable message that the noteable's notes are updated. This requires minimal changes to both backend and frontend.

Expected Outcomes

  1. Notes would now be more real-time because we wouldn't have to wait for the poll interval which is currently 6 seconds.
  2. The requests we are removing are the no-op polls or 304s. These are usually very fast (<50ms) since they just do a Redis GET. But there is still some overhead to the large number of these requests so hopefully there is a positive effect on our web / Puma infrastructure with this change. I am still not sure what metrics to look at though.
  3. On the Redis server side, the numerous GET commands will be replaced with fewer SUBSCRIBE and PUBLISH commands. I am also hoping this somehow reduces load on our Redis instance but it's hard to tell until we see the metrics after the change.
  4. The same approach could also be applied to the /realtime_changes endpoint for polling issue / epic description changes. This represents ~2% of GitLab.com traffic.

Bonus: I think the frontend code is also a bit simpler this way and we don't have the hacks to stop / start polling in certain scenarios. We do the stop / start on slow update requests like applying a suggestion because while the update is still processing, a poll could happen and reset the UI state to the state before the update. This happens because with 304s, the backend sends an empty body back, but the browser / Javascript actually processes the AJAX response of the last successful request that is cached by the browser. That's one of the reasons I dislike this ETag caching hack for polling and in my opinion we should move away from that. And this is one way we can do that without having to rewrite legacy pages to Vue / Apollo / GraphQL subscriptions.

Note: This MR isn't meant to be merged in this state. I plan to feature flag this and I would also need to deploy the backend first to prevent multi-version compatibility issues.

How to set up and validate locally

  1. Open an issue / merge request / epic in multiple tabs or browsers
  2. Create / update a note or award an emoji
  3. Notes should be updated instantly in the other tabs / windows.
Edited by Heinrich Lee Yu

Merge request reports