
Optimizing MongoDB query for chat archive

Tomas Vik requested to merge archive-from-replica into develop

Our MongoDB has been experiencing overload lately: https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/10247

I'm unable to identify the exact cause of the issue, but this seems like a solid optimization and maybe a quick win.

The problem

First found in https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/10247#note_345631422

We are not limiting the number of messages returned for archive pages. Some rooms had thousands of messages in a single day, and queries for those archives are overloading our MongoDB.

One of these archive pages has 9000+ messages:

TroupeReplicaSet:SECONDARY> db.chatmessages.find({
...     toTroupeId: db.troupes.findOne({lcUri: "etherdelta/etherdelta.github.io"})._id,
...     _id: {
...         $gt: ObjectId("5a5011800000000000000000"),
...         $lt: ObjectId("5a5163000000000000000000")
...     }
... }).count()
9275
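The `_id` bounds in the query above work because a MongoDB ObjectId encodes its creation timestamp in its first 4 bytes, so a range query on `_id` selects messages by creation time. A minimal sketch of how such a boundary value is derived from a date (`objectIdFromDate` is an illustrative helper, not existing Gitter code):

```javascript
// Build a zero-padded ObjectId hex string whose first 4 bytes encode a Unix
// timestamp in seconds; the remaining 8 bytes are zeroed so the value sorts
// before (or after) every real ObjectId created in that second.
function objectIdFromDate(date) {
  const seconds = Math.floor(date.getTime() / 1000);
  return seconds.toString(16).padStart(8, '0') + '0'.repeat(16);
}

// The archive day boundaries used in the query above:
objectIdFromDate(new Date('2018-01-06T00:00:00Z')); // "5a5011800000000000000000"
objectIdFromDate(new Date('2018-01-07T00:00:00Z')); // "5a5163000000000000000000"
```

Note the two bounds differ by exactly 86400 seconds, i.e. the archive page covers one calendar day.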

In a sample of the production traffic, I found that we've got ~350 archive requests/hour that are taking longer than a second to process.

Solution

  1. Limit the number of returned messages to 1,500 max. This means some messages won't show in the archive, but the page becomes unusable with thousands of messages anyway.
  2. Read archives from a replica. This makes good sense because archive pages don't need realtime data, and the chance of the replica being out of sync with the primary is low.
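The two changes above can be sketched as follows. This is a minimal illustration, not the actual Gitter code: `MAX_ARCHIVE_MESSAGES` and `findArchiveOptions` are hypothetical names, and `'secondaryPreferred'` is one of MongoDB's standard read preference modes (fall back to the primary if no secondary is available):

```javascript
// Hypothetical cap on messages returned per archive page.
const MAX_ARCHIVE_MESSAGES = 1500;

// Build the query options for an archive page fetch (illustrative helper).
function findArchiveOptions(requestedLimit) {
  return {
    // Cap the result set so one huge room can't overload the database.
    limit: Math.min(requestedLimit || MAX_ARCHIVE_MESSAGES, MAX_ARCHIVE_MESSAGES),
    // Archive pages tolerate slightly stale data, so prefer a secondary.
    readPreference: 'secondaryPreferred'
  };
}
```

For the 9,275-message room above, `findArchiveOptions(9275).limit` comes out as 1500, while smaller requests pass through unchanged.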

Testing

I tested locally that the archive page still works as expected. I didn't test the behaviour with over 1,500 messages.

