Create new index for parentId in Gitter MongoDB
When implementing Thread Messages in Gitter, we need each thread message to have a reference to its parent. This is achieved by adding a parentId attribute to the gitter.chatmessages collection. The following text explains how we are going to do that.
What to index?
We will use a compound index, similar to the existing { toTroupeId: 1, sent: -1 } index:
{
parentId: 1, // we only use equality matches on parentId, so ascending vs descending doesn't matter
sent: -1, // descending sent date (i.e. most recent to oldest)
}
This compound index supports sorting child messages by sent date as well as simply finding child messages without sorting. Both work because MongoDB can use any prefix of a compound index, here { parentId: 1 }.
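The prefix behaviour can be illustrated with a small sketch. The `indexSupports` helper below is hypothetical and is a deliberately simplified model of the prefix rule (equality fields must form a leading prefix of the index, and sort fields must follow directly after them, in the same or the fully reversed direction). It is not how MongoDB's query planner is implemented.

```javascript
// Simplified, hypothetical model of the compound-index prefix rule.
const parentIdIndex = { parentId: 1, sent: -1 };

function indexSupports(indexSpec, equalityFields, sortSpec = {}) {
  const keys = Object.keys(indexSpec);

  // Equality fields must be the leading keys of the index.
  const prefix = keys.slice(0, equalityFields.length);
  if (!equalityFields.every((field) => prefix.includes(field))) return false;

  // Sort fields must be the keys that come right after the equality prefix.
  const sortFields = Object.keys(sortSpec);
  const next = keys.slice(equalityFields.length, equalityFields.length + sortFields.length);
  if (next.length !== sortFields.length) return false;
  if (!sortFields.every((field, i) => field === next[i])) return false;

  // The index can be walked forwards or backwards, so either direction works.
  const forward = sortFields.every((f) => sortSpec[f] === indexSpec[f]);
  const backward = sortFields.every((f) => sortSpec[f] === -indexSpec[f]);
  return forward || backward;
}

const results = {
  childrenOnly: indexSupports(parentIdIndex, ['parentId']),
  childrenNewestFirst: indexSupports(parentIdIndex, ['parentId'], { sent: -1 }),
  sentOnly: indexSupports(parentIdIndex, ['sent']),
};
console.log(results);
```

In this model the two child-message lookups are supported, while a query on sent alone is not, because sent is not a prefix of the index.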
Full or partial index?
Note: Sparse indexes have been superseded by partial indexes, which MongoDB recommends using instead.
A full index keeps references to every member of the collection, whilst partial indexes only index the documents in a collection that meet a specified filter expression.
The original plan was to create a partial index because it supports looking up child messages well and would be orders of magnitude smaller. The problem is that a partial index doesn't support the negative lookup used for the main message feed:
db.chatmessages.find({parentId: {$exists: false}});
So even though the partial index looked promising, we'll resort to creating a full index:
db.chatmessages.createIndex(
{ parentId: 1, sent: -1 }
)
- reference: first example from https://docs.mongodb.com/manual/core/index-partial/#comparison-with-the-sparse-index
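To see why the partial index cannot serve the main feed, here is a small in-memory sketch. The sample documents are made up, and the filter { parentId: { $exists: true } } is an assumption about what the partial index's partialFilterExpression would have been. The code only models which documents each kind of index would hold.

```javascript
// Made-up sample documents: two top-level messages and one thread reply.
const messages = [
  { _id: 'a', text: 'top-level message' },
  { _id: 'b', text: 'another top-level message' },
  { _id: 'c', parentId: 'a', text: 'reply in a thread' },
];

const hasParent = (message) => message.parentId !== undefined;

// Documents a partial index with the assumed filter would contain.
const partialIndexIds = messages.filter(hasParent).map((m) => m._id);
// A full index contains an entry for every document.
const fullIndexIds = messages.map((m) => m._id);

// Main feed: the in-memory equivalent of find({ parentId: { $exists: false } }).
const mainFeedIds = messages.filter((m) => !hasParent(m)).map((m) => m._id);

// An index can only serve a query if it holds every matching document.
const servedBy = (indexIds) => mainFeedIds.every((id) => indexIds.includes(id));

console.log('partial index holds:', partialIndexIds);
console.log('main feed served by partial index:', servedBy(partialIndexIds));
console.log('main feed served by full index:', servedBy(fullIndexIds));
```

The top-level messages are exactly the documents the partial index leaves out, so the main feed query would have to fall back to another index or a collection scan.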
Creating an index in DB
Considerations:
- difficulty of execution
- risk
Option 1: Background index build on master (preferred)
This option is preferred because testing in beta suggests the risk is small, and it is trivial to implement compared to Option 2.
A background index build on a primary replicates as background index builds on secondaries. The replication worker does not take a global DB lock, and secondary reads are not affected.
Building an index can have a severe impact on the performance of the database. If possible, build indexes during designated maintenance windows.
This involves creating the index on the master in the background and letting it replicate from the master once the build has finished.
By default, 500MB of memory is dedicated to index creation. If the index exceeds this size, the process starts creating temporary files and gets slower.
- All measurements are from reading collection stats:
db.runCommand( { collStats : "chatmessages", scale: 1024 } );
- Prod:
  - the size of the chatmessages collection: 63GB (63 728 644KB)
  - the size of the example index toTroupeId_1_sent_-1: 1.8GB (1 877 556KB)
- Beta:
  - the size of the chatmessages collection: 27GB (27 707 618KB)
  - the size of the example index toTroupeId_1_sent_-1: 0.6GB (629 112KB)
Proposed index:
db.chatmessages.createIndex(
{ parentId: 1, sent: -1 },
{
background: true
}
)
And the mongoose schema change in ChatMessageSchema:
ChatMessageSchema.index(
{ parentId: 1, sent: -1 },
{
background: true
}
);
Test run in mongo-beta-01:
- the size of the proposed index parentId_1_sent_-1: 451120KB (took 14 minutes to build)
- beta mongo is m4.large (2CPU, 7.5GB MEM) and each production replica is r3.large (2CPU, 15GB MEM)
- mongo-beta-01 seemed to handle the load fine
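The measurements can be put side by side with the 500MB default mentioned earlier. The following sketch only does the arithmetic: the KB figures are copied from the stats in this document, the helper names are hypothetical, and the production estimate naively assumes that the new index relates to toTroupeId_1_sent_-1 in production the same way it does in beta, which may not hold.

```javascript
// Sizes in KB, copied from the collStats measurements in this document.
const KB_PER_MB = 1024;
const DEFAULT_BUILD_MEMORY_MB = 500; // default index build memory quoted in this document

const beta = { exampleIndexKB: 629112, proposedIndexKB: 451120 };
const prod = { exampleIndexKB: 1877556 };

const toMB = (kb) => kb / KB_PER_MB;
const fitsInBuildMemory = (kb) => toMB(kb) <= DEFAULT_BUILD_MEMORY_MB;

// Naive extrapolation (assumption): same proposed/example size ratio in prod as in beta.
const ratio = beta.proposedIndexKB / beta.exampleIndexKB;
const estimatedProdKB = Math.round(prod.exampleIndexKB * ratio);

console.log('beta proposed index MB:', toMB(beta.proposedIndexKB).toFixed(1));
console.log('beta index fits in default memory:', fitsInBuildMemory(beta.proposedIndexKB));
console.log('estimated prod index MB:', toMB(estimatedProdKB).toFixed(1));
console.log('estimated prod index fits:', fitsInBuildMemory(estimatedProdKB));
```

Under that assumption the production index would exceed the default limit, so the production build may spill to temporary files and take longer than the 14 minutes seen in beta.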
Option 2: Use rolling index build
This is the recommended option for replica sets. It involves taking replicas out of the replica set one by one, building the index on each, and adding them back in. That lowers the risk of the index build degrading database performance.
However, it raises the risk in another way: more manual work is involved in taking the replicas out and then stepping down the primary itself.
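The ordering of a rolling build can be sketched as a plan generator. This is only an illustration of the sequence described above (secondaries one at a time, the primary last, after stepping it down). The member names and the `rollingBuildPlan` helper are hypothetical, and the steps are descriptive strings rather than commands to run.

```javascript
// Hypothetical replica set layout; the host names are placeholders.
const members = [
  { host: 'mongo-replica-01', role: 'PRIMARY' },
  { host: 'mongo-replica-02', role: 'SECONDARY' },
  { host: 'mongo-replica-03', role: 'SECONDARY' },
];

function rollingBuildPlan(replicaSetMembers) {
  const secondaries = replicaSetMembers.filter((m) => m.role === 'SECONDARY');
  const primaries = replicaSetMembers.filter((m) => m.role === 'PRIMARY');
  const steps = [];

  // Secondaries go first; the primary is handled last, after stepping down.
  for (const member of [...secondaries, ...primaries]) {
    if (member.role === 'PRIMARY') {
      steps.push(`${member.host}: step down (rs.stepDown()) and wait for a new primary`);
    }
    steps.push(`${member.host}: take out of the replica set`);
    steps.push(`${member.host}: build { parentId: 1, sent: -1 }`);
    steps.push(`${member.host}: add back and wait for it to catch up`);
  }
  return steps;
}

const plan = rollingBuildPlan(members);
plan.forEach((step, i) => console.log(`${i + 1}. ${step}`));
```

Even for a three-member set the plan has ten manual steps, which is the extra operational risk weighed against Option 1.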
References:
- https://docs.mongodb.com/manual/core/index-compound/#prefixes
- https://docs.mongodb.com/manual/core/index-single/
- https://docs.mongodb.com/manual/core/index-creation/#performance
- https://docs.mongodb.com/v3.2/reference/command/createIndexes/
- https://docs.mongodb.com/v3.2/reference/command/dropIndexes/
- https://docs.mongodb.com/manual/core/index-partial/#comparison-with-the-sparse-index
- https://docs.mongodb.com/manual/reference/command/collStats/#dbcmd.collStats
- https://docs.mongodb.com/manual/tutorial/build-indexes-on-replica-sets/#e-build-the-index-on-the-primary
- https://mongoosejs.com/docs/4.x/docs/guide.html#statics