WIP: Inspect Mongo Oplog (!1106) · Merge requests · gitter / webapp

Eric Eastwood requested to merge feature/inspect-mongo-oplog into develop Apr 04, 2018

Inspect Mongo oplog.

Good docs,

$ mongo mongo-replica-01.prod.gitter
TroupeReplicaSet:PRIMARY> db.getReplicationInfo()
{
        "logSizeMB" : 1024,
        "usedMB" : 1032.28,
        "timeDiff" : 22649,
        "timeDiffHours" : 6.29,
        "tFirst" : "Wed Apr 04 2018 08:51:19 GMT-0500 (Central Standard Time)",
        "tLast" : "Wed Apr 04 2018 15:08:48 GMT-0500 (Central Standard Time)",
        "now" : "Wed Apr 04 2018 15:08:47 GMT-0500 (Central Standard Time)"
}

Spawned out of an investigation into the oplog window becoming too small, https://app.datadoghq.com/monitors#555843?group=all&from_ts=1522783781920&to_ts=1522870181920

The oplog window is how far back in time the oplog reaches. The oplog is a fixed-size buffer of database operations (`logSizeMB` above), and once it is full the oldest operations are dropped, so a burst of writes shrinks the window (`timeDiff` above). The window is still small even after the inserts/deletions have occurred because the data is still propagating into the actual collections (only so many operations can be processed at a time). Thanks @northrup for helping me understand this!
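As a rough illustration, the `timeDiff` and `timeDiffHours` values in the output above are just the gap between `tFirst` and `tLast`. The sketch below recomputes them in plain JavaScript; `oplogWindow` is a hypothetical helper name and is not part of this merge request, and the timestamps are the ones from the output rewritten as ISO 8601 strings.

```javascript
// Hypothetical helper (not from this MR): derive the oplog window from the
// timestamps of the oldest and newest oplog entries.
function oplogWindow(tFirst, tLast) {
  // Difference in whole seconds between the newest and oldest entry.
  const seconds = Math.round((tLast.getTime() - tFirst.getTime()) / 1000);
  // Same value in hours, rounded to two decimals.
  const hours = Math.round((seconds / 3600) * 100) / 100;
  return { seconds, hours };
}

// tFirst / tLast from the getReplicationInfo() output, as ISO 8601 strings.
const windowInfo = oplogWindow(
  new Date('2018-04-04T08:51:19-05:00'),
  new Date('2018-04-04T15:08:48-05:00')
);
console.log(windowInfo); // { seconds: 22649, hours: 6.29 }
```

The result matches the `timeDiff` (22649) and `timeDiffHours` (6.29) reported by the shell.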

@northrup created a snapshot of the box while the oplog was recovering if we want to investigate further, https://console.aws.amazon.com/ec2/v2/home?region=us-east-1#Snapshots:visibility=private;search=snap-0fbf778ccdd2f834c;sort=desc:snapshotId

The query count looks like pretty normal load compared to normal times 🤔. So there was probably one giant query that spiked the oplog, because the oplog splits each operation into little idempotent pieces.
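One way to look for such a query is to count oplog entries per namespace and operation type and see what dominates. The sketch below is only an illustration under assumptions: `summarizeOplog` is a hypothetical helper, the sample entries and the `example.*` namespaces are made up, and only the `op` and `ns` fields of an oplog entry are used.

```javascript
// Hypothetical helper: count oplog entries per "namespace op" pair,
// busiest first. Each entry is expected to carry `ns` and `op` fields.
function summarizeOplog(entries) {
  const counts = new Map();
  for (const entry of entries) {
    const key = `${entry.ns} ${entry.op}`;
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return Array.from(counts, ([key, count]) => ({ key, count }))
    .sort((a, b) => b.count - a.count);
}

// Made-up sample: a single multi-document delete shows up as one 'd' entry
// per removed document, next to an unrelated insert.
const sample = [
  { op: 'i', ns: 'example.rooms', o: { _id: 1 } },
  { op: 'd', ns: 'example.items', o: { _id: 10 } },
  { op: 'd', ns: 'example.items', o: { _id: 11 } },
  { op: 'd', ns: 'example.items', o: { _id: 12 } },
];
console.log(summarizeOplog(sample));

// Assumption: in the mongo shell, recent entries could be fed in with
//   db.getSiblingDB('local').oplog.rs.find().sort({ $natural: -1 }).limit(10000).toArray()
```

A namespace whose count dwarfs the others would be the first place to look for the query behind the spike.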

Looking at the new item stats we have (like new chat, room, user, etc), there isn't an obvious spike, https://app.datadoghq.com/dash/760607/new-object-stats?live=true&page=0&is_auto=false&from_ts=1522789056558&to_ts=1522875456558&tile_size=m

Edited Apr 18, 2018 by Eric Eastwood
