
WIP: Inspect Mongo Oplog

Eric Eastwood requested to merge feature/inspect-mongo-oplog into develop

Inspect Mongo oplog.

Good docs,

$ mongo mongo-replica-01.prod.gitter
TroupeReplicaSet:PRIMARY> db.getReplicationInfo()
{
        "logSizeMB" : 1024,
        "usedMB" : 1032.28,
        "timeDiff" : 22649,
        "timeDiffHours" : 6.29,
        "tFirst" : "Wed Apr 04 2018 08:51:19 GMT-0500 (Central Standard Time)",
        "tLast" : "Wed Apr 04 2018 15:08:48 GMT-0500 (Central Standard Time)",
        "now" : "Wed Apr 04 2018 15:08:47 GMT-0500 (Central Standard Time)"
}

Spawned out of an investigation into the oplog window size becoming too small: https://app.datadoghq.com/monitors#555843?group=all&from_ts=1522783781920&to_ts=1522870181920

The oplog window size is how much history the oplog holds, i.e. the number of database operations you can buffer before hitting the hard size limit (reported above as timeDiffHours). The window size is still small even after the inserts/deletions have occurred because the data is still propagating into the actual collections (only so many operations can be processed at a time). Thanks @northrup for helping me understand this!
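As a rough sketch of how the numbers in the `db.getReplicationInfo()` output relate to each other, the window and the write rate can be derived like this. The field names (`logSizeMB`, `usedMB`, `timeDiff`) come from the output above; the helper names and the 4096 MB figure are hypothetical, and the estimate assumes a constant write rate, which a spike breaks by definition.

```javascript
// Sketch: derive oplog window figures from db.getReplicationInfo() output.
// Helper names are hypothetical; field names match the output shown above.

// timeDiff is the number of seconds between the first and last oplog entry.
function oplogWindowHours(info) {
  return info.timeDiff / 3600;
}

// Average amount of oplog data written per hour over the current window.
function oplogChurnMBPerHour(info) {
  return info.usedMB / oplogWindowHours(info);
}

// Assumption: the write rate stays constant if the oplog is resized.
function estimatedWindowHours(info, newLogSizeMB) {
  return newLogSizeMB / oplogChurnMBPerHour(info);
}

const info = { logSizeMB: 1024, usedMB: 1032.28, timeDiff: 22649 };

console.log(oplogWindowHours(info).toFixed(2)); // 6.29, matches timeDiffHours
console.log(oplogChurnMBPerHour(info).toFixed(1));
console.log(estimatedWindowHours(info, 4096).toFixed(1)); // hypothetical 4 GB oplog
```

This only restates the arithmetic behind the reported window; it says nothing about what caused the spike.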

@northrup created a snapshot of the box while the oplog was recovering, in case we want to investigate further: https://console.aws.amazon.com/ec2/v2/home?region=us-east-1#Snapshots:visibility=private;search=snap-0fbf778ccdd2f834c;sort=desc:snapshotId


The query count looks like pretty normal load compared to normal times 🤔. So it probably means there was one giant query that spiked the oplog, because the oplog splits such a query into little idempotent pieces.
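One way to look for that kind of query is to count oplog entries per namespace and operation type, since a multi-document update or remove shows up as one entry per affected document. The following is a minimal sketch, not the code in this MR: `entries` stands in for documents read from `local.oplog.rs`, `ns` and `op` are the standard oplog fields, and the function name and sample data are made up.

```javascript
// Sketch: count oplog entries per namespace and operation type.
// `entries` stands in for documents from local.oplog.rs (hypothetical input).
function countByNamespaceAndOp(entries) {
  const counts = new Map();
  for (const entry of entries) {
    // op is "i" (insert), "u" (update), "d" (delete), "c" (command) or "n" (no-op).
    const key = `${entry.ns} ${entry.op}`;
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  // Largest buckets first, so a spike stands out at the top.
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

// Made-up sample: one bulk delete recorded as three single-document entries.
const sampleEntries = [
  { ns: "exampledb.items", op: "d" },
  { ns: "exampledb.items", op: "d" },
  { ns: "exampledb.items", op: "i" },
  { ns: "exampledb.items", op: "d" },
  { ns: "exampledb.users", op: "u" },
];

console.log(countByNamespaceAndOp(sampleEntries));
```

A bucket that dwarfs the others during the spike window would point at the collection and operation type responsible.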

Looking at the new item stats we have (new chat, room, user, etc.), there isn't an obvious spike: https://app.datadoghq.com/dash/760607/new-object-stats?live=true&page=0&is_auto=false&from_ts=1522789056558&to_ts=1522875456558&tile_size=m

Edited by Eric Eastwood
