sidekiq catchall batch 6 eval/migration

As discovered in &276 (closed), there are three queues where we are not 100% sure where the blocker may lie. Use this issue to migrate these three queues off of the catchall VMs and onto the catchnfs VMs so we can evaluate whether they use NFS in a way that would prevent us from migrating them into Kubernetes.

Queues in question:

Milestones

  • Queues copied to the catchnfs VMs
  • Evaluation completed determining where NFS reads/writes are being requested
  • If safe, remove the NFS mounts from the catchnfs VMs
    • If not safe, gather as much detail as possible to help the engineers resolve the blocker
    • Remove the queue from the evaluation
  • If feasible, migrate these queues into Kubernetes
    • Once this is done, the associated issues tied to &276 (closed) can be closed

Evaluation

  • Confirm the jobs are actually being invoked: https://dashboards.gitlab.net/d/sidekiq-queue-detail/sidekiq-queue-detail?orgId=1
  • Use the script created in #1045 (closed) to monitor for NFS usage, and compare the script's output with the data in Kibana
    • This gives us the following PromQL query to evaluate whether or not a queue can be removed from the batch:
      sum by (queue, env) (rate(sidekiq_nfs_monitor_nfs_access_detected[5m]))
      /
      sum by (queue, env) (rate(sidekiq_nfs_monitor_jobs_started[5m]))
  • Use PromQL to detect NFS usage on the catchnfs servers:
    • gstg - PromQL: sum(rate(node_mountstats_nfs_operations_requests_total{env="gstg", type="sidekiq", instance=~"sidekiq-catchnfs-.*"}[1m])) by (operation) > 0
    • gprd - PromQL: sum(rate(node_mountstats_nfs_operations_requests_total{env="gprd", type="sidekiq", instance=~"sidekiq-catchnfs-.*"}[1m])) by (operation) > 0
  • Monitor Sentry for errors after NFS mounts are removed: https://sentry.gitlab.net/gitlab/gitlabcom/?query=is%3Aunresolved+shard%3Acatchnfs
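The ratio query from the evaluation step can also be checked programmatically against the Prometheus HTTP API. The sketch below is a minimal example, not part of the runbook: the `PROM_URL` value is a placeholder (not the real internal endpoint), and the response parsing assumes the standard Prometheus `/api/v1/query` instant-query JSON shape. A queue whose ratio stays at 0 over the observation window is a candidate for removal from the batch.

```python
import urllib.parse

# PromQL from the evaluation step: fraction of started jobs that touched NFS,
# grouped by (queue, env). A sustained value of 0 suggests the queue is NFS-free.
NFS_RATIO_QUERY = (
    "sum by (queue, env) (rate(sidekiq_nfs_monitor_nfs_access_detected[5m]))"
    " / "
    "sum by (queue, env) (rate(sidekiq_nfs_monitor_jobs_started[5m]))"
)

# Placeholder -- substitute the real Prometheus base URL for gstg/gprd.
PROM_URL = "http://prometheus.example.internal:9090"


def instant_query_url(base_url: str, promql: str) -> str:
    """Build a Prometheus instant-query URL for the given PromQL expression."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def queues_touching_nfs(api_response: dict, threshold: float = 0.0) -> list:
    """Return (queue, env) pairs whose NFS-access ratio exceeds `threshold`,
    given a decoded Prometheus /api/v1/query JSON response."""
    hits = []
    for sample in api_response.get("data", {}).get("result", []):
        labels = sample["metric"]
        _, value = sample["value"]  # instant vector sample: [timestamp, "value"]
        if float(value) > threshold:
            hits.append((labels.get("queue"), labels.get("env")))
    return hits


# Usage (network call elided):
#   fetch instant_query_url(PROM_URL, NFS_RATIO_QUERY), decode the JSON body,
#   then feed it to queues_touching_nfs() to list queues still hitting NFS.
```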
Edited by John Skarbek