Increase Sidekiq BRPOP timeout from 2 to 5 seconds

Production Change

Change Summary

Set SIDEKIQ_SEMI_RELIABLE_FETCH_TIMEOUT to 5 on Sidekiq nodes, raising the BRPOP timeout from 2 to 5 seconds.

This is intended to relieve some of the pressure caused by CPU saturation on redis-sidekiq in #4049 (closed) by reducing the overhead of setting up and tearing down connections.
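
A rough sketch of the expected effect, assuming each idle Sidekiq thread re-issues BRPOP as soon as the previous call times out. The idle thread count is hypothetical, and the helper names are illustrative only; they are not taken from Sidekiq or the fetcher code.

```python
import os

ENV_VAR = "SIDEKIQ_SEMI_RELIABLE_FETCH_TIMEOUT"
PREVIOUS_TIMEOUT_SECONDS = 2  # value before this change, per the issue title


def fetch_timeout(environ=os.environ):
    """Return the BRPOP timeout in seconds, falling back to the previous value."""
    raw = environ.get(ENV_VAR)
    return int(raw) if raw else PREVIOUS_TIMEOUT_SECONDS


def idle_brpop_rate(idle_threads, timeout_seconds):
    """Approximate BRPOP calls per second issued by threads that find no work.

    Assumption: an idle thread re-issues BRPOP immediately after each timeout.
    """
    return idle_threads / timeout_seconds


if __name__ == "__main__":
    idle_threads = 1000  # hypothetical number of idle worker threads
    before = idle_brpop_rate(idle_threads, PREVIOUS_TIMEOUT_SECONDS)
    after = idle_brpop_rate(idle_threads, fetch_timeout({ENV_VAR: "5"}))
    print(f"before: {before:.0f}/s, after: {after:.0f}/s")
```

Under that assumption, moving from 2 to 5 seconds cuts the idle BRPOP rate by 60% regardless of the thread count. The actual CPU saving on redis-sidekiq depends on how many threads are idle at any moment.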

Change Details

  1. Services Impacted - ServiceSidekiq
  2. Change Technician - @igorwwwwwwwwwwwwwwwwwwww @reprazent
  3. Change Criticality - C3
  4. Change Type - changeunscheduled, changescheduled
  5. Change Reviewer - @reprazent
  6. Due Date - Depends on gitlab-org/gitlab!57351 (merged) getting to production
  7. Time tracking - Time, in minutes, needed to execute all change steps, including rollback
  8. Downtime Component - none

Detailed steps for the change

Pre-Change Steps - steps to be completed before execution of the change

Estimated Time to Complete (mins) - Estimated Time to Complete in Minutes

Change Steps - steps to take to execute the change

Estimated Time to Complete (mins) - Estimated Time to Complete in Minutes

Post-Change Steps - steps to take to verify the change

Estimated Time to Complete (mins) - Estimated Time to Complete in Minutes

Rollback

Rollback steps - steps to be taken in the event of a need to rollback this change

Estimated Time to Complete (mins) - Estimated Time to Complete in Minutes

  • Revert the aforementioned MRs
  • Rollback Step 2
  • Rollback Step 3

Monitoring

Key metrics to observe

Summary of infrastructure changes

  • Does this change introduce new compute instances? No
  • Does this change re-size any existing compute instances? No
  • Does this change introduce any additional usage of tooling like Elastic Search, CDNs, Cloudflare, etc? No

Summary of the above

Changes checklist

  • This issue has a criticality label (e.g. C1, C2, C3, C4) and a change-type label (e.g. changeunscheduled, changescheduled) based on the Change Management Criticalities.
  • This issue has the change technician as the assignee.
  • Pre-Change, Change, Post-Change, and Rollback steps have been filled out and reviewed.
  • Necessary approvals have been completed based on the Change Management Workflow.
  • Change has been tested in staging and results noted in a comment on this issue.
  • A dry-run has been conducted and results noted in a comment on this issue.
  • SRE on-call has been informed prior to change being rolled out. (In #production channel, mention @sre-oncall and this issue and await their acknowledgement.)
  • There are currently no active incidents.
Edited by Matt Smiley