Re-execute PipelineProcessWorker
<!-- Please review https://handbook.gitlab.com/handbook/engineering/infrastructure-platforms/change-management/ for the most recent information on our change plans and execution policies. -->

# Production Change

## Change Summary

A customer's `PipelineProcessWorker` job was not rescheduled by `ConcurrencyLimit::ResumeWorker` because of an outage that occurred just before the pipeline was executed.

## Change Details

`PipelineProcessWorker` must be re-executed to fix the issue for the customer. The worker is idempotent, so re-running it is safe. It was simply never rescheduled by the concurrency limit management code.

`PipelineProcessWorker.new.perform(2130380613)`

The related support ticket is [here](https://gitlab.zendesk.com/agent/tickets/668586).
The request for help (RFH) with change details is [here](https://gitlab.com/gitlab-com/request-for-help/-/issues/3732#note_2884092075).

<!--
To automatically add your change to the GitLab Production calendar update the following fields:
- Time tracking
- Scheduled Date and Time (UTC in format YYYY-MM-DD HH:MM)
Bot: https://gitlab.com/gitlab-com/gl-infra/ops-team/toolkit/change-scheduler
-->

1. **Services Impacted** - {Pipeline execution}
1. **Change Technician** - <!-- woodhouse: '`@{{ .Username }}`' -->{+ DRI for the execution of this change +}
1. **Change Reviewer** - @allison.browne
1. **Scheduled Date and Time (UTC in format YYYY-MM-DD HH:MM)** - 2025-11-21 14:00
1. **Time tracking** - <!-- woodhouse: '{{ .Duration}}' -->{+ Time, in minutes, hours, or days, needed to execute all change steps, including rollback +}
1. **Downtime Component** - <!-- woodhouse: '{{ .Downtime }}' -->{+ If there is a need for downtime, include downtime estimate here +}

> [!IMPORTANT]
> If your change involves scheduled maintenance, add a step to set and
> [unset maintenance mode](https://gitlab.com/gitlab-com/runbooks/-/blob/master/docs/monitoring/set_maintenance_window.md)
> per our runbooks. This will make sure SLA calculations adjust for the maintenance period.
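The Change Details above rest on `PipelineProcessWorker` being idempotent. The Ruby sketch below is a toy model, not GitLab code: `ToyPipeline`, `ToyPipelineProcessWorker`, and `ToyConcurrencyBuffer` are hypothetical stand-ins, and the buffer assumes the concurrency limit code defers jobs into a queue that `ConcurrencyLimit::ResumeWorker` later drains. It illustrates why a deferred job that is never resumed leaves the pipeline stuck, and why calling an idempotent `perform` by hand, even more than once, ends in the same state.

```ruby
# Toy model only (hypothetical classes, not GitLab's implementation).

class ToyPipeline
  attr_reader :id, :status

  def initialize(id)
    @id = id
    @status = "created"
  end

  # Advances the pipeline only if it has not been processed yet,
  # so repeated calls leave the same end state.
  def process!
    @status = "running" if @status == "created"
    @status
  end
end

class ToyPipelineProcessWorker
  def initialize(pipelines)
    @pipelines = pipelines
  end

  # Idempotent: performing twice with the same ID is harmless.
  def perform(pipeline_id)
    @pipelines.fetch(pipeline_id).process!
  end
end

# Hypothetical stand-in for the buffer of jobs deferred by a concurrency limit.
class ToyConcurrencyBuffer
  def initialize
    @deferred = []
  end

  def defer(pipeline_id)
    @deferred << pipeline_id
  end

  # Simulates deferred jobs that are never resumed (e.g. lost during an outage).
  def lose_all!
    @deferred.clear
  end

  # Stand-in for the resume step: runs every job still in the buffer.
  def resume(worker)
    ids = @deferred.dup
    @deferred.clear
    ids.each { |id| worker.perform(id) }
  end
end

pipelines = { 2130380613 => ToyPipeline.new(2130380613) }
worker = ToyPipelineProcessWorker.new(pipelines)
buffer = ToyConcurrencyBuffer.new

buffer.defer(2130380613)
buffer.lose_all!                   # the deferred job disappears
buffer.resume(worker)              # nothing left to resume
puts pipelines[2130380613].status  # => created (stuck)

worker.perform(2130380613)         # manual re-execution
worker.perform(2130380613)         # a second run changes nothing
puts pipelines[2130380613].status  # => running
```

In the real system the state check lives in GitLab's pipeline processing rather than a status string, so treat this only as an illustration of the safety argument.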
## Preparation

> [!NOTE]
> The following checklists must be done in advance, before setting the label ~"change::scheduled"

### Change Reviewer checklist

<!-- To be filled out by the reviewer. -->
~C4 ~C3 ~C2 ~C1:
- [x] Check if the following applies:
  - The **scheduled day and time** of execution of the change is appropriate.
  - The [change plan](#detailed-steps-for-the-change) is technically accurate.
  - The change plan includes **estimated timing values** based on previous testing.
  - The change plan includes a viable [rollback plan](#rollback).
  - The specified [metrics/monitoring dashboards](#key-metrics-to-observe) provide sufficient visibility for the change.

~C2 ~C1:
- [x] Check if the following applies:
  - The complexity of the plan is appropriate for the corresponding risk of the change (i.e. the plan contains clear details).
  - The change plan includes success measures for all steps/milestones during the execution.
  - The change adequately minimizes risk within the environment/service.
  - The performance implications of executing the change are well-understood and documented.
  - The specified metrics/monitoring dashboards provide sufficient visibility for the change.
    - If not, is it possible (or necessary) to make changes to observability platforms for added visibility?
  - The change has a primary and secondary SRE with knowledge of the details available during the change window.
  - The change window has been agreed with Release Managers in advance of the change. If the change is planned for APAC hours, this issue has an agreed pre-change approval.
  - The labels ~"blocks deployments" and/or ~"blocks feature-flags" are applied as necessary.

### Change Technician checklist

<!--
Search [the incident.io schedule](https://app.incident.io/gitlab/on-call/schedules/01K5YWAGZ7YCQGAG7ATQ9XQWHW) to find who will be on-call at the scheduled day and time. SREs on-call must be informed of weekend C1 changes at least 2 weeks in advance.

You can also use the `@sre-oncall` handle in Slack to find the current on-call team member.
-->

- [ ] The [Change Criticality](https://handbook.gitlab.com/handbook/engineering/infrastructure-platforms/change-management/#change-criticalities) has been set appropriately and requirements have been reviewed.
- [ ] The [change plan](#detailed-steps-for-the-change) is technically accurate.
- [ ] The [rollback plan](#rollback) is technically accurate and detailed enough to be executed by anyone with access.
- [ ] This Change Issue is linked to the appropriate Issue and/or Epic.
- [ ] Change has been tested in staging and results noted in a comment on this issue.
- [ ] A dry-run has been conducted and results noted in a comment on this issue.
- [ ] The change execution window respects the [Production Change Lock periods](https://about.gitlab.com/handbook/engineering/infrastructure/change-management/#production-change-lock-pcl).
- [ ] Once all boxes above are checked, mark the change request as scheduled: `/label ~"change::scheduled"`
- [ ] For ~C1 and ~C2 change issues, the change event is added to the [GitLab Production](https://calendar.google.com/calendar/embed?src=gitlab.com_si2ach70eb1j65cnu040m3alq0%40group.calendar.google.com) calendar by the [change-scheduler bot](https://gitlab.com/gitlab-com/gl-infra/ops-team/toolkit/change-scheduler). It is scheduled to run every 2 hours.
- [ ] For ~C1 change issues, a Senior Infrastructure Manager has provided approval with the ~manager_approved label on the issue.
- [ ] For ~C2 change issues, an Infrastructure Manager provided approval with the ~manager_approved label on the issue.
- [ ] For ~C1 and ~C2 changes, mention `@gitlab-org/saas-platforms/inframanagers` in this issue to provide visibility to all infrastructure managers.
- [ ] For ~C1, ~C2, or ~"blocks deployments" change issues, confirm with Release managers that the change does not overlap or hinder any release process. (In the `#production` channel, mention `@release-managers` and this issue and await their acknowledgment.)

## Detailed steps for the change

### Pre-execution steps

> [!NOTE]
> The following steps should be done right at the scheduled time of the change request. The [preparation steps](#preparation) are
> listed above.

- [ ] Make sure all tasks in [Change Technician checklist](#change-technician-checklist) are done.
- [ ] For ~C1 and ~C2 change issues, the SRE on-call has been informed prior to change being rolled out. (Check [the incident.io GitLab.com Production EOC schedule](https://app.incident.io/gitlab/on-call/schedules/01K5YWAGZ7YCQGAG7ATQ9XQWHW) to find who will be on-call at the scheduled day and time. SREs on-call must be informed of [plannable C1 changes](https://handbook.gitlab.com/handbook/engineering/infrastructure-platforms/change-management/#approval) at least 2 weeks in advance.)
- [ ] The SRE on-call provided approval with the ~eoc_approved label on the issue.
- [ ] For ~C1, ~C2, or ~"blocks deployments" change issues, Release managers have been informed prior to change being rolled out. (In the `#production` channel, mention `@release-managers` and this issue and await their acknowledgment.)
- [ ] There are currently no [active incidents](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/?sort=created_date&state=opened&label_name%5B%5D=Incident%3A%3AActive&or%5Blabel_name%5D%5B%5D=severity%3A%3A1&or%5Blabel_name%5D%5B%5D=severity%3A%3A2&first_page_size=20) that are ~severity::1 or ~severity::2.
- [ ] If the change involves doing maintenance on a database host, an appropriate silence targeting the host(s) should be added for the duration of the change.
### Change steps - steps to take to execute the change

*Estimated Time to Complete (mins)* - {+Estimated Time to Complete in Minutes+}

- [ ] Set label ~"change::in-progress" `/label ~change::in-progress`
- [ ] {+Change Step 1+}
- [ ] {+Change Step 2+}
- [ ] Set label ~"change::complete" `/label ~change::complete`

## Rollback

### Rollback steps - steps to be taken in the event of a need to roll back this change

*Estimated Time to Complete (mins)* - {+Estimated Time to Complete in Minutes+}

- [ ] {+Rollback Step 1+}
- [ ] {+Rollback Step 2+}
- [ ] Set label ~"change::aborted" `/label ~change::aborted`

## Monitoring

### Key metrics to observe

<!--
  * Describe which dashboards and which specific metrics we should be monitoring related to this change using the format below.
-->

- Metric: {+Metric Name+}
  - Location: {+Dashboard URL+}
  - What changes to this metric should prompt a rollback: {+Describe Changes+}