Make pipeline schedule worker resilient
What does this MR do?
Make pipeline schedule worker resilient
Currently, the pipeline schedule worker is unstable because it is sometimes killed for excessive memory consumption. A more detailed report is available in https://gitlab.com/gitlab-org/gitlab-ce/issues/61955 and gitlab-com/gl-infra/production#805 (closed).
To make its performance consistent, this MR makes the following changes:
- next_run_at is always a real value, based on the PipelineScheduleWorker's cron schedule (currently 19 * * * * by default on gitlab.com). This prevents duplicate pipelines from being created when Sidekiq retries a job that was killed by the Memory Killer.
- Remove the exclusive lock. Its purpose is already covered by the real_next_run change, and the lock is actually backfiring, as described in https://gitlab.com/gitlab-org/gitlab-ce/issues/61955.
- Use RunPipelineScheduleWorker to spread memory consumption across multiple Sidekiq jobs.
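The first and third points can be illustrated with a minimal sketch in plain Ruby. It is not the code in this MR: Schedule, worker_run_at_or_after, dispatch_due, and the ideal_next callable are hypothetical names, the worker cron is assumed to be the hourly 19 * * * * default, and the queue array only stands in for enqueuing one RunPipelineScheduleWorker job per schedule.

```ruby
# Sketch only: plain Ruby stand-ins, not GitLab's actual classes.
WORKER_MINUTE = 19 # assumes the worker cron is "19 * * * *"

Schedule = Struct.new(:id, :next_run_at)

# First worker run at or after `time` (UTC), for an hourly worker cron.
def worker_run_at_or_after(time, minute: WORKER_MINUTE)
  candidate = Time.utc(time.year, time.month, time.day, time.hour, minute)
  candidate < time ? candidate + 3600 : candidate
end

# Hypothetical dispatcher: enqueue one lightweight job per due schedule.
# `ideal_next` is a callable returning the schedule's own next cron time.
def dispatch_due(schedules, now, queue, ideal_next)
  schedules.select { |s| s.next_run_at <= now }.each do |schedule|
    # Advance next_run_at to a real worker run time before enqueuing,
    # so a retried run no longer sees this schedule as due.
    ideal = ideal_next.call(schedule, now)
    schedule.next_run_at = worker_run_at_or_after(ideal)
    queue << schedule.id
  end
end

now = Time.utc(2019, 6, 1, 10, 19)
# Assumed schedule cron for the example: daily at midnight UTC.
daily = ->(_schedule, from) { Time.utc(from.year, from.month, from.day) + 86_400 }
schedules = [
  Schedule.new(1, Time.utc(2019, 6, 1, 10, 19)), # due now
  Schedule.new(2, Time.utc(2019, 6, 1, 11, 19))  # due at the next worker run
]
queue = []

dispatch_due(schedules, now, queue, daily)
dispatch_due(schedules, now, queue, daily) # simulated Sidekiq retry
```

In this sketch the simulated retry enqueues nothing, because schedule 1 already points at the next real worker run, which is the property that makes the exclusive lock unnecessary.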
Closes https://gitlab.com/gitlab-org/gitlab-ce/issues/61955 and gitlab-com/gl-infra/production#805 (closed)
Does this MR meet the acceptance criteria?
Conformity
- Changelog entry
- [-] Documentation created/updated or follow-up review issue created
- Code review guidelines
- Merge request performance guidelines
- Style guides
- Database guides
- Separation of EE specific content
Performance and testing
- Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process.
- Tested in all supported browsers
Security
If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:
- [-] Label as security and @ mention @gitlab-com/gl-security/appsec
- [-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
- [-] Security reports checked/validated by a reviewer from the AppSec team
Edited by Shinya Maeda