Fix memory leak in CI config includes entry
What does this MR do and why?
This backports !122462 (merged) to 15-10-stable-ee
.
In GitLab 11.9, gitlab-foss!24098 (merged)
introduced validation of include
keywords. However, it quietly
introduced a memory leak: any time a new include
entry was
instantiated, a proc would be added to the aspects
class variable.
As the list grew, the time taken to process other include
keywords
would grow.
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
This MR is backporting a bug fix, documentation update, or spec fix, previously merged in the default branch. -
The original MR has been deployed to GitLab.com (not applicable for documentation or spec changes). -
This MR has a severity label assigned (if applicable). -
This MR has been approved by a maintainer (only one approval is required). -
Ensure the e2e:package-and-test-ee
job has either succeeded or been approved by a Software Engineer in Test.
Note to the merge request author and maintainer
If you have questions about the patch release process, please:
- Refer to the patch release runbook for engineers and maintainers for guidance.
- Ask questions on the
#releases
Slack channel (internal only).
Merge request reports
Activity
assigned to @stanhu
added bugfunctional grouppipeline authoring typebug labels
added devopsverify sectionops labels
- A deleted user
added backend label
4 Warnings This merge request changed CI config files but did not update the schema. Please consider updating the schema to reflect these changes: spec/lib/gitlab/ci/config/entry/includes_spec.rb
-
lib/gitlab/ci/config/entry/includes.rb
.
Refer to the docs for help on how to run and write specs for the CI schema.
This merge request does not refer to an existing milestone. Most of the time, merge requests should target master
. Otherwise, please set the relevantPick into X.Y
label.The e2e:package-and-test-ee
job needs to succeed or have approval from a Software Engineer in Test.
Read the "QA e2e:package-and-test-ee" section for more details.Reviewer roulette
Changes that require review have been detected!
Please refer to the table below for assigning reviewers and maintainers suggested by Danger in the specified category:
Category Reviewer Maintainer backend Ian Baum (
@ibaum
) (UTC-5, 2 hours ahead of@stanhu
)Etienne Baqué (
@ebaque
) (UTC-3, 4 hours ahead of@stanhu
)To spread load more evenly across eligible reviewers, Danger has picked a candidate for each review slot, based on their timezone. Feel free to override these selections if you think someone else would be better-suited or use the GitLab Review Workload Dashboard to find other available reviewers.
To read more on how to use the reviewer roulette, please take a look at the Engineering workflow and code review guidelines. Please consider assigning a reviewer or maintainer who is a domain expert in the area of the merge request.
Once you've decided who will review this merge request, assign them as a reviewer! Danger does not automatically notify them for you.
QA
e2e:package-and-test-ee
@stanhu, the
package-and-test
job must complete before merging this merge request.*If there are failures on the
package-and-test
pipeline, ping your team's associated Software Engineer in Test (SET) to confirm the failures are unrelated to the merge request. If there's no SET assigned, ask for assistance on the#quality
Slack channel.If needed, you can retry the
danger-review
job that generated this comment.Generated by
DangerAllure report
allure-report-publisher
generated test report!e2e-package-and-test:
test report for e3117337expand test summary
+-----------------------------------------------------------------------+ | suites summary | +------------------+--------+--------+---------+-------+-------+--------+ | | passed | failed | skipped | flaky | total | result | +------------------+--------+--------+---------+-------+-------+--------+ | Verify | 264 | 0 | 45 | 5 | 309 | ❗ | | Govern | 4 | 0 | 0 | 0 | 4 | ✅ | | Create | 16 | 0 | 2 | 0 | 18 | ✅ | | Framework sanity | 0 | 0 | 2 | 0 | 2 | ➖ | | Monitor | 8 | 0 | 0 | 0 | 8 | ✅ | | Plan | 8 | 0 | 0 | 0 | 8 | ✅ | | Manage | 6 | 0 | 0 | 0 | 6 | ✅ | +------------------+--------+--------+---------+-------+-------+--------+ | Total | 306 | 0 | 49 | 5 | 355 | ❗ | +------------------+--------+--------+---------+-------+-------+--------+
- Resolved by Jenny Kim
@grzesiek
, thanks for approving this merge request.This is the first time the merge request is approved. Please ensure the
e2e:package-and-test-ee
job has succeeded. If there is a failure, a Software Engineer in Test (SET) needs to confirm the failures are unrelated to the merge request. If there's no SET assigned to this team, ask for assistance on the#quality
Slack channel.
Setting label(s) Category:Pipeline Composition based on grouppipeline authoring.
added Category:Pipeline Composition label
requested review from @furkanayhan
requested review from @sabrams and removed review request for @furkanayhan
added bugperformance label and removed bugfunctional label
requested review from @jennykim-gitlab and removed review request for @sabrams
- Resolved by Jenny Kim
@jennykim-gitlab Would you be able to merge this?
added sectionci label and removed sectionops label
mentioned in commit fddfe1c1
@stanhu is this fix related to a backport request? The last 15.10 release (15.10.8) has been released so we're not planning to publish any more changes
@amyphillips as far as I know this fix has been released in
15.11.8
. Maybe that is sufficient, but I think that we planned to release it in15.10.8
too That being said, a customer who had been impacted the most already upgraded to 15.11.8.Edited by Grzegorz BizonI think that we planned to release it in
15.10.8
too@grzesiek do you know where this was decided?
@joshlambert might know that. I will ask on Slack. Edit: https://gitlab.slack.com/archives/C04QGA6NAAC/p1686826171013119.
Edited by Grzegorz BizonFWIW, 15.10 isn't listed in the backport request which covered 15.11 and 16.0 (the second pair of MRs in the table - 122539, 122530)
Edited by Ben Prescott_Thanks @bprescott_
@amyphillips @grzesiek - no I don't think so. I am not aware of a customer who needs this on 15.10.
@jreporter - more broadly, what do you think about backporting this? The fix works and can solve a significant performance problem on GitLab when used with a high number of includes.
Should we backport this further than what is needed for the escalation, to try to catch more folks?
Should we backport this further than what is needed for the escalation, to try to catch more folks?
You tell me @joshlambert - how far should we go?
@joshlambert @jreporter - if no customers are making noise about this and any customer that wants to move to 16.x has to stop at 15.11.8 anyway, does it make sense to create a backport to an unsupported version?
Even if we were to try and include it in the next security release (after 16.1), the security release only covers N-2 (15.11 >=) so we'd have to do extra, unplanned release work for this. If there's significant customer disruption then this may make sense, but if nobody needs it, it is more efficient for us not to do it.
@swiskow - I think the issue is the following:
- This is a really bad bug, however how bad it impacts performance/stability depends on the # of includes and rate of CI jobs.
- All customers we know are impacted have upgraded to 15.11.8.
- There is an unknown # of customers who are impacted but this hasn't risen to the level of crippling their instance yet. Or they've been doing with it for awhile and have just accepted that this is how it is.
So it's going to be a call based on how much incremental benefit we think we will get out of a 15.10 patch for that unknown cohort.
@grzesiek - do you know how long this bug has been around for? If it's been around for a long time, which I think it has, the incremental benefit of a 15.10 patch may not be that significant.
If this is a more recent bug, it may be more important to get out as we may have folks who haven't hit the affected versions yet, or gotten to a point where they realize they have a problem.
The core code change picks cleanly to 15.10 - I created !122631 (closed) so I could follow our hotfix process and try out this patch in my lab (on 15.10.6)
So, for customers running packaged GitLab (Omnibus) who really can't upgrade to 15.11, this process is an option. This option isn't available for Docker and Helm.
(Edit: my guess is that it's likely to pick cleanly into a lot of previous releases)
Edited by Ben Prescott_In GitLab 11.9, gitlab-foss!24098 (merged) introduced validation of
include
keywords. However, it quietly introduced a memory leak: any time a newinclude
entry was instantiated, a proc would be added to theaspects
class variable. As the list grew, the time taken to process otherinclude
keywords would grow.This sounds like it has been around a very long time @joshlambert
Do we know how many customers are on prior versions and if it makes sense to add backports based on those numbers?
@amyphillips Yes, it's not directly related to a backport release for 15.10. In the interest of time, I created 3 backports (15.10, 15.11, and 16.0) before the backport release task issue was even created. This is such a severe issue and straightforward fix that I thought it would be useful to have this in the 15.10.x in the off chance another release goes out for 15.10.
Indeed, this has been around since GitLab 11.9, as I documented in the merge request description. It flew under the radar for a while because we upgrade quite frequently on GitLab.com and have many more processes, which lowers the chance that this issue affects us. But it's highly likely this bug was causing slowness for many customers.
Edited by Stan HuGiven that - I'm not sure it makes sense to keep backporting. We should just ensure we make Support broadly aware that this is a thing, and its fixed in 15.11.8+.
This is my instinct as well, thanks @joshlambert and @stanhu