Skip to content
Snippets Groups Projects

Fix memory leak in CI config includes entry

Merged Stan Hu requested to merge gb-fix-slow-ci-config-15-10 into 15-10-stable-ee
1 unresolved thread

What does this MR do and why?

This backports !122462 (merged) to 15-10-stable-ee.

In GitLab 11.9, gitlab-foss!24098 (merged) introduced validation of include keywords. However, it quietly introduced a memory leak: any time a new include entry was instantiated, a proc would be added to the aspects class variable. As the list grew, the time taken to process other include keywords would grow.

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

  • This MR is backporting a bug fix, documentation update, or spec fix, previously merged in the default branch.
  • The original MR has been deployed to GitLab.com (not applicable for documentation or spec changes).
  • This MR has a severity label assigned (if applicable).
  • This MR has been approved by a maintainer (only one approval is required).
  • Ensure the e2e:package-and-test-ee job has either succeeded or been approved by a Software Engineer in Test.

Note to the merge request author and maintainer

If you have questions about the patch release process, please:

Edited by Stan Hu

Merge request reports

Loading
Loading

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
  • assigned to @stanhu

  • Stan Hu changed the description

    changed the description

  • A deleted user added backend label

    added backend label

  • Contributor
    4 Warnings
    :warning: This merge request changed CI config files but did not update the schema. Please consider updating the schema to reflect these changes:
    • spec/lib/gitlab/ci/config/entry/includes_spec.rb
    • lib/gitlab/ci/config/entry/includes.rb.

    Refer to the docs for help on how to run and write specs for the CI schema.

    :warning: This merge request does not refer to an existing milestone.
    :warning: Most of the time, merge requests should target master. Otherwise, please set the relevant Pick into X.Y label.
    :warning: The e2e:package-and-test-ee job needs to succeed or have approval from a Software Engineer in Test.
    Read the "QA e2e:package-and-test-ee" section for more details.

    Reviewer roulette

    Changes that require review have been detected!

    Please refer to the table below for assigning reviewers and maintainers suggested by Danger in the specified category:

    Category Reviewer Maintainer
    backend Ian Baum current availability (@ibaum) (UTC-5, 2 hours ahead of @stanhu) Etienne Baqué current availability (@ebaque) (UTC-3, 4 hours ahead of @stanhu)

    To spread load more evenly across eligible reviewers, Danger has picked a candidate for each review slot, based on their timezone. Feel free to override these selections if you think someone else would be better-suited or use the GitLab Review Workload Dashboard to find other available reviewers.

    To read more on how to use the reviewer roulette, please take a look at the Engineering workflow and code review guidelines. Please consider assigning a reviewer or maintainer who is a domain expert in the area of the merge request.

    Once you've decided who will review this merge request, assign them as a reviewer! Danger does not automatically notify them for you.

    QA e2e:package-and-test-ee

    @stanhu, the package-and-test job must complete before merging this merge request.*

    If there are failures on the package-and-test pipeline, ping your team's associated Software Engineer in Test (SET) to confirm the failures are unrelated to the merge request. If there's no SET assigned, ask for assistance on the #quality Slack channel.

    If needed, you can retry the :repeat: danger-review job that generated this comment.

    Generated by :no_entry_sign: Danger

  • Contributor

    Allure report

    allure-report-publisher generated test report!

    e2e-package-and-test: :exclamation: test report for e3117337

    expand test summary
    +-----------------------------------------------------------------------+
    |                            suites summary                             |
    +------------------+--------+--------+---------+-------+-------+--------+
    |                  | passed | failed | skipped | flaky | total | result |
    +------------------+--------+--------+---------+-------+-------+--------+
    | Verify           | 264    | 0      | 45      | 5     | 309   | ❗     |
    | Govern           | 4      | 0      | 0       | 0     | 4     | ✅     |
    | Create           | 16     | 0      | 2       | 0     | 18    | ✅     |
    | Framework sanity | 0      | 0      | 2       | 0     | 2     | ➖     |
    | Monitor          | 8      | 0      | 0       | 0     | 8     | ✅     |
    | Plan             | 8      | 0      | 0       | 0     | 8     | ✅     |
    | Manage           | 6      | 0      | 0       | 0     | 6     | ✅     |
    +------------------+--------+--------+---------+-------+-------+--------+
    | Total            | 306    | 0      | 49      | 5     | 355   | ❗     |
    +------------------+--------+--------+---------+-------+-------+--------+
  • Grzegorz Bizon approved this merge request

    approved this merge request

    • Resolved by Jenny Kim

      :wave: @grzesiek, thanks for approving this merge request.

      This is the first time the merge request is approved. Please ensure the e2e:package-and-test-ee job has succeeded. If there is a failure, a Software Engineer in Test (SET) needs to confirm the failures are unrelated to the merge request. If there's no SET assigned to this team, ask for assistance on the #quality Slack channel.

  • Stan Hu marked the checklist item The original MR has been deployed to GitLab.com (not applicable for documentation or spec changes). as completed

    marked the checklist item The original MR has been deployed to GitLab.com (not applicable for documentation or spec changes). as completed

  • Stan Hu marked the checklist item This MR has a [severity label] assigned (if applicable). as completed

    marked the checklist item This MR has a [severity label] assigned (if applicable). as completed

  • Stan Hu marked the checklist item This MR has been approved by a maintainer (only one approval is required). as completed

    marked the checklist item This MR has been approved by a maintainer (only one approval is required). as completed

  • Stan Hu requested review from @furkanayhan

    requested review from @furkanayhan

  • Stan Hu requested review from @sabrams and removed review request for @furkanayhan

    requested review from @sabrams and removed review request for @furkanayhan

  • Stan Hu marked the checklist item Ensure the e2e:package-and-test-ee job has either succeeded or been approved by a Software Engineer in Test. as completed

    marked the checklist item Ensure the e2e:package-and-test-ee job has either succeeded or been approved by a Software Engineer in Test. as completed

  • Furkan Ayhan added bugperformance label and removed bugfunctional label

    added bugperformance label and removed bugfunctional label

  • Stan Hu requested review from @jennykim-gitlab and removed review request for @sabrams

    requested review from @jennykim-gitlab and removed review request for @sabrams

  • Jenny Kim resolved all threads

    resolved all threads

  • 🤖 GitLab Bot 🤖 added sectionci label and removed sectionops label

    added sectionci label and removed sectionops label

  • Jenny Kim mentioned in commit fddfe1c1

    mentioned in commit fddfe1c1

  • merged

  • Jenny Kim resolved all threads

    resolved all threads

    • @stanhu is this fix related to a backport request? The last 15.10 release (15.10.8) has been released so we're not planning to publish any more changes

    • @amyphillips as far as I know this fix has been released in 15.11.8. Maybe that is sufficient, but I think that we planned to release it in 15.10.8 too :thinking: That being said, a customer who had been impacted the most already upgraded to 15.11.8.

      Edited by Grzegorz Bizon
    • I think that we planned to release it in 15.10.8 too

      @grzesiek do you know where this was decided?

    • Edited by Grzegorz Bizon
    • FWIW, 15.10 isn't listed in the backport request which covered 15.11 and 16.0 (the second pair of MRs in the table - 122539, 122530)

      Edited by Ben Prescott_
    • @amyphillips @grzesiek - no I don't think so. I am not aware of a customer who needs this on 15.10.

    • @jreporter - more broadly, what do you think about backporting this? The fix works and can solve a significant performance problem on GitLab when used with a high number of includes.

      Should we backport this further than what is needed for the escalation, to try to catch more folks?

    • Should we backport this further than what is needed for the escalation, to try to catch more folks?

      You tell me @joshlambert - how far should we go?

    • @joshlambert @jreporter - if no customers are making noise about this and any customer that wants to move to 16.x has to stop at 15.11.8 anyway, does it make sense to create a backport to an unsupported version?

      Even if we were to try and include it in the next security release (after 16.1), the security release only covers N-2 (15.11 >=) so we'd have to do extra, unplanned release work for this. If there's significant customer disruption then this may make sense, but if nobody needs it, it is more efficient for us not to do it.

    • @swiskow - I think the issue is the following:

      • This is a really bad bug, however how bad it impacts performance/stability depends on the # of includes and rate of CI jobs.
      • All customers we know are impacted have upgraded to 15.11.8.
      • There is an unknown # of customers who are impacted but this hasn't risen to the level of crippling their instance yet. Or they've been doing with it for awhile and have just accepted that this is how it is.

      So it's going to be a call based on how much incremental benefit we think we will get out of a 15.10 patch for that unknown cohort.

      @grzesiek - do you know how long this bug has been around for? If it's been around for a long time, which I think it has, the incremental benefit of a 15.10 patch may not be that significant.

      If this is a more recent bug, it may be more important to get out as we may have folks who haven't hit the affected versions yet, or gotten to a point where they realize they have a problem.

    • The core code change picks cleanly to 15.10 - I created !122631 (closed) so I could follow our hotfix process and try out this patch in my lab (on 15.10.6)

      So, for customers running packaged GitLab (Omnibus) who really can't upgrade to 15.11, this process is an option. This option isn't available for Docker and Helm.

      (Edit: my guess is that it's likely to pick cleanly into a lot of previous releases)

      Edited by Ben Prescott_
    • In GitLab 11.9, gitlab-foss!24098 (merged) introduced validation of include keywords. However, it quietly introduced a memory leak: any time a new include entry was instantiated, a proc would be added to the aspects class variable. As the list grew, the time taken to process other include keywords would grow.

      This sounds like it has been around a very long time @joshlambert

      Do we know how many customers are on prior versions and if it makes sense to add backports based on those numbers?

    • Given that - I'm not sure it makes sense to keep backporting. We should just ensure we make Support broadly aware that this is a thing, and its fixed in 15.11.8+.

    • Author Owner

      @amyphillips Yes, it's not directly related to a backport release for 15.10. In the interest of time, I created 3 backports (15.10, 15.11, and 16.0) before the backport release task issue was even created. This is such a severe issue and straightforward fix that I thought it would be useful to have this in the 15.10.x in the off chance another release goes out for 15.10.

      Indeed, this has been around since GitLab 11.9, as I documented in the merge request description. It flew under the radar for a while because we upgrade quite frequently on GitLab.com and have many more processes, which lowers the chance that this issue affects us. But it's highly likely this bug was causing slowness for many customers.

      Edited by Stan Hu
    • Given that - I'm not sure it makes sense to keep backporting. We should just ensure we make Support broadly aware that this is a thing, and its fixed in 15.11.8+.

      This is my instinct as well, thanks @joshlambert and @stanhu :bow:

    • Please register or sign in to reply
Please register or sign in to reply
Loading