Memory Group - 15.2 Planning
This page may contain information related to upcoming products, features and functionality. It is important to note that the information presented is for informational purposes only, so please do not rely on the information for purchasing or planning purposes. Just like with all projects, the items mentioned on the page are subject to change or delay, and the development, release, and timing of any products, features, or functionality remain at the sole discretion of GitLab Inc.
Capacity
No noteworthy PTO is expected that could affect capacity for %15.2. @nmilojevic1 will be back from leave, so we will give him time to catch up before he starts working on our top priorities.
Planning
Top Priorities for %15.2
Investigate Puma long-term memory use
Who: @mkaeppler, @alipniagov
Our investigation into the Puma runaway memory issues has produced multiple findings and has led us to three separate paths that we want to keep working on during %15.2:
1. Decide how to handle resource allocation in various environments and what to do with the PumaWorkerKiller (and other in-application memory killers in general)
   There are two parts to this discussion: whether we want to disable in-application memory killers in resource-managed environments (gitlab-org/gitlab#364184 (closed), gitlab-org/gitlab#364185 (closed)), and how to dynamically set resource limits without hardcoding them in non-managed environments (gitlab-org/gitlab#334831 (closed)).
2. Find the origin of Puma's growing memory use when pods are not restarted
   We believe that the primary driver is heap fragmentation, but we are not certain that it is the only one. We want to verify this assumption and find ways to address it.
3. Add ways to gather more data from production servers and diagnose similar issues
   - Add a Ruby heap fragmentation metric (gitlab-org/gitlab#365252 (closed)). This will also help with the investigation in (2), since it is important to understand the degree of heap fragmentation.
   - Improve memory team self-sufficiency (gitlab-com/gl-infra/reliability#15838)
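As a rough illustration of what a heap fragmentation metric captures, one common estimate for CRuby is the share of slots on eden heap pages that do not hold a live object. This sketch assumes CRuby's `GC.stat` keys `heap_live_slots` and `heap_eden_pages` and the per-page slot limit `GC::INTERNAL_CONSTANTS[:HEAP_PAGE_OBJ_LIMIT]`; it is an approximation and not necessarily the exact formula adopted in gitlab-org/gitlab#365252:

```latex
F \approx 1 - \frac{\mathit{heap\_live\_slots}}{\mathit{HEAP\_PAGE\_OBJ\_LIMIT} \times \mathit{heap\_eden\_pages}}
```

For example, with hypothetical values of 100,000 live slots spread over 1,000 eden pages of 400 slots each, F ≈ 1 - 100000 / 400000 = 0.75. Three quarters of the slots on those pages are empty, yet a page cannot be released while any live object remains on it, which is why a high F points to memory that is held but not used.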
Support the effort for FIPS compliance
Who: @mkaeppler
There is only one related issue that we are helping with:
- TLS security for dedicated metrics servers (gitlab-org/gitlab#364771 (closed))
It is almost complete, but we are keeping it in our plan to make sure it is finished early in the next milestone and that we prioritize any follow-up issues that may block ~"group::distribution" from completing the effort.
Create custom SLIs for Global Search
Who: @rzwambag
We are continuing our work to support ~"group::global search" in setting up custom SLIs for the SearchController and the Search API.
Top priorities for %15.2:
- Add the Prometheus metrics required for the global_search_success and global_search_apdex SLIs (gitlab-org/gitlab#342068 (closed), gitlab-org/gitlab#342069 (closed))
- Figure out the current levels so we can decide on the SLOs (gitlab-org/gitlab#342071 (closed), gitlab-org/gitlab#342072 (closed))
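To make the two SLI types concrete, here is a minimal Python sketch. It assumes that a success SLI is the ratio of non-erroring requests to all requests and that an apdex SLI is the ratio of requests finishing within a target duration to all requests. The `SearchRequest` type, the sample data, and the 0.5 s threshold are hypothetical and are not the actual Global Search definitions or thresholds:

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class SearchRequest:
    """One observed search request (hypothetical shape for illustration)."""
    duration_s: float
    errored: bool


def success_ratio(requests: Sequence[SearchRequest]) -> float:
    """Share of requests that did not error; 1.0 when nothing was observed."""
    if not requests:
        return 1.0
    ok = sum(1 for r in requests if not r.errored)
    return ok / len(requests)


def apdex_ratio(requests: Sequence[SearchRequest], threshold_s: float) -> float:
    """Share of requests finishing within threshold_s (errors counted by duration only).

    Returns 1.0 when nothing was observed.
    """
    if not requests:
        return 1.0
    fast = sum(1 for r in requests if r.duration_s <= threshold_s)
    return fast / len(requests)


if __name__ == "__main__":
    sample = [
        SearchRequest(0.2, False),
        SearchRequest(0.4, False),
        SearchRequest(0.9, False),
        SearchRequest(0.3, True),
        SearchRequest(1.2, False),
    ]
    print(f"success: {success_ratio(sample):.2f}")     # 4 of 5 did not error
    print(f"apdex:   {apdex_ratio(sample, 0.5):.2f}")  # 3 of 5 within 0.5 s
```

Once the current levels of such ratios are known from production data, an SLO can be expressed as a target value for each ratio.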
Optimize workers that consume a lot of memory and cause OOM kills
Who: @nmilojevic1
We plan to follow up on our work to optimize the top offending workers that consume a lot of memory and cause OOM kills.
New reports and requests have come in via gitlab-com/gl-infra/capacity-planning#3 and gitlab-org/gitlab#355030 (closed). We want to start with those and then evaluate the next candidates to work on.
Deprioritized initiatives for %15.2
These are long-term priority initiatives that we keep track of but do not plan to actively work on during %15.2.
Update supported Ruby version to 3.0
Who: @alipniagov
Summary of current status: gitlab-org&5149 (comment 952318583)
We have completed the parts of the update that depended on ~"group::memory" and we now want to make sure that other groups involved in the process are supported and can successfully complete their parts.
Next steps:
- (~"group::distribution") Omnibus and Helm chart updates. This is the packaging side of the Ruby 3 update: to deploy to the various environments, Omnibus and the Helm charts (including CNG images) must support running GitLab on Ruby 3.
- Ruby 3: Running in production. ~"group::delivery" is aware, there are no objections, and the effort required for the final rollout has been estimated.
Improve efficiency and maintainability of application metrics exporters
Who: @mkaeppler
We are revisiting our approach to serving application metrics to Prometheus by adding a new exporter for GitLab application metrics, written in Go.
The next step is to continue our work toward production readiness of gitlab-metrics-exporter:
- Phase 1: run in sidecar mode
  - Our primary goal with this phase is to replace the existing in-app Ruby Prometheus exporters that are distributed with and run in gitlab-rails.
Note: Due to our capacity constraints, this has been moved to a secondary priority and will follow the completion of the investigation into Puma's long-term memory use.