Memory reduction goals
This issue discusses approaches to reducing GitLab's memory consumption.
The problem
Today our recommended memory requirement to run GitLab, even for the smallest instance, is 8GB (4GB of RAM + 4GB of swap). This is an excessive amount for small installations serving fewer than 10 users.
In the past it was possible to run GitLab CE on a Raspberry Pi; today this no longer provides a sufficiently good experience due to the very limited CPU and memory available on that platform.
Our competitors require:
- 16GB at least (GitHub Enterprise),
- 3GB+ (BitBucket Server),
- 1GB (Gitea),
- 3GB (Atlassian Stash).
GitLab consists of multiple components:
- the Rails backend/frontend, running in Unicorn, and Sidekiq (background processing),
- Go-based GitLab Pages,
- Go-based GitLab Workhorse,
- Go-based Gitaly,
- PostgreSQL,
- Redis,
- Git itself, which can balloon to gigabytes of memory when cloning a repository.
All of these components, as shipped with GitLab Omnibus, add up to the overall memory requirements. We are slowly moving very CPU- and memory-intensive parts out of the Rails codebase into separate specialized Go-based components (Workhorse, Gitaly). However, all complex and heavy-duty data processing still happens in the Rails backend; merge request diff processing is one example.
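To illustrate why this kind of data processing is memory-heavy in Ruby, here is a minimal, self-contained sketch (not GitLab code; the file contents and sizes are made up) contrasting loading an entire diff into memory with streaming it line by line:

```ruby
require 'tempfile'

# Fake a large diff file for illustration.
diff = Tempfile.new('diff')
10_000.times { |i| diff.puts("+ line #{i}") }
diff.rewind

# Whole-file approach: one big string plus an array of line
# strings are resident in memory at the same time.
all_lines = diff.read.lines
added = all_lines.count { |l| l.start_with?('+') }

diff.rewind

# Streaming approach: only one line is resident at a time,
# so peak memory stays roughly constant regardless of diff size.
added_streaming = 0
diff.each_line { |l| added_streaming += 1 if l.start_with?('+') }

diff.close!

puts added            # 10000
puts added_streaming  # 10000
```

Both approaches compute the same answer; the difference is peak memory, which for the whole-file approach grows linearly with diff size.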
The reasons
The Rails-based application, Unicorn, and Sidekiq are very memory-hungry components, largely due to our monolithic application architecture. This comes down to four reasons:
- A lot of components that are part of a single application contribute to a very high baseline memory consumption: even though we often don't use these components, we still have to load them into memory,
- Running Unicorn and Sidekiq as separate processes, even though architecturally desired, roughly doubles the baseline memory consumption,
- We have very high runtime memory consumption because the application is not optimised,
- There is a lack of easy-to-use benchmarking during development that would help guide how to write performant application code (in terms of CPU cycles and memory usage); today we only have comprehensive guidelines for DB practices.
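On the benchmarking point: even without tooling, Ruby's built-in `GC.stat` can give a rough, easy-to-use allocation count during development. A minimal sketch (the helper name is illustrative, not an existing GitLab API):

```ruby
# Rough allocation benchmark: counts how many Ruby objects are
# allocated while running a block. Good enough to compare two
# implementations of the same code path during development.
def allocations_for
  GC.start
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

# Example: building a string with += allocates a fresh string on
# every iteration, while a single join allocates only a handful
# of objects.
naive  = allocations_for { s = ""; 1_000.times { s += "x" } }
joined = allocations_for { Array.new(1_000, "x").join }

puts "naive: #{naive} objects, joined: #{joined} objects"
```

Numbers like these could be tracked in CI to catch allocation regressions per code path, similar to how we track DB query counts.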
There's a very good low-level explanation from Stan about what we do today and the main reasons for GitLab's very high memory requirements in this comment: https://news.ycombinator.com/item?id=18973499.
A non-exhaustive list of reasons for high memory consumption:
- There is significant Ruby interpreter overhead from running all this Rails code in a single application (see omnibus-gitlab#4118 (comment 141430928))
- We cache a lot of data in Redis, which adds to runtime memory consumption,
- We do not preload as much data as we should on the Rails side, which would reduce duplication of objects when loading them from the database,
- We do not test the memory usage of features and workflows,
- Each GitLab Pages process keeps the whole database in memory,
- We process CI and CI traces in Rails / Sidekiq; this could be done by a separate CI daemon, which would reduce CI's footprint on the system (CPU/memory) by a significant margin,
- We balloon API workers up to xxxMBs per API call,
- We kill and fork Unicorn workers: Unicorn workers are single-threaded, and when one goes over the memory threshold we start a new one to recycle memory,
- We run 25 threads in Sidekiq, which can result in gigabytes of allocated memory,
- The Rails application does not really follow the principle of small Sidekiq jobs: our jobs can run for tens of minutes, and the memory they use can balloon to xxxMBs,
- We have to load the whole application and all gems.
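The object-duplication point above can be sketched in plain Ruby. Without preloading or any identity map, each record load builds a fresh copy of the same associated object; a simple per-request cache lets all callers share one. All class and method names here are hypothetical, not GitLab code:

```ruby
Project = Struct.new(:id, :name)

# Uncached: every call materialises a brand-new object, so 25
# merge requests pointing at the same project hold 25 copies.
def load_project_uncached(id)
  Project.new(id, "project-#{id}")
end

# Cached: the first load is memoised, later loads share it.
PROJECT_CACHE = {}
def load_project_cached(id)
  PROJECT_CACHE[id] ||= Project.new(id, "project-#{id}")
end

dup_objects    = Array.new(25) { load_project_uncached(1) }
shared_objects = Array.new(25) { load_project_cached(1) }

puts dup_objects.map(&:object_id).uniq.size     # 25 distinct copies
puts shared_objects.map(&:object_id).uniq.size  # 1 shared object
```

In Rails terms this is what eager loading (e.g. `includes`) and batch loading achieve: one materialised object per row instead of one per reference.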