Suvinya Mullaseril - Introduction to GitLab Architecture
module-name: "gltf1002"
area: "Core Technologies"
maintainers:
- weimeng
Learning Objectives
- Understand how the different GitLab components work together.
Content
Part 1: The heart of GitLab: The GitLab Rails app
The GitLab Rails app is at the center of the GitLab user experience and is the code responsible for the presentation and business logic of GitLab's GUIs and APIs.
It is served using a web server, either Puma (the default since GitLab 13.0) or Unicorn (deprecated and then removed in GitLab 14.0).
The difference between Puma and Unicorn is how they handle incoming web requests:
- Puma handles each request in a thread, with each thread belonging to a thread pool managed by a Puma worker process.
- Unicorn handles each request in a separate worker process, with a "master" process managing the pool of worker processes.
In GitLab's use case, Puma's threaded model is more memory-efficient than Unicorn's forking model (threads within a Puma worker share memory that separate Unicorn worker processes would each duplicate), which is why Puma became the default for new installs in GitLab 13.0.
Both Puma's and Unicorn's memory footprints will grow if left unchecked, which is why Puma Worker Killer and Unicorn Worker Killer are used to periodically terminate workers once a predefined memory limit is reached.
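To make this concrete, below is a minimal sketch of the Omnibus (`/etc/gitlab/gitlab.rb`) settings that control the Puma worker pool and the Worker Killer memory limit. The values are illustrative only, and exact key names can vary between GitLab versions, so treat this as an assumption to check against your version's documentation.

```ruby
# /etc/gitlab/gitlab.rb -- illustrative values, not recommendations
puma['worker_processes'] = 4             # number of forked Puma worker processes
puma['min_threads'] = 4                  # size of the thread pool inside each worker
puma['max_threads'] = 4
puma['per_worker_max_memory_mb'] = 1024  # limit at which Puma Worker Killer restarts a worker
```

Changes to gitlab.rb take effect after running `gitlab-ctl reconfigure`.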
Part 2: Data store: PostgreSQL
The GitLab web application needs a persistent store of data, and we've picked PostgreSQL to serve this need.
PostgreSQL forks a new backend process for every client connection, and setting up connections is relatively expensive. It is therefore optimal to reuse existing connections across requests and to define an upper limit on the number of open connections to avoid resource contention issues. This concept is known as connection pooling.
PostgreSQL does not come with a built-in connection pooler; it relies on client software to provide one. Fortunately, Rails provides connection pooling facilities out of the box. For GitLab instances with multiple application nodes, PgBouncer is used.
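As a rough sketch of what that pooling looks like from the Rails side (the query here is made up for illustration), a request checks a connection out of ActiveRecord's pool, uses it, and hands it back instead of asking PostgreSQL to set up a brand-new connection:

```ruby
# Check a connection out of the ActiveRecord connection pool, run a query,
# and return the connection to the pool when the block exits. PostgreSQL does
# not have to fork a new backend process for this request.
ActiveRecord::Base.connection_pool.with_connection do |conn|
  conn.execute("SELECT COUNT(*) FROM projects")
end
```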
Part 3: Background jobs: Sidekiq
There comes a time when you'll want to run operations in the background instead of having them take up time in the foreground. This is where Sidekiq comes in.
For example, posting a comment on a GitLab issue will trigger a background job that sends an email notification to all participants and followers of that issue. If we handled this in the main application (served by Puma or Unicorn), you'd have to wait until all emails are delivered before the page refreshes and shows your comment!
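As a sketch of that pattern (the worker, mailer, and model names below are illustrative, not GitLab's actual implementation), the web request only enqueues a small job payload and returns immediately, while the email delivery happens later on a Sidekiq process:

```ruby
class IssueCommentNotificationWorker
  include Sidekiq::Worker

  # Runs on a Sidekiq process, outside the Puma/Unicorn request cycle.
  def perform(issue_id, comment_id)
    issue = Issue.find(issue_id)
    issue.participants.each do |user|
      NotificationMailer.new_comment(user.id, comment_id).deliver_now
    end
  end
end

# Called from the web request: perform_async only pushes a job payload onto
# the queue, so the page can render without waiting for any email to be sent.
IssueCommentNotificationWorker.perform_async(issue.id, comment.id)
```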
Another common use case is in ecommerce: It takes a significant amount of time to get confirmation from a payment processor that your payment has either succeeded or been declined. You may have noticed some ecommerce websites process your orders faster than others. This is because they confirm your order first and send the actual payment processing request off to a background job, which then updates the order with your payment status at a later time.
An important thing to note is that Sidekiq is also a Rails application node: it runs a full copy of the GitLab Rails code. The difference is that Sidekiq has no built-in maximum job duration, unlike Puma or Unicorn, which by default time requests out after 60 seconds.
The other thing of note about Sidekiq is that the background jobs queue is not stored on the Sidekiq node. Instead, this is stored in Redis.
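You can see this from a Rails console on any application node: Sidekiq's own API reads the queue directly out of Redis, which is why web and Sidekiq nodes alike can inspect it. A minimal sketch, using the default queue name:

```ruby
require 'sidekiq/api'

queue = Sidekiq::Queue.new('default')
queue.size     # number of jobs currently waiting in Redis
queue.latency  # seconds the oldest job in the queue has been waiting
```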
Part 4: Caching and lists: Redis
Redis is a key-value store whose data is read and written entirely in system memory. Because of this, it's fast. Because it's fast, it's ideal for caching information that would take a long time to query from PostgreSQL or compute in GitLab Rails. This also makes Redis a great solution for storing the background job queue for Sidekiq.
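A common way this is used from Rails is read-through caching, where the expensive work only runs when the key is missing from Redis. The cache key, expiry, and method calls below are illustrative:

```ruby
# The block (the slow PostgreSQL/Gitaly work) only executes on a cache miss;
# until the entry expires, subsequent calls are answered straight from Redis.
Rails.cache.fetch("project:#{project.id}:commit_count", expires_in: 10.minutes) do
  project.repository.commit_count
end
```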
Redis is entirely memory-bound -- as long as your data set is smaller than the total memory available to Redis, you're good.
Part 5: Intercepting requests: NGINX and GitLab Workhorse
NGINX is set up as a reverse proxy that serves as the entry point to almost the entire GitLab instance. NGINX's primary role is to properly route requests to the right process -- not everything can or should be handled by Puma or Unicorn. For example, GitLab Pages are served using the GitLab Pages Daemon, and long-running requests are offloaded to GitLab Workhorse instead of being handled by Puma or Unicorn.
GitLab Workhorse is a custom reverse proxy sitting between NGINX and Puma or Unicorn. It was originally created to handle Git over HTTP requests but now also handles other long-running requests.
Part 6: GitLab Shell
Remember how NGINX was described as the entry point to almost the entire GitLab instance? The part NGINX doesn't cover is Git over SSH: those requests are handled by GitLab Shell instead.
Part 7: GitLab needs Git: Gitaly
Gitaly is used by GitLab components -- GitLab Rails, GitLab Shell and GitLab Workhorse -- to read and write Git repository data.
Gitaly receives a remote procedure call (RPC) from a GitLab component, executes Git commands against the physical repositories on disk, and then returns the appropriate response to the component that made the call.
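To make the client/server split concrete, each component that needs repository access is simply pointed at a Gitaly address and sends its RPCs there. A minimal Omnibus (gitlab.rb) sketch for the GitLab 13/14 era, with illustrative storage names and hostname:

```ruby
# /etc/gitlab/gitlab.rb on a GitLab Rails node: Rails never touches the
# repositories directly, it only knows the Gitaly address for each storage.
git_data_dirs({
  "default"  => { "gitaly_address" => "unix:/var/opt/gitlab/gitaly/gitaly.socket" },
  "storage1" => { "gitaly_address" => "tcp://gitaly-1.example.com:8075" }
})
```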
Because Gitaly RPCs are synchronous in nature, each Gitaly RPC is configured with a Gitaly timeout value to prevent runaway Gitaly requests that would otherwise hog Puma or Unicorn workers. It's important to note that while the RPC is terminated, the actual Git command run against the physical repository on disk still continues, which can result in resource contention. Gitaly timeouts are responsible for the infamous 4: Deadline Exceeded errors you may see from time to time.
One gotcha to know is that Gitaly makes a request to a GitLab API internal endpoint (/api/v4/internal/allowed) to check if the user making the RPC is actually authorized to perform a Git action. This results in a total of two Puma or Unicorn requests for such actions (one to handle the user's request, another to handle the Gitaly authorization check) and can cause unexpected Puma or Unicorn worker pool exhaustion!