Product proposal for new EE features and a new (EE only) routing/load balancing product.
Scale GitLab horizontally by spreading projects over multiple application servers, while keeping a single (global) SQL database and session store.
Application servers store repositories on local disk storage. Each project exists on local disk on only one application server. None of the application servers has all repos on its local disk. Intelligent new load balancers, HTTPRouter and SSHRouter, send incoming requests to the application server that holds the repository the request is about.
Advantages
Scale horizontally with commodity servers; no custom monolithic file storage solution is needed. Git repository operations are fast because they run on local disk, and OS disk caching is effective. Application servers in a sharded deployment differ little from traditional single-server deployments.
Disadvantages
The design still relies on a single SQL database. It needs two new applications (HTTPRouter and SSHRouter) and GitLab modifications to communicate with the routers.
What it is not
It is not in itself an HA solution, although it could be made HA. It is not a multi-site solution.
Proposed technology stack
- Redis for communication between application servers and routers
- OpenResty (NGINX + embedded Lua, http://openresty.org/) for HTTPRouter; used by TaoBao and Cloudflare.
- Go for SSHRouter (use https://godoc.org/golang.org/x/crypto/ssh )
- Standard GitLab omnibus packages (NGINX+Unicorn+Sidekiq+...) on the application servers
- Postgres/MySQL SQL server
- Redis for session/cache storage (need not be the same instance but it could be)
- NFS for uploads (attachments) (or use S3?)
Yes, double NGINX: once in the HTTPRouter and once on the application server.
HTTP request cycle
HTTPRouter is relatively dumb. It only knows how to look up in Redis which host a project lives on. It needs little or no further knowledge of GitLab internals.
Redis lookup hit
A request for gitlab.com/gitlab-org/gitlab-ce/tree/master comes in to HTTPRouter. (There is no HAProxy anymore.) Lua extracts the project path 'gitlab-org/gitlab-ce' and looks up in Redis which host has that repo. Lua then lets NGINX proxy the request to the application server at that hostname.
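The lookup described above can be sketched roughly as follows. This is an illustration in Python rather than the Lua the proposal has in mind; the dict stands in for the Redis routing map, and the host name and function names are assumptions made for the example.

```python
from typing import Optional

# Stand-in for the Redis routing map (project path -> application server host).
# The host name is made up for illustration.
ROUTING_MAP = {"gitlab-org/gitlab-ce": "app3.internal"}


def extract_project_path(url_path: str) -> Optional[str]:
    """Return 'namespace/project' from a URL path, or None if it has fewer than two segments."""
    segments = [s for s in url_path.split("/") if s]
    if len(segments) < 2:
        return None
    namespace, project = segments[0], segments[1]
    # Git-over-HTTP URLs carry a '.git' suffix on the project segment.
    if project.endswith(".git"):
        project = project[: -len(".git")]
    return namespace + "/" + project


def lookup_host(url_path: str) -> Optional[str]:
    """Return the application server for a request, or None on a routing-map miss."""
    project = extract_project_path(url_path)
    if project is None:
        return None
    return ROUTING_MAP.get(project)
```

A `None` result covers both a lookup miss and a request that is not project-related; in either case the router would fall back to a random application server.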
Redis lookup miss
A request for gitlab.com/foo/bar/issues comes in. Lua extracts 'foo/bar' and cannot find it in Redis, so HTTPRouter passes the request to a random application server.
If the 'foo/bar' repo exists in SQL and on disk on the random application server (lucky hit), the application server updates the Redis routing map and handles the request.
If the 'foo/bar' repo exists in SQL but is not on local disk at the application server, the application server updates the Redis routing map and redirects the client to the same URL. We end up in the 'Redis lookup hit' scenario.
If 'foo/bar' does not exist in SQL, the application server returns 404.
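The three outcomes of a lookup miss could be expressed on the application server along these lines. This is a hedged Python sketch, not GitLab code: `sql_hosts`, `local_repos` and `routing_map` are plain in-memory stand-ins for SQL, the local disk and Redis, and all names are hypothetical.

```python
from enum import Enum


class Action(Enum):
    SERVE = "serve"          # lucky hit: handle the request here
    REDIRECT = "redirect"    # send the client back through HTTPRouter
    NOT_FOUND = "not_found"  # project unknown in SQL: 404


def handle_routing_miss(project, sql_hosts, local_repos, routing_map):
    """Decide what an application server does with a request that missed in Redis.

    sql_hosts:   project path -> host recorded in SQL (stand-in for the database)
    local_repos: set of project paths present on this server's local disk
    routing_map: dict standing in for the Redis routing map; updated in place
    """
    owner = sql_hosts.get(project)
    if owner is None:
        return Action.NOT_FOUND
    # In both remaining cases the routing map learns where the project lives.
    routing_map[project] = owner
    if project in local_repos:
        return Action.SERVE
    return Action.REDIRECT
```

After a `REDIRECT`, the repeated request finds the project in the routing map and follows the lookup-hit path.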
When receiving a POST on /projects/new, pick a random application server from a special set in Redis: servers with room on local disk. If application servers run out of disk space they can remove themselves from this set. This ensures that new repos are balanced across application servers.
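A minimal sketch of the 'servers with room' set, assuming a Python set as a stand-in for the Redis set (with real Redis, SADD, SREM and SRANDMEMBER would be the natural commands). The function names and the free-space threshold are assumptions for illustration only.

```python
import random


def update_accepting_status(host, free_bytes, min_free_bytes, accepting):
    """Add or remove `host` from the 'servers with room' set based on free disk space."""
    if free_bytes >= min_free_bytes:
        accepting.add(host)
    else:
        accepting.discard(host)


def pick_server_for_new_project(accepting, rng=random):
    """Pick a random application server that still accepts new repositories."""
    if not accepting:
        raise RuntimeError("no application server has room for new repositories")
    # Sort first so the choice is reproducible when a seeded rng is passed in.
    return rng.choice(sorted(accepting))
```

Each application server would call `update_accepting_status` for itself, while the router only ever reads the set.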
Not a project-related request
E.g. gitlab.com/profile or gitlab.com/help. Just proxy from HTTPRouter to a random application server.
SSH session cycle
SSHRouter is also dumb. It uses Redis instead of an authorized_keys file; GitLab application servers keep the 'authorized_keys table' in Redis up to date.
Unknown SSH key
Look up the SSH public key in Redis; disconnect if the key is unknown.
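One way the 'authorized_keys table' could be keyed is by public key fingerprint. The Python sketch below assumes that layout (the proposal does not specify one), uses a dict as a stand-in for Redis, and introduces hypothetical helper names; the fingerprint format follows the usual OpenSSH SHA-256 convention.

```python
import base64
import hashlib
from typing import Optional


def fingerprint(public_key_line: str) -> str:
    """SHA-256 fingerprint of an OpenSSH public key line ('type base64-blob [comment]')."""
    blob = base64.b64decode(public_key_line.split()[1])
    digest = hashlib.sha256(blob).digest()
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


def add_key(keys: dict, key_id: str, public_key_line: str) -> None:
    """Called by GitLab when a user adds a key (replaces appending to authorized_keys)."""
    keys[fingerprint(public_key_line)] = key_id


def remove_key(keys: dict, public_key_line: str) -> None:
    """Called by GitLab when a user removes a key."""
    keys.pop(fingerprint(public_key_line), None)


def authenticate(keys: dict, public_key_line: str) -> Optional[str]:
    """Return the internal key ID (e.g. 'key-123'), or None if the key is unknown."""
    return keys.get(fingerprint(public_key_line))
```

A `None` result from `authenticate` corresponds to the disconnect case above.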
Repo known in Redis routing table
Redis has told us the internal key ID for the user's SSH key, e.g. key-123. Inspect the requested SSH command, e.g. 'git upload-pack gitlab-org/gitlab-ce.git', and extract 'gitlab-org/gitlab-ce'. Look up the hostname for that repo in the Redis routing table. Establish a second SSH connection to the appropriate application server, and run 'gitlab-shell --trust-me key-123 git upload-pack gitlab-org/gitlab-ce.git'. Copy data back and forth between the two SSH sessions.
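The command inspection step might look like the following Python sketch (the proposal suggests Go for the real SSHRouter). The whitelist of Git subcommands and the function names are assumptions; '--trust-me' is the flag proposed above, not an existing gitlab-shell option.

```python
import shlex

# Git subcommands a router might accept (assumed list).
ALLOWED_SUBCOMMANDS = {"upload-pack", "receive-pack", "upload-archive"}


def parse_git_command(command: str):
    """Split an SSH command into (subcommand, 'namespace/project'); raise ValueError otherwise."""
    tokens = shlex.split(command)
    # Git clients commonly send the dashed form, e.g. "git-upload-pack 'repo.git'".
    if len(tokens) == 2 and tokens[0].startswith("git-"):
        tokens = ["git", tokens[0][len("git-"):], tokens[1]]
    if len(tokens) != 3 or tokens[0] != "git" or tokens[1] not in ALLOWED_SUBCOMMANDS:
        raise ValueError("unsupported SSH command: %r" % command)
    repo = tokens[2].strip("/")
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return tokens[1], repo


def build_forwarded_command(key_id: str, command: str):
    """Build the argument list to run on the application server over the second SSH connection."""
    subcommand, repo = parse_git_command(command)
    return ["gitlab-shell", "--trust-me", key_id, "git", subcommand, repo + ".git"]
```

Rejecting anything outside the whitelist keeps the router from forwarding arbitrary shell commands to application servers.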
Repo unknown in Redis routing table
Do an internal API request to find the hostname for the repo. Proceed if it is found; abort the SSH connection if it is not.
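The two SSH routing cases can be combined into one resolution step, sketched here in Python. The internal API is represented by a caller-supplied function, since the proposal does not define its shape; the dict again stands in for the Redis routing table, and whether the router should also write the result back to Redis is left open.

```python
from typing import Callable, Dict, Optional


def resolve_repo_host(
    repo: str,
    routing_map: Dict[str, str],
    internal_api_lookup: Callable[[str], Optional[str]],
) -> str:
    """Return the host for `repo`, consulting the internal API only on a routing-table miss."""
    host = routing_map.get(repo)
    if host is not None:
        return host
    host = internal_api_lookup(repo)  # hypothetical internal API call
    if host is None:
        # The caller would abort the SSH connection here.
        raise LookupError("unknown repository: %s" % repo)
    return host
```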
GitLab modifications
- Store a hostname for each Project in SQL.
- Add/remove SSH keys in Redis instead of the authorized_keys file.
- Adapt Project finders with 'if project exists in SQL but not on local disk then update Redis routing table and redirect back to HTTPRouter' logic.
- Manage 'I can accept new repos' status in Redis routing table.
- Work around non-local repository content references (cross-project references of the form 'namespace/project@commit' in issue comments).