Improve our HA offering

Dev: https://dev.gitlab.org/gitlab/gitlab-ee/issues/320

Patricio

I just finished a call with a customer, and they are unhappy with our HA offering. They think it is misleading that we market GitLab EE as being HA without having full-featured documentation on how to make it so. They are thinking about asking for a refund.

The only reason they bought EE was because of HA, and it turns out they were already doing most of the things we suggest with CE.

The suggestions we make in the Standard group are to use DRBD to have a master and a slave server with manual failover, but they consider this Disaster Recovery rather than HA (I agree, to be honest).

The second option is to follow the load-balanced-cluster documentation, but they complain that we have the following message in there:

This is a horizontally scalable but NOT HA setup: the load balancer and the backend are both single points of failure.

I know that we note that because it's true. This is not an HA setup, but unfortunately this is all the documentation we offer on HA.

The fact that the backend is a single point of failure and that we don't offer a solution to this problem is what really disappointed them.

As @sytses suggested, I have asked for a call with them, @jacobvosmaer, and @JobV for further discussion. Let's hope we can keep them as clients and improve our documentation based on their suggestions.

Jacob

this issue is now for documentation that does not ship in our releases

with proper failover GitLab is HA

GitLab has no built-in synchronization mechanism (like MySQL master-master, or Wandisco's multi-site Git). I know of only one customer using our DRBD scripts. In a call with a prospect who actually had experience with DRBD, they said "I don't want to baby-sit a GitLab server".

It is nice that a manual failover script worked for us but this is not what I would expect when I hear 'HA solution'.

@sytses your suggestions about Pacemaker and LizardFS are helpful but I am not sure if we can make the customer happy with that. They might not be able to help us (!) with Pacemaker, and LizardFS is not something we can offer right now. Can you clarify if/how/when we can just give them their money back?

Sytse

I don't understand. With proper failover GitLab is HA, just not clustered, which is something different.

If they want their money back they can get it back.

But we need to find out:

Did they expect HA or clustering?

What is their expected HA failover time?

How do they do HA for other applications?

Are they comfortable with Pacemaker?

What we can then maybe offer (depending on the answers):

HA GitLab with DRBD

Automated Pacemaker failovers

50% discount for this year, because we didn't have the automated Pacemaker scripts ready yet and will still need to develop them

Sytse

I think we can help them automate it with Pacemaker if they are already experienced with that for other projects.

Jacob can also mention LizardFS is something we work on.

Job

Writing documentation for Pacemaker is extremely time-intensive for us, and we won't use it ourselves, which means we could offer neither experienced support nor up-to-date documentation for it.

Our strategy going into the call is as @sytses suggested:

Did they expect HA or clustering?

What is their expected HA failover time?

How do they do HA for other applications?

And go from there. If they do mention Pacemaker and would like to use that, we can reconsider.

I forgot to add the notes of the issue. Here they are for posterity:

It is totally fine to use OCFS2 and Nimble iSCSI mounts, since that works well for you.

With the Enterprise Edition you can use your managed MySQL and Redis servers.

You should let LTM use HTTPS when connecting to NGINX on the app servers.
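For the HTTPS point, a minimal gitlab.rb sketch, assuming the Omnibus package's bundled NGINX is what LTM connects to. The hostname and certificate paths are placeholders, not values from the call:

```ruby
# /etc/gitlab/gitlab.rb on each app server (hostname and paths are placeholders)

# An https:// external_url makes the bundled NGINX listen with TLS.
external_url 'https://gitlab.example.com'

# Certificate and key that NGINX presents to the load balancer.
nginx['ssl_certificate'] = '/etc/gitlab/ssl/gitlab.example.com.crt'
nginx['ssl_certificate_key'] = '/etc/gitlab/ssl/gitlab.example.com.key'
```

Run `gitlab-ctl reconfigure` afterwards for the change to take effect.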

As Jacob mentioned during the call, we have a cookbook that we use to set up parts of the cluster. It is located in this repo: https://gitlab.com/standard/cookbook-omnibus-gitlab/tree/master

We also touched on the topic of external mounts. As Jacob said, if you mount your drives in the locations where GitLab expects its data to be, it is not necessary to change the configuration.

These directories are:

/var/opt/gitlab/git-data => All repository data is stored here.

/var/opt/gitlab/gitlab-rails/uploads => Contains content uploaded by users.

/var/opt/gitlab/backups => If you use the included rake task to create backups, they will be stored here.
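To sanity-check a server after mounting, something like the following Ruby sketch could report which of these directories are missing under a given data root. The helper name `missing_data_dirs` and the constant are made up for illustration; only the three paths come from the list above.

```ruby
# Directories from the list above, relative to the GitLab data root.
GITLAB_DATA_DIRS = %w[git-data gitlab-rails/uploads backups].freeze

# Returns the expected directories that do not exist under +root+.
# (Hypothetical helper, not part of GitLab.)
def missing_data_dirs(root)
  GITLAB_DATA_DIRS.reject { |dir| File.directory?(File.join(root, dir)) }
end

if __FILE__ == $PROGRAM_NAME
  root = ARGV.fetch(0, '/var/opt/gitlab')
  missing = missing_data_dirs(root)
  if missing.empty?
    puts "All expected directories exist under #{root}"
  else
    puts "Missing under #{root}: #{missing.join(', ')}"
  end
end
```

It only checks that the directories exist; it does not verify that they are actually on the shared mount.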

Jacob Added

Don't mount your shared volume at /var/opt/gitlab; you will share too much. Create a separate mountpoint (e.g. /mnt/gitlab) and put the following in gitlab.rb:

# put repos on /mnt/gitlab
git_data_dir '/mnt/gitlab/git-data'
# put uploads on /mnt/gitlab
gitlab_rails['uploads_directory'] = '/mnt/gitlab/uploads'
# put authorized_keys for 'git' on /mnt/gitlab
user['home'] = '/mnt/gitlab/home-git'

Next, make sure that both of your app servers have the same UIDs/GIDs for the GitLab users, or else file permissions will break. The numbers below are made up; you can choose your own.

user['uid'] = 1100
user['gid'] = 1100

postgresql['uid'] = 1101
postgresql['gid'] = 1101

redis['uid'] = 1102
redis['gid'] = 1102

web_server['uid'] = 1103
web_server['gid'] = 1103
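A rough way to verify that the IDs really match is to collect them on each app server and compare. The Ruby sketch below assumes the default Omnibus account names (git, gitlab-psql, gitlab-redis, gitlab-www); adjust them if your gitlab.rb overrides the usernames. Both helper names are made up for illustration.

```ruby
require 'etc'

# Assumed default Omnibus account names; change these if gitlab.rb overrides them.
ACCOUNTS = %w[git gitlab-psql gitlab-redis gitlab-www].freeze

# Look up [uid, gid] for each account on the local machine.
# Accounts that do not exist locally map to nil.
def local_ids(accounts = ACCOUNTS)
  accounts.to_h do |name|
    begin
      entry = Etc.getpwnam(name)
      [name, [entry.uid, entry.gid]]
    rescue ArgumentError
      [name, nil]
    end
  end
end

# Compare two { account => [uid, gid] } maps and return only the accounts
# whose IDs differ, as { account => [ids_on_a, ids_on_b] }.
def id_mismatches(server_a, server_b)
  (server_a.keys | server_b.keys).sort.each_with_object({}) do |account, diff|
    a = server_a[account]
    b = server_b[account]
    diff[account] = [a, b] unless a == b
  end
end
```

One way to use it: run `local_ids` on each app server, copy the two results side by side, and pass them to `id_mismatches`. An empty hash means the servers agree.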

We only have a migration path from MySQL to PostgreSQL, not the other way around. If your current deployment uses PostgreSQL and you want to keep your GitLab metadata (issues, merge requests, comments, user permissions), then you need to use PostgreSQL on your new deployment too.

/cc @dzaporozhets @marin