[META] add bastion hosts to GPRD environment

I'm creating this issue to be the source of truth for the bastion setup until it's done and moved into documentation.

Overview

graph LR
A[client] ==>|"ssh (tcp/22)"| B{"TCP LB <br/> FQDN: bastion.gitlab.com <br/> CLIENT_IP session affinity"}

subgraph bastion.zone-0
 b0s[sshd]
 b0n[nginx]
end

subgraph bastion.zone-1
 b1s[sshd]
 b1n[nginx]
end

subgraph bastion.zone-2
 b2s[sshd]
 b2n[nginx]
end

B -->|"ssh (tcp/22)"| b0s
B ==>|"ssh (tcp/22)"| b1s
B -->|"ssh (tcp/22)"| b2s

B-. "healthcheck (tcp/80)" .->b0n
B-. "healthcheck (tcp/80)" .->b1n
B-. "healthcheck (tcp/80)" .->b2n

b1s ==> |"ssh via ProxyCommand"| X[server]
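
On the client side, the path in the graph (client, then LB, then bastion sshd, then server via ProxyCommand) could be expressed as an ssh client config stanza. The following is a minimal sketch, not the final documented setup: the function name `bastion_ssh_config` and the host pattern passed to it are hypothetical, and the port defaults to 22 only because that is what the graph shows today.

```shell
#!/usr/bin/env bash
# Print an ssh client config stanza that routes connections through the bastion LB.
# Arguments (all illustrative assumptions, not names taken from this issue):
#   $1  host pattern for the servers behind the bastion (hypothetical)
#   $2  bastion FQDN (defaults to bastion.gitlab.com, as in the graph)
#   $3  port the LB listens on (defaults to 22, as in the graph)
bastion_ssh_config() {
  local pattern="$1" bastion="${2:-bastion.gitlab.com}" port="${3:-22}"
  cat <<EOF
Host ${bastion}
    Port ${port}
    # Never proxy the bastion through itself, even if the pattern below matches it.
    ProxyCommand none

Host ${pattern}
    # -W forwards stdin/stdout to the target host and port via the bastion.
    ProxyCommand ssh -W %h:%p ${bastion}
EOF
}
```

A user would append the output to their own config, for example `bastion_ssh_config '*.gprd.example.internal' >> ~/.ssh/config`, where the pattern is a placeholder until the proper FQDNs are settled.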

Tasks

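# Install nginx with Lua support (nginx-extras) without interactive prompts.
export DEBIAN_FRONTEND='noninteractive'
apt-get -qq update && apt-get -qq install nginx-extras

# Health-check vhost: answers 200 while sshd is active, 503 otherwise.
# The heredoc delimiter is quoted so the shell never expands anything in the config.
cat > /etc/nginx/sites-available/ssh-is-active <<'EOF'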
server {
        listen 80;
        server_name ssh-is-active.bastion;
        root /var/www/ssh-is-active;
        index index.html;
        location /ssh-is-active {
                content_by_lua_block {
                        if 0 == os.execute("/bin/systemctl is-active --quiet ssh") then
                                ngx.exit(200)
                        else
                                ngx.exit(503)
                        end
                }
        }
}
EOF

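# Enable the site (-f keeps re-runs from failing on an existing link),
# then reload only if the configuration passes nginx's own syntax check.
ln -sf /etc/nginx/sites-{available,enabled}/ssh-is-active
nginx -t && systemctl reload nginx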

(done in gitlab-cookbooks/gitlab-openssh!7 (merged)) Make sure it fixes the failing health checks shown in the TECH DEBT log excerpt.
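
To check by hand that the endpoint behaves as intended, one could probe it the way the LB would. This is a rough sketch under a few assumptions: the function names `health_state` and `probe_bastion` are made up for illustration, the bastion address is whatever you pass in, and the `Host` header is set to the `server_name` from the config so that the request reaches the `ssh-is-active` server block.

```shell
#!/usr/bin/env bash
# Map an HTTP status code from the health endpoint to a human-readable state.
health_state() {
  case "$1" in
    200) echo "healthy" ;;      # systemctl reported ssh as active
    503) echo "sshd-down" ;;    # systemctl reported ssh as not active
    *)   echo "unexpected" ;;   # e.g. 404 from another vhost, or 000 on connect failure
  esac
}

# Probe one bastion over tcp/80. $1 is the bastion address (hypothetical).
probe_bastion() {
  local addr="$1" code
  code="$(curl -s -o /dev/null -m 5 -w '%{http_code}' \
            -H 'Host: ssh-is-active.bastion' "http://${addr}/ssh-is-active")"
  health_state "$code"
}
```

Stopping ssh on a test bastion and running `probe_bastion <address>` should then flip the result from `healthy` to `sshd-down` if the Lua block works as written.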

  • figure out proper FQDNs for everything
  • figure out sshd host key distribution: host keys on the bastions are currently NOT managed by chef, so strict host key checking has to be disabled for the LB itself (this does not seriously impact security, but it is ugly) (done in gitlab-cookbooks/gitlab-openssh!7 (merged))
  • tighten firewalls and security groups
  • move the internet-facing TCP LB port from 22 to some other port. Rationale: it's just about minimizing the number of automated, bot-generated password attempts in our logs. Since every client will add a proper entry for the bastion to their .ssh/config, there's no need to keep ssh on tcp/22 for the LB (we can keep port 22 on the end servers).
  • the logging needs love
  • the chef bastion roles need love
  • the user management needs attention
  • sshd_config needs attention and proper template
  • users should not be able to get interactive ssh sessions on bastions themselves, only request port forwarding
  • figure out 2fa on bastion hosts
  • automate bastion host key secret regeneration and re-upload to GCP (technical debt: so far this has been done manually, once)
  • TECH DEBT: figure out what the failing health checks are that appear alongside the succeeding ones:
130.211.1.17 - - [13/Apr/2018:03:13:02 +0000] "GET / HTTP/1.1" 404 178 "-" "GoogleHC/1.0"
169.254.169.254 - - [13/Apr/2018:03:13:02 +0000] "GET /-/available-ssh HTTP/1.1" 200 5 "-" "-"
169.254.169.254 - - [13/Apr/2018:03:13:04 +0000] "GET /-/available-ssh HTTP/1.1" 200 5 "-" "-"
130.211.1.20 - - [13/Apr/2018:03:13:06 +0000] "GET / HTTP/1.1" 404 178 "-" "GoogleHC/1.0"
130.211.1.20 - - [13/Apr/2018:03:13:06 +0000] "GET / HTTP/1.1" 404 178 "-" "GoogleHC/1.0"
169.254.169.254 - - [13/Apr/2018:03:13:06 +0000] "GET /-/available-ssh HTTP/1.1" 200 5 "-" "-"
130.211.1.65 - - [13/Apr/2018:03:13:06 +0000] "GET / HTTP/1.1" 404 178 "-" "GoogleHC/1.0"
130.211.1.65 - - [13/Apr/2018:03:13:07 +0000] "GET / HTTP/1.1" 404 178 "-" "GoogleHC/1.0"
130.211.1.17 - - [13/Apr/2018:03:13:07 +0000] "GET / HTTP/1.1" 404 178 "-" "GoogleHC/1.0"
130.211.1.17 - - [13/Apr/2018:03:13:07 +0000] "GET / HTTP/1.1" 404 178 "-" "GoogleHC/1.0"
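
To see at a glance which clients hit which path with which result, the access log can be grouped by client IP, request path and status. The helper below is only an illustrative sketch (the name `summarize_health_checks` is made up, and it assumes the log format shown in the excerpt above, where the whitespace-separated fields 1, 7 and 9 are client, path and status).

```shell
#!/usr/bin/env bash
# Read nginx access-log lines on stdin and print "count client path status",
# most frequent combination first.
summarize_health_checks() {
  awk '{ seen[$1 " " $7 " " $9]++ }
       END { for (k in seen) print seen[k], k }' |
    LC_ALL=C sort -k1,1nr -k2
}
```

Fed the ten lines quoted above, this groups them into four combinations: every `GoogleHC/1.0` request is a `GET /` answered with 404, while the requests from 169.254.169.254 go to `/-/available-ssh` and get 200. The log path to pipe in (for example the default nginx access log) depends on the host's configuration.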

Finishing touches:

  • write docs and runbooks, make sure the setup is easy for non-Ops people
  • block direct access from internet to ssh
  • generate minimal tightened images for bastion hosts to start from
  • scale bastions down to f1-micro (requires going chefless)
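
For the task about users only being allowed to request port forwarding and not interactive sessions, one possible starting point for the sshd_config template is a `Match` block like the one printed below. This is a sketch under stated assumptions, not a reviewed policy: the function name `forward_only_sshd_fragment` and the group name `bastion-users` are hypothetical, and the real template may need further restrictions (for example `PermitOpen`).

```shell
#!/usr/bin/env bash
# Print an sshd_config fragment that limits a group to TCP forwarding only.
# $1 is the Unix group of bastion users (hypothetical default: bastion-users).
forward_only_sshd_fragment() {
  local group="${1:-bastion-users}"
  cat <<EOF
Match Group ${group}
    # Needed for "ssh -W host:port" style ProxyCommand connections.
    AllowTcpForwarding yes
    # No terminal, no X11 or agent forwarding on the bastion itself.
    PermitTTY no
    X11Forwarding no
    AllowAgentForwarding no
    # Any attempt to run a shell or command is refused.
    ForceCommand /usr/sbin/nologin
EOF
}
```

After adding such a fragment to the rendered sshd_config, `sshd -t` can be used to validate the file before reloading the service.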
Edited by Ilya Frolov