# Considerations for bypassing NGINX for WebSockets and HTTPS Git
GitLab.com is considering removing the NGINX ingress controller from the request path for current HTTPS Git and WebSockets traffic. This issue is to discuss the merits of this approach and any considerations we need to address before we connect Workhorse directly to HAProxy.
The motivation for this change is as follows:
- Reduce complexity by eliminating a service that currently does little beyond acting as a reverse proxy
- Improve reliability by no longer having to deal with nginx-controller pod cycling
- Reduce cost by not dedicating CPU/memory resources to NGINX
To start, these diagrams illustrate the current and proposed configurations. The proposal removes the NGINX controller from the request path, connecting the internal GCP LB directly to Workhorse, which reduces complexity:
### Current

```mermaid
graph LR
  subgraph GitLab.com
    CloudFlare
  end
  subgraph GCPLoadBalancer
    external-lb
  end
  subgraph HAProxy
    fe
  end
  subgraph GCPInternalLB
    internal-lb
  end
  subgraph NGINXPod
    nginx-controller
  end
  subgraph WebServicePod
    workhorse
    puma
  end
  CloudFlare -->|https| external-lb
  external-lb -->|https| fe
  fe -->|https| internal-lb
  internal-lb -->|https| nginx-controller
  nginx-controller -->|http| workhorse
  workhorse -->|http| puma
```
### Proposed

```mermaid
graph LR
  subgraph GitLab.com
    CloudFlare
  end
  subgraph GCPExternalLB
    external-lb
  end
  subgraph HAProxy
    fe
  end
  subgraph GCPInternalLB
    internal-lb
  end
  subgraph WebServicePod
    workhorse
    puma
  end
  CloudFlare -->|https| external-lb
  external-lb -->|https| fe
  fe -->|http| internal-lb
  internal-lb -->|http| workhorse
  workhorse -->|http| puma
```
For this configuration change, we need to ensure that we won't run into any surprises when connecting Workhorse directly to HAProxy.
<details>
<summary>Current NGINX configuration (pulled from staging)</summary>

```
# Configuration checksum: 16258588997816761317
# setup custom paths that do not require root access
pid /tmp/nginx.pid;
load_module /etc/nginx/modules/ngx_http_modsecurity_module.so;
daemon off;
worker_processes 4;
worker_rlimit_nofile 261120;
worker_shutdown_timeout 10s ;
events {
multi_accept on;
worker_connections 16384;
use epoll;
}
http {
lua_package_cpath "/usr/local/lib/lua/?.so;/usr/lib/lua-platform-path/lua/5.1/?.so;;";
lua_package_path "/etc/nginx/lua/?.lua;/etc/nginx/lua/vendor/?.lua;/usr/local/lib/lua/?.lua;;";
lua_shared_dict configuration_data 5M;
lua_shared_dict certificate_data 16M;
lua_shared_dict locks 512k;
lua_shared_dict sticky_sessions 1M;
init_by_lua_block {
require("resty.core")
collectgarbage("collect")
local lua_resty_waf = require("resty.waf")
lua_resty_waf.init()
-- init modules
local ok, res
ok, res = pcall(require, "configuration")
if not ok then
error("require failed: " .. tostring(res))
else
configuration = res
configuration.nameservers = { "10.228.0.10" }
end
ok, res = pcall(require, "balancer")
if not ok then
error("require failed: " .. tostring(res))
else
balancer = res
end
ok, res = pcall(require, "monitor")
if not ok then
error("require failed: " .. tostring(res))
else
monitor = res
end
}
init_worker_by_lua_block {
balancer.init_worker()
monitor.init_worker()
}
real_ip_header X-Forwarded-For;
real_ip_recursive on;
set_real_ip_from 0.0.0.0/0;
geoip_country /etc/nginx/geoip/GeoIP.dat;
geoip_city /etc/nginx/geoip/GeoLiteCity.dat;
geoip_org /etc/nginx/geoip/GeoIPASNum.dat;
geoip_proxy_recursive on;
aio threads;
aio_write on;
tcp_nopush on;
tcp_nodelay on;
log_subrequest on;
reset_timedout_connection on;
keepalive_timeout 75s;
keepalive_requests 100;
client_body_temp_path /tmp/client-body;
fastcgi_temp_path /tmp/fastcgi-temp;
proxy_temp_path /tmp/proxy-temp;
ajp_temp_path /tmp/ajp-temp;
client_header_buffer_size 1k;
client_header_timeout 60s;
large_client_header_buffers 4 8k;
client_body_buffer_size 8k;
client_body_timeout 60s;
http2_max_field_size 4k;
http2_max_header_size 16k;
http2_max_requests 1000;
types_hash_max_size 2048;
server_names_hash_max_size 1024;
server_names_hash_bucket_size 256;
map_hash_bucket_size 64;
proxy_headers_hash_max_size 512;
proxy_headers_hash_bucket_size 64;
variables_hash_bucket_size 128;
variables_hash_max_size 2048;
underscores_in_headers off;
ignore_invalid_headers on;
limit_req_status 503;
include /etc/nginx/mime.types;
default_type text/html;
gzip on;
gzip_comp_level 5;
gzip_http_version 1.1;
gzip_min_length 256;
gzip_types application/atom+xml application/javascript application/x-javascript application/json application/rss+xml application/vnd.ms-fontobject application/x-font-ttf application/x-web-app-manifest+json application/xhtml+xml application/xml font/opentype image/svg+xml image/x-icon text/css text/plain text/x-component;
gzip_proxied any;
gzip_vary on;
# Custom headers for response
server_tokens off;
more_clear_headers Server;
# disable warnings
uninitialized_variable_warn off;
# Additional available variables:
# $namespace
# $ingress_name
# $service_name
# $service_port
log_format upstreaminfo '$the_real_ip - [$the_real_ip] - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" $request_length $request_time [$proxy_upstream_name] $upstream_addr $upstream_response_length $upstream_response_time $upstream_status $req_id';
map $request_uri $loggable {
default 1;
}
access_log /var/log/nginx/access.log upstreaminfo if=$loggable;
error_log /var/log/nginx/error.log notice;
resolver 10.228.0.10 valid=30s ipv6=off;
# See https://www.nginx.com/blog/websocket-nginx
map $http_upgrade $connection_upgrade {
default upgrade;
# See http://nginx.org/en/docs/http/ngx_http_upstream_module.html#keepalive
'' '';
}
# The following is a sneaky way to do "set $the_real_ip $remote_addr"
# Needed because using set is not allowed outside server blocks.
map '' $the_real_ip {
default $remote_addr;
}
# trust http_x_forwarded_proto headers correctly indicate ssl offloading
map $http_x_forwarded_proto $pass_access_scheme {
default $http_x_forwarded_proto;
'' $scheme;
}
map $http_x_forwarded_port $pass_server_port {
default $http_x_forwarded_port;
'' $server_port;
}
# Obtain best http host
map $http_host $this_host {
default $http_host;
'' $host;
}
map $http_x_forwarded_host $best_http_host {
default $http_x_forwarded_host;
'' $this_host;
}
# validate $pass_access_scheme and $scheme are http to force a redirect
map "$scheme:$pass_access_scheme" $redirect_to_https {
default 0;
"http:http" 1;
"https:http" 1;
}
map $pass_server_port $pass_port {
443 443;
default $pass_server_port;
}
# Reverse proxies can detect if a client provides a X-Request-ID header, and pass it on to the backend server.
# If no such header is provided, it can provide a random value.
map $http_x_request_id $req_id {
default $http_x_request_id;
"" $request_id;
}
# Create a variable that contains the literal $ character.
# This works because the geo module will not resolve variables.
geo $literal_dollar {
default "$";
}
server_name_in_redirect off;
port_in_redirect off;
ssl_protocols TLSv1.1 TLSv1.2;
# turn on session caching to drastically improve performance
ssl_session_cache builtin:1000 shared:SSL:10m;
ssl_session_timeout 10m;
# allow configuring ssl session tickets
ssl_session_tickets on;
# slightly reduce the time-to-first-byte
ssl_buffer_size 4k;
# allow configuring custom ssl ciphers
ssl_ciphers 'ECDHE-RSA-AES256-GCM-SHA384:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-SHA384:ECDHE-RSA-AES128-SHA256:ECDHE-RSA-AES256-SHA:ECDHE-RSA-AES128-SHA:AES256-GCM-SHA384:AES128-GCM-SHA256:AES256-SHA256:AES128-SHA256:AES256-SHA:AES128-SHA:!aNULL:!eNULL:!EXPORT:!DES:!MD5:!PSK:!RC4';
ssl_prefer_server_ciphers on;
ssl_ecdh_curve auto;
proxy_ssl_session_reuse on;
upstream upstream_balancer {
server 0.0.0.1; # placeholder
balancer_by_lua_block {
balancer.balance()
}
keepalive 32;
keepalive_timeout 30s;
keepalive_requests 0;
}
# Global filters
## start server _
server {
server_name _ ;
listen 80 default_server reuseport backlog=1024;
set $proxy_upstream_name "-";
listen 443 default_server reuseport backlog=1024 ssl ;
# PEM sha: 71e510cc2fb53e55867f5f2fff3413242db0c90e
ssl_certificate /etc/ingress-controller/ssl/default-fake-certificate.pem;
ssl_certificate_key /etc/ingress-controller/ssl/default-fake-certificate.pem;
location / {
set $namespace "";
set $ingress_name "";
set $service_name "";
set $service_port "0";
set $location_path "/";
rewrite_by_lua_block {
balancer.rewrite()
}
access_by_lua_block {
}
header_filter_by_lua_block {
}
body_filter_by_lua_block {
}
log_by_lua_block {
balancer.log()
monitor.call()
}
if ($scheme = https) {
more_set_headers "Strict-Transport-Security: max-age=15724800";
}
access_log off;
port_in_redirect off;
set $proxy_upstream_name "upstream-default-backend";
client_max_body_size 1m;
proxy_set_header Host $best_http_host;
# Pass the extracted client certificate to the backend
# Allow websocket connections
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection $connection_upgrade;
proxy_set_header X-Request-ID $req_id;
proxy_set_header X-Real-IP $the_real_ip;
proxy_set_header X-Forwarded-For $the_real_ip;
proxy_set_header X-Forwarded-Host $best_http_host;
proxy_set_header X-Forwarded-Port $pass_port;
proxy_set_header X-Forwarded-Proto $pass_access_scheme;
proxy_set_header X-Original-URI $request_uri;
proxy_set_header X-Scheme $pass_access_scheme;
# Pass the original X-Forwarded-For
proxy_set_header X-Original-Forwarded-For $http_x_forwarded_for;
# mitigate HTTPoxy Vulnerability
# https://www.nginx.com/blog/mitigating-the-httpoxy-vulnerability-with-nginx/
proxy_set_header Proxy "";
# Custom headers to proxied server
proxy_set_header Referrer-Policy "strict-origin-when-cross-origin";
proxy_connect_timeout 5s;
proxy_send_timeout 60s;
proxy_read_timeout 60s;
proxy_buffering off;
proxy_buffer_size 4k;
proxy_buffers 4 4k;
proxy_request_buffering on;
proxy_http_version 1.1;
proxy_cookie_domain off;
proxy_cookie_path off;
# In case of errors try the next upstream server before returning an error
proxy_next_upstream error timeout;
proxy_next_upstream_tries 3;
proxy_pass http://upstream_balancer;
proxy_redirect off;
}
# health checks in cloud providers require the use of port 80
location /healthz {
access_log off;
return 200;
}
# this is required to avoid error if nginx is being monitored
# with an external software (like sysdig)
location /nginx_status {
allow 127.0.0.1;
deny all;
access_log off;
stub_status on;
}
}
## end server _
# backend for when default-backend-service is not configured or it does not have endpoints
server {
listen 8181 default_server reuseport backlog=1024;
set $proxy_upstream_name "-";
location / {
return 404;
}
}
# default server, used for NGINX healthcheck and access to nginx stats
server {
listen 18080 default_server reuseport backlog=1024;
set $proxy_upstream_name "-";
location /healthz {
access_log off;
return 200;
}
location /is-dynamic-lb-initialized {
access_log off;
content_by_lua_block {
local configuration = require("configuration")
local backend_data = configuration.get_backends_data()
if not backend_data then
ngx.exit(ngx.HTTP_INTERNAL_SERVER_ERROR)
return
end
ngx.say("OK")
ngx.exit(ngx.HTTP_OK)
}
}
location /nginx_status {
set $proxy_upstream_name "internal";
access_log off;
stub_status on;
}
location /configuration {
access_log off;
allow 127.0.0.1;
deny all;
# this should be equals to configuration_data dict
client_max_body_size 10m;
client_body_buffer_size 10m;
proxy_buffering off;
content_by_lua_block {
configuration.call()
}
}
location / {
set $proxy_upstream_name "upstream-default-backend";
proxy_set_header Host $best_http_host;
proxy_pass http://upstream_balancer;
}
}
}
stream {
lua_package_cpath "/usr/local/lib/lua/?.so;/usr/lib/lua-platform-path/lua/5.1/?.so;;";
lua_package_path "/etc/nginx/lua/?.lua;/etc/nginx/lua/vendor/?.lua;/usr/local/lib/lua/?.lua;;";
lua_shared_dict tcp_udp_configuration_data 5M;
init_by_lua_block {
require("resty.core")
collectgarbage("collect")
-- init modules
local ok, res
ok, res = pcall(require, "tcp_udp_configuration")
if not ok then
error("require failed: " .. tostring(res))
else
tcp_udp_configuration = res
tcp_udp_configuration.nameservers = { "10.228.0.10" }
end
ok, res = pcall(require, "tcp_udp_balancer")
if not ok then
error("require failed: " .. tostring(res))
else
tcp_udp_balancer = res
end
}
init_worker_by_lua_block {
tcp_udp_balancer.init_worker()
}
lua_add_variable $proxy_upstream_name;
log_format log_stream [$time_local] $protocol $status $bytes_sent $bytes_received $session_time;
access_log /var/log/nginx/access.log log_stream;
error_log /var/log/nginx/error.log;
upstream upstream_balancer {
server 0.0.0.1:1234; # placeholder
balancer_by_lua_block {
tcp_udp_balancer.balance()
}
}
server {
listen unix:/tmp/ingress-stream.sock;
content_by_lua_block {
tcp_udp_configuration.call()
}
}
# TCP services
# UDP services
}
```
</details>
## Preserving IP
Workhorse supports the `X-Forwarded-For` header and will use it as the `RemoteAddr` for logging. We currently set `X-Forwarded-For` in HAProxy, so there should be no change here.
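For reference, a minimal sketch of how the header can be set at the HAProxy frontend. `option forwardfor` and `http-request set-header` are real HAProxy directives, but the section names, certificate path, and backend below are illustrative assumptions, not our actual config:

```
# Illustrative sketch only; names and paths are hypothetical.
frontend https_git
    mode http
    bind :443 ssl crt /etc/haproxy/ssl/example.pem
    # Append the client source address as an X-Forwarded-For header
    option forwardfor
    # Alternative explicit form that overwrites anything the client sent:
    # http-request set-header X-Forwarded-For %[src]
    default_backend web_service
```

Note that `option forwardfor` adds the client address to any `X-Forwarded-For` the client already sent rather than replacing it, which is consistent with the NGINX config above resolving the real client IP with `real_ip_recursive on`.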
## Keepalive configuration
Current NGINX config:

```
keepalive 32;
keepalive_timeout 30s;
keepalive_requests 0;
```
Keepalives improve performance by holding TCP connections open. We currently disable keepalives for all Kubernetes traffic because they resulted in connections remaining open to terminating pods (https://gitlab.com/gitlab-com/gl-infra/production/-/issues/3140).

If connection re-use becomes an issue later as we migrate more services to Kubernetes, we have options to enable it more aggressively at HAProxy (https://www.haproxy.com/blog/http-keep-alive-pipelining-multiplexing-and-connection-pooling/); a sketch of the relevant directive follows. For now, there should be no impact.
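As a rough illustration, the main HAProxy knob for server-side connection re-use is `http-reuse` (a real directive); the backend name and server address below are assumptions:

```
# Illustrative sketch only; backend name and addresses are hypothetical.
backend web_service
    mode http
    # Share idle server-side connections across client connections.
    # "safe" restricts re-use to requests that can be safely retried
    # if the pooled connection turns out to be dead.
    http-reuse safe
    server workhorse1 10.0.0.10:8181 check
```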
## GZIP
Current NGINX config:

```
gzip on;
gzip_comp_level 5;
gzip_http_version 1.1;
gzip_min_length 256;
gzip_types application/atom+xml application/javascript application/x-javascript application/json application/rss+xml application/vnd.ms-fontobject application/x-font-ttf application/x-web-app-manifest+json application/xhtml+xml application/xml font/opentype image/svg+xml image/x-icon text/css text/plain text/x-component;
gzip_proxied any;
gzip_vary on;
```
GZIP compression is currently enabled, so responses will no longer be compressed at this layer once we remove NGINX.
This can also be enabled at HAProxy with:

```
compression algo gzip
compression type text/html text/plain
```
I don't see this having a major impact for Git over HTTPS anyway. We might consider turning compression on at HAProxy for web/API traffic, though there is probably nothing we need to do here since it is covered by Cloudflare (https://support.cloudflare.com/hc/en-us/articles/200168086-Does-Cloudflare-compress-resources-).
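If we did enable it at HAProxy, a sketch of where those directives could live (the `compression` keywords are real HAProxy directives; the section choice and MIME-type list are illustrative assumptions):

```
# Illustrative sketch only; the type list is an assumption.
defaults
    mode http
    # Compress eligible responses with gzip
    compression algo gzip
    # Only compress text-like content; git packfiles are already
    # zlib-compressed and would gain little from another pass.
    compression type text/html text/plain text/css application/json application/javascript
```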
## Proxy buffering and proxy request buffering
This is already disabled, so there should be no impact there; we also already do buffering at Cloudflare (https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/10419#note_368918755).