Increase ingress-nginx large-client-header-buffers to fix HTTP 400 errors from the workload cluster's cattle-cluster-agent
What does this MR do and why?
While testing OKD workload cluster deployment, we found that the cluster-import step times out and fails.
After investigating, we found that the cattle-system/cattle-cluster-agent pod in the workload cluster was logging the following errors when attempting to register with Rancher:
time="2024-07-26T00:36:04Z" level=info msg="Connecting to wss://rancher.sylva/v3/connect/register with token starting with 97sfkbl9xqkktzt4zt8kvsx864t"
time="2024-07-26T00:36:04Z" level=info msg="Connecting to proxy" url="wss://rancher.sylva/v3/connect/register"
time="2024-07-26T00:36:04Z" level=error msg="Failed to connect to proxy. Response status: 400 - 400 Bad Request. Response body: <html>\r\n<head><title>400 Request Header Or Cookie Too Large</title></head>\r\n<body>\r\n<center><h1>400 Bad Request</h1></center>\r\n<center>Request Header Or Cookie Too Large</center>\r\n<hr><center>nginx</center>\r\n</body>\r\n</html>\r\n" error="websocket: bad handshake"
time="2024-07-26T00:36:04Z" level=error msg="Remotedialer proxy error" error="websocket: bad handshake"
The rke2-ingress-nginx-controller logs in the management cluster also showed the corresponding 400 responses:
192.168.222.129 - - [26/Jul/2024:00:36:04 +0000] "GET /v3/connect/register HTTP/1.1" 400 226 "-" "Go-http-client/1.1" 201 0.000 [] [] - - - - dcaf832f8df19bafc823fb31d778e766
The requests do not appear to have been proxied to the rancher service, since they were rejected at the ingress controller itself (the upstream name and upstream address fields in the access log entry are empty).
This appears to be a known issue with the ingress-nginx controller, described in https://github.com/kubernetes/ingress-nginx/issues/319
The workaround recommended in that issue is to set large-client-header-buffers: "4 16k" in the ingress-nginx controller ConfigMap. This raises the size of each large client header buffer from the default of 8k to 16k, allowing larger request headers and cookies.
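To check that a request was answered by the ingress controller rather than forwarded to a backend, one can parse the access log line and inspect its upstream fields. The sketch below assumes the ingress-nginx default log-format-upstream (field order listed in the comment); the helper names parse_line and rejected_at_ingress are hypothetical, and a custom log format would require adjusting the regular expression.

```python
import re

# Assumed field order (ingress-nginx default log-format-upstream):
# $remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent
# "$http_referer" "$http_user_agent" $request_length $request_time
# [$proxy_upstream_name] [$proxy_alternative_upstream_name] $upstream_addr
# $upstream_response_length $upstream_response_time $upstream_status $req_id
LOG_RE = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)" (?P<request_length>\d+) '
    r'(?P<request_time>\S+) \[(?P<upstream_name>[^\]]*)\] '
    r'\[(?P<alt_upstream_name>[^\]]*)\] (?P<upstream_addr>\S+) '
    r'(?P<upstream_response_length>\S+) (?P<upstream_response_time>\S+) '
    r'(?P<upstream_status>\S+) (?P<req_id>\S+)'
)


def parse_line(line):
    """Split one access log line into named fields (hypothetical helper)."""
    m = LOG_RE.match(line)
    if m is None:
        raise ValueError("line does not match the assumed log format")
    return m.groupdict()


def rejected_at_ingress(entry):
    """True if nginx answered with an error and no upstream was selected."""
    return (
        int(entry["status"]) >= 400
        and entry["upstream_name"] == ""
        and entry["upstream_addr"] == "-"
    )


line = (
    '192.168.222.129 - - [26/Jul/2024:00:36:04 +0000] '
    '"GET /v3/connect/register HTTP/1.1" 400 226 "-" "Go-http-client/1.1" '
    '201 0.000 [] [] - - - - dcaf832f8df19bafc823fb31d778e766'
)
entry = parse_line(line)
print(entry["status"], rejected_at_ingress(entry))
```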
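The value "4 16k" is a buffer count followed by a per-buffer size. In nginx, a single request header field must fit within one of these buffers, otherwise nginx returns 400 (Bad Request), which matches the error above. The sketch below is a simplified model, not nginx's actual implementation: it ignores the initial client_header_buffer_size buffer and packs lines greedily. The header name X-Example-Large is hypothetical and is not the header the agent actually sends.

```python
# Simplified model of nginx's large_client_header_buffers check (assumption:
# ignores the initial client_header_buffer_size buffer and buffer reuse).

UNITS = {"k": 1024, "m": 1024 * 1024}


def parse_buffers(value):
    """Parse a 'number size' value such as "4 16k" into (count, size_in_bytes)."""
    number, size = value.split()
    suffix = size[-1].lower()
    if suffix in UNITS:
        size_bytes = int(size[:-1]) * UNITS[suffix]
    else:
        size_bytes = int(size)
    return int(number), size_bytes


def buffers_needed(lines, size):
    """Greedily pack CRLF-terminated lines into buffers of `size` bytes.

    A line is never split across buffers, so a line longer than one buffer
    cannot be accepted; in that case None is returned.
    """
    used, free = 0, 0
    for line in lines:
        n = len(line.encode()) + 2  # +2 for the trailing CRLF
        if n > size:
            return None
        if n > free:
            used += 1  # start a fresh buffer
            free = size
        free -= n
    return used


def headers_fit(lines, value):
    """True if the request line and headers fit under the given setting."""
    count, size = parse_buffers(value)
    needed = buffers_needed(lines, size)
    return needed is not None and needed <= count


# Hypothetical request with one ~10 KiB header (X-Example-Large is made up).
request = [
    "GET /v3/connect/register HTTP/1.1",
    "Host: rancher.sylva",
    "X-Example-Large: " + "a" * 10_000,
]
print(headers_fit(request, "4 8k"))   # False: the large header exceeds one 8 KiB buffer
print(headers_fit(request, "4 16k"))  # True
```

Under this model, increasing the number of buffers would not help with a single oversized header; only the per-buffer size matters for that case.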
Related reference(s)
Test coverage
We tested this by manually updating the ConfigMap, after which the cattle-cluster-agent registration succeeded. We also successfully ran an end-to-end deployment of management and workload clusters with this setting.