Review app: nginx-ingress-controller in crash loop
Continuing from Slack thread: https://gitlab.slack.com/archives/CMA7DQJRX/p1622102289075600
Summary:
- nginx-ingress-controller pods using image k8s.gcr.io/ingress-nginx/controller:v0.41.2 go into crash loop backoff
Observations:
- high CPU usage by the controller container
- multiple events observed in the pod: NGINX reload triggered due to a change in configuration
- pod logs: https://cloudlogging.app.goo.gl/2NcdSEFvNG2EKVH19
Slack discussion
Andrey 1 hour ago Newly deployed review apps do seem to be in bad shape though; I see a lot of failures to even open the login page
Andrey 1 hour ago https://gitlab.com/gitlab-org/gitlab/-/jobs/1297424294 <- even performance job did not execute
Andrey 1 hour ago qa-smoke is failing as well
alberts 1 hour ago
> https://gitlab.com/gitlab-org/gitlab/-/jobs/1297424294 <- even performance job did not execute
This release name: review-allure-rep-fdpkkf. It seems that nginx ingress is in a crash loop for another reason: https://cloudlogging.app.goo.gl/2NcdSEFvNG2EKVH19 (edited)
alberts 1 hour ago @rymai any ideas?
alberts 1 hour ago This one is freshly installed, so nothing to do with helm upgrade
alberts 41 minutes ago There’s a new container called “controller”, instead of “nginx-ingress-controller”, which has been running close to the CPU limit. (Screenshot 2021-05-27 at 4.35.34 PM.png)
remy 41 minutes ago Yeah I’ve noticed that the nginx-controller pods hit the CPU limit: https://console.cloud.google.com/kubernetes/pod/us-central1-b/review-apps/review-apps/review-allure-rep-fdpkkf-nginx-ingress-controller-b68c5966k69dq/details?project=gitlab-review-apps
remy 40 minutes ago Screen Shot 2021-05-27 at 10.36.47.png
alberts 39 minutes ago I’m looking at the container name
alberts 39 minutes ago I wonder if it changed, and our base_config wasn’t updated
remy 38 minutes ago
2021-05-27T08:32:09.923577Z "Configuration changes detected, backend reload required" I
2021-05-27T08:33:50.929615Z "Backend successfully reloaded" I
2021-05-27T08:33:50.929753Z "Initial sync, sleeping for 1 second" I
2021-05-27T08:33:50.929875Z Event(v1.ObjectReference{Kind:"Pod", Namespace:"review-apps", Name:"review-allure-rep-fdpkkf-nginx-ingress-controller-b68c5966k69dq", UID:"43fc491b-964c-4b01-b9f6-7e4c708ad48e", APIVersion:"v1", ResourceVersion:"806299827", FieldPath:""}): type: 'Normal' reason: 'RELOAD' NGINX reload triggered due to a change in configuration I
2021-05-27T08:34:12.021492Z "Received SIGTERM, shutting down" I
2021-05-27T08:34:12.021877Z "Shutting down controller queues" I
2021-05-27T08:34:12.164601Z "Stopping NGINX process" I
2021-05-27T08:34:12.429563395Z 2021/05/27 08:34:12 [notice] 95#95: signal process started E
2021-05-27T08:34:13.525489Z "NGINX process has stopped" I
2021-05-27T08:34:13.525525Z "Handled quit, awaiting Pod deletion" I
2021-05-27T08:34:23.525709Z "Exiting" code=0 I
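The timing in the log excerpt above is easier to see as elapsed seconds. The following is a minimal Python sketch that computes two gaps from it. The three timestamps are copied from the excerpt; the dictionary keys, helper names, and the choice of which entries to compare are assumptions made for illustration and are not part of the original investigation.

```python
from datetime import datetime

# Timestamps copied verbatim from the log excerpt (UTC, microsecond precision).
# The keys are hypothetical labels chosen for this sketch.
EVENTS = {
    "reload_required": "2021-05-27T08:32:09.923577Z",
    "reload_done": "2021-05-27T08:33:50.929615Z",
    "sigterm": "2021-05-27T08:34:12.021492Z",
}


def parse(ts: str) -> datetime:
    """Parse the fixed 'YYYY-MM-DDTHH:MM:SS.ffffffZ' layout used in the excerpt."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")


def seconds_between(start: str, end: str) -> float:
    """Return the elapsed seconds between two labelled events."""
    return (parse(EVENTS[end]) - parse(EVENTS[start])).total_seconds()


reload_seconds = seconds_between("reload_required", "reload_done")
sigterm_gap_seconds = seconds_between("reload_done", "sigterm")

print(f"backend reload took {reload_seconds:.1f}s")
print(f"SIGTERM arrived {sigterm_gap_seconds:.1f}s after the reload finished")
```

On these timestamps the backend reload takes about 101 seconds, and SIGTERM is received about 21 seconds after the reload completes.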
remy 37 minutes ago
It still seems to be nginx-ingress.controller: https://docs.gitlab.com/charts/charts/globals.html
alberts 37 minutes ago This new “controller” started appearing around the same time we changed to the new chart version
alberts 31 minutes ago Increase the CPU request first? (edited)
alberts 30 minutes ago I looked through the logs for the nginx-ingress-controller pod https://cloudlogging.app.goo.gl/AxBqmHfxJAK3sxTj6, nothing seems amiss. It finds the matching ingress class
remy 29 minutes ago I found the MR which updates the NGINX chart: gitlab-org/charts/gitlab!1690 (diffs)
alberts 28 minutes ago I also found this change, but didn’t see anything different in the values.yml. Maybe I missed something
remy 27 minutes ago
Yeah I don’t see anything different either.
alberts 24 minutes ago
In !62372 (merged), the MR that upgraded the chart, the review app page could load: https://gitlab-review-331577-rev-527cns.gitlab-review.app/users/sign_in, so it’s not the chart itself