
Switch to using /health for nginx containers

Dmitry Gruzd requested to merge switch-to-using-health-for-nginx into main

What does this MR do and why?

This is related to gitlab-org/gitlab#430182 (closed). We've noticed occasional connection refused exceptions. I believe this happens because the readiness probe sends a request to the webserver, which doesn't answer in time, so the pod is marked as unavailable.

This is also problematic because we can't reach the indexer during that period.

This MR changes all nginx readiness checks to use /health of the nginx container itself.
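
For illustration, a readiness probe of this shape would target nginx directly rather than the webserver behind it. This is a minimal sketch, not the chart's actual template: the port and the timing values are assumptions, and only the `/health` path comes from this MR.

```yaml
# Sketch only: port and timing values are assumptions, not taken from this MR.
readinessProbe:
  httpGet:
    path: /health        # answered by the nginx container itself
    port: 8080           # hypothetical nginx listen port
  periodSeconds: 10      # illustrative
  timeoutSeconds: 1      # illustrative
  failureThreshold: 3    # illustrative
```

With a probe like this, a slow webserver response no longer affects the readiness of the nginx container, so the pod stays routable and the indexer remains reachable.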

How to set up and validate locally


I've added a new test to ./spec/scripts/integration.sh, so running the script should execute all the steps needed to verify that the change works.

Script output:
> Helm upgrade --reset-values ...                           [OK]
> Deploy cert manager ...                                   [OK]
Healthchecks:
> Local | Legacy indexer health ...                         [OK]
> Local | New indexer health ...                            [OK]
> Local | Webserver health ...                              [OK]
> Internal Gateway | Legacy indexer health ...              [OK]
> Internal Gateway | New indexer health ...                 [OK]
> Internal Gateway | Webserver health ...                   [OK]
> Internal Gateway | Nginx health ...                       [OK]
> External Gateway | Nginx health ...                       [OK]
Indexing & Searching:
> Local | Indexer truncate ...                              [OK]
> Local | Legacy indexer ...                                [OK]
> Local | Webserver ...                                     [OK]
> Local | Indexer ...                                       [OK]
> Internal Gateway | Legacy indexer indexing ...            [OK]
> Internal Gateway | Webserver ...                          [OK]
> External Gateway | Indexer ...                            [OK]
> External Gateway | Webserver ...                          [OK]
> External Gateway /nodes endpoint | Webserver search ...   [OK]
> External Gateway /nodes endpoint | Indexer ...            [OK]
TLS:
> Wait for cert-manager-webhook deployment ...              [OK]
> Add certificate ...                                       [OK]
> Enable TLS ...                                            [OK]
> TLS | External Gateway | Webserver ...                    [OK]
> TLS | External Gateway | Indexer ...                      [OK]
> TLS | External Gateway /nodes endpoint | Webserver ...    [OK]
> TLS | External Gateway /nodes endpoint | Indexer ...      [OK]
Upgrade:
> Helm upgrade ...                                          [OK]
Edited by Dmitry Gruzd
