thanos-storegateway in CrashLoopBackOff breaking update-mgmt-cluster job

Typical job output:

Timed-out waiting for the following resources to be ready:
IDENTIFIER                                           STATUS     REASON             MESSAGE
Kustomization/sylva-system/thanos                    InProgress                    Kustomization generation is 3, but latest observed generation is 2
╰┄╴HelmRelease/sylva-system/thanos                   Failed                        Failed to upgrade after 4 attempt(s)
   ╰┄╴StatefulSet/thanos/thanos-storegateway         InProgress                    Ready: 0/1
      ╰┄╴Pod/thanos/thanos-storegateway-0            InProgress                    Pod is running but is not Ready
         ├┄╴┬┄┄[Conditions]
         ┆  ├┄╴Initialized                           True
         ┆  ├┄╴Ready                                 False      ContainersNotReady containers with unready status: [storegateway]
         ┆  ├┄╴ContainersReady                       False      ContainersNotReady containers with unready status: [storegateway]
         ┆  ╰┄╴PodScheduled                          True
         ╰┄╴┬┄┄[Events]
            ├┄╴2024-09-12 22:05:36                   Normal     Scheduled          Successfully assigned thanos/thanos-storegateway-0 to mgmt-1451490679-kubeadm-capo-oci-control-plane-z6xhn
            ├┄╴2024-09-12 22:05:37                   Normal     Pulling            Pulling image "docker.io/bitnami/thanos:0.36.1-debian-12-r2"
            ├┄╴2024-09-12 22:05:40                   Normal     Pulled             Successfully pulled image "docker.io/bitnami/thanos:0.36.1-debian-12-r2" in 3.208s (3.208s including waiting)
            ├┄╴2024-09-12 22:09:16 (x5 over 3m36s)   Normal     Created            Created container storegateway
            ├┄╴2024-09-12 22:09:17 (x5 over 3m37s)   Normal     Started            Started container storegateway
            ├┄╴2024-09-12 22:25:40 (x8 over 19m29s)  Normal     Pulled             Container image "docker.io/bitnami/thanos:0.36.1-debian-12-r2" already present on machine
            ╰┄╴2024-09-12 22:50:44 (x186 over 44m2s) Warning    BackOff            Back-off restarting failed container storegateway in pod thanos-storegateway-0_thanos(d03afd5a-40cf-4445-b70f-6b7296265b52)

This has been seen frequently for at least the past ~10 days.

Example runs:

  • kubeadm-capo: https://gitlab.com/sylva-projects/sylva-core/-/jobs/7813857972
  • kubeadm-capo: https://gitlab.com/sylva-projects/sylva-core/-/jobs/7826431896
  • rke2-capo: https://gitlab.com/sylva-projects/sylva-core/-/jobs/7838192945
  • rke2-capo: https://gitlab.com/sylva-projects/sylva-core/-/jobs/7840631183
  • rke2-capo: https://gitlab.com/sylva-projects/sylva-core/-/jobs/7840631651

I thought I had seen this on capm3 as well, but I'm not sure.

Edited Sep 17, 2024 by Thomas Morin