Upgrade Prometheus to prevent crash loops on ARM

Summary

Prometheus 2.25.0 has a bug (patched in 2.25.1 in March 2021) that causes it to crash and restart continually on ARM64 (aarch64) architectures in GitLab Omnibus 14.4.2 and later.
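To check whether an installation bundles a Prometheus older than the patched release, something like the following sketch may help. The helper names `prometheus_version` and `version_at_least` are hypothetical, the binary path `/opt/gitlab/embedded/bin/prometheus` is assumed to be the Omnibus default, and `sort -V` assumes GNU coreutils. Only 2.25.0 is confirmed as affected here; whether older versions crash the same way is not established.

```shell
# Extract the version number from `prometheus --version` output read on stdin.
# The first line of that output looks like: "prometheus, version 2.25.0 (branch: ...)".
prometheus_version() {
  sed -n 's/^prometheus, version \([0-9][0-9.]*\).*/\1/p'
}

# Succeed (exit 0) when version $1 is greater than or equal to version $2.
# Relies on GNU `sort -V` for version-aware ordering.
version_at_least() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Example usage (assumed Omnibus path, not run here):
#   v=$(/opt/gitlab/embedded/bin/prometheus --version 2>&1 | prometheus_version)
#   version_at_least "$v" 2.25.1 || echo "Prometheus $v predates the 2.25.1 fix"
```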

Steps to reproduce

  1. Install GitLab Omnibus 14.4.2 or newer on an ARM64 machine
  2. Observe that Prometheus restarts continually and the service never becomes available:
ec2-user@ip-172-31-17-153:~> while true; do
>   sudo gitlab-ctl status prometheus
>   sleep 5
>   date
> done
run: prometheus: (pid 6196) 5s; run: log: (pid 4397) 397s
Sun 05 Dec 2021 23:43:33 UTC
run: prometheus: (pid 6274) 7s; run: log: (pid 4397) 404s
Sun 05 Dec 2021 23:43:40 UTC
run: prometheus: (pid 6328) 0s; run: log: (pid 4397) 411s
Sun 05 Dec 2021 23:43:47 UTC
...
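The restart loop is visible above because the Prometheus PID changes on every sample while the log service PID stays the same. A minimal sketch for counting those PID changes from captured status output follows; the function name `count_prometheus_pids` is hypothetical, and it assumes the status lines have the same format as the session above.

```shell
# Count the distinct Prometheus PIDs seen in `gitlab-ctl status prometheus`
# output supplied on stdin. More than one PID over a short sampling window
# suggests that the service is being restarted.
count_prometheus_pids() {
  sed -n 's/^run: prometheus: (pid \([0-9][0-9]*\)).*/\1/p' | sort -u | wc -l | tr -d ' '
}

# Example usage (not run here):
#   for i in 1 2 3; do sudo gitlab-ctl status prometheus; sleep 5; done \
#     | count_prometheus_pids
```

Fed the three status lines shown above, this prints 3, one for each PID.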

What is the current bug behavior?

Prometheus crashes with a segmentation fault and is then restarted by the Omnibus service supervisor (runit's runsv), which repeats indefinitely.

What is the expected correct behavior?

Prometheus should remain up so that it can perform its work.

Relevant logs

2021-12-05_23:46:33.86975 level=error ts=2021-12-05T23:46:33.869Z caller=manager.go:188 component="scrape manager" msg="error creating new scrape pool" err="error creating HTTP client: unable to load specified CA cert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt: open /var/run/secrets/kubernetes.io/serviceaccount/ca.crt: no such file or directory" scrape_pool=kubernetes-pods
2021-12-05_23:46:33.86979 level=error ts=2021-12-05T23:46:33.869Z caller=manager.go:188 component="scrape manager" msg="error creating new scrape pool" err="error creating HTTP client: unable to load specified CA cert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt: open /var/run/secrets/kubernetes.io/serviceaccount/ca.crt: no such file or directory" scrape_pool=kubernetes-cadvisor
2021-12-05_23:46:33.86985 level=error ts=2021-12-05T23:46:33.869Z caller=manager.go:188 component="scrape manager" msg="error creating new scrape pool" err="error creating HTTP client: unable to load specified CA cert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt: open /var/run/secrets/kubernetes.io/serviceaccount/ca.crt: no such file or directory" scrape_pool=kubernetes-nodes
2021-12-05_23:46:36.94401 unexpected fault address 0x5f62616c747d2f
2021-12-05_23:46:36.94404 fatal error: fault
2021-12-05_23:46:36.94627 [signal SIGSEGV: segmentation violation code=0x1 addr=0x5f62616c747d2f pc=0x187b9e0]
2021-12-05_23:46:36.94631
2021-12-05_23:46:36.94631 goroutine 1012 [running]:
2021-12-05_23:46:36.94632 runtime.throw(0x21fdccc, 0x5)
2021-12-05_23:46:36.94632 	/usr/local/go/src/runtime/panic.go:1117 +0x54 fp=0x400110b560 sp=0x400110b530 pc=0x45f94
2021-12-05_23:46:36.94632 runtime.sigpanic()
2021-12-05_23:46:36.94633 	/usr/local/go/src/runtime/signal_unix.go:741 +0x230 fp=0x400110b5a0 sp=0x400110b560 pc=0x5da60
2021-12-05_23:46:36.94633 github.com/golang/snappy.encodeBlock(0x4000621802, 0xb6b, 0xb6b, 0x4000620000, 0x9b0, 0xc00, 0x18)
2021-12-05_23:46:36.94634 	/var/cache/omnibus/src/prometheus/pkg/mod/github.com/golang/snappy@v0.0.2/encode_arm64.s:666 +0x360 fp=0x4001113640 sp=0x400110b5b0 pc=0x187b9e0
2021-12-05_23:46:36.94634 github.com/golang/snappy.Encode(0x4000621800, 0xb6d, 0xb6d, 0x0, 0x0, 0x0, 0x9b0, 0x224c274, 0x4001be1788)
2021-12-05_23:46:36.94635 	/var/cache/omnibus/src/prometheus/pkg/mod/github.com/golang/snappy@v0.0.2/encode.go:39 +0x17c fp=0x4001113690 sp=0x4001113640 pc=0x187af1c
2021-12-05_23:46:36.94635 github.com/prometheus/prometheus/tsdb/wal.(*WAL).log(0x400059eb40, 0x4000620000, 0x9b0, 0xc00, 0x1, 0x14, 0x40000c92f0)
2021-12-05_23:46:36.94637 	/var/cache/omnibus/src/prometheus/src/github.com/prometheus/prometheus/tsdb/wal/wal.go:634 +0x368 fp=0x4001113730 sp=0x4001113690 pc=0x1882508
2021-12-05_23:46:36.94663 github.com/prometheus/prometheus/tsdb/wal.(*WAL).Log(0x400059eb40, 0x4001113838, 0x1, 0x1, 0x0, 0x0)
2021-12-05_23:46:36.94673 	/var/cache/omnibus/src/prometheus/src/github.com/prometheus/prometheus/tsdb/wal/wal.go:596 +0xc8 fp=0x40011137c0 sp=0x4001113730 pc=0x18820e8
2021-12-05_23:46:36.94682 github.com/prometheus/prometheus/tsdb.(*headAppender).log(0x4000b97e80, 0x0, 0x0)
2021-12-05_23:46:36.94688 	/var/cache/omnibus/src/prometheus/src/github.com/prometheus/prometheus/tsdb/head.go:1259 +0x26c fp=0x4001113870 sp=0x40011137c0 pc=0x18a395c
2021-12-05_23:46:36.94689 github.com/prometheus/prometheus/tsdb.(*headAppender).Commit(0x4000b97e80, 0x0, 0x0)
2021-12-05_23:46:36.94689 	/var/cache/omnibus/src/prometheus/src/github.com/prometheus/prometheus/tsdb/head.go:1279 +0x74 fp=0x40011139b0 sp=0x4001113870 pc=0x18a3ab4
2021-12-05_23:46:36.94689 github.com/prometheus/prometheus/tsdb.dbAppender.Commit(0x29c3680, 0x4000b97e80, 0x40001a80e0, 0x3, 0x256065fde24b4dba)
2021-12-05_23:46:36.94691 	/var/cache/omnibus/src/prometheus/src/github.com/prometheus/prometheus/tsdb/db.go:794 +0x30 fp=0x40011139f0 sp=0x40011139b0 pc=0x1894ee0
2021-12-05_23:46:36.94691 github.com/prometheus/prometheus/tsdb.(*dbAppender).Commit(0x400049f2c0, 0x4001ea6180, 0x4001ea6180)
2021-12-05_23:46:36.94692 	<autogenerated>:1 +0x48 fp=0x4001113a30 sp=0x40011139f0 pc=0x18bf548
2021-12-05_23:46:36.94692 github.com/prometheus/prometheus/storage.(*fanoutAppender).Commit(0x4000af2180, 0x29c30d0, 0x400049f2d8)
2021-12-05_23:46:36.94693 	/var/cache/omnibus/src/prometheus/src/github.com/prometheus/prometheus/storage/fanout.go:174 +0x34 fp=0x4001113ae0 sp=0x4001113a30 pc=0x1837c94
2021-12-05_23:46:36.94693 github.com/prometheus/prometheus/scrape.(*timeLimitAppender).Commit(0x400049f2d8, 0x10, 0x10)
2021-12-05_23:46:36.94693 	<autogenerated>:1 +0x3c fp=0x4001113b10 sp=0x4001113ae0 pc=0x1948b0c
2021-12-05_23:46:36.94694 github.com/prometheus/prometheus/scrape.(*scrapeLoop).scrapeAndReport.func1(0x4001113ce0, 0x4001be1cf0, 0x4001ea8160)
2021-12-05_23:46:36.94694 	/var/cache/omnibus/src/prometheus/src/github.com/prometheus/prometheus/scrape/scrape.go:1086 +0x40 fp=0x4001113b90 sp=0x4001113b10 pc=0x1947140
2021-12-05_23:46:36.94695 github.com/prometheus/prometheus/scrape.(*scrapeLoop).scrapeAndReport(0x4001ea8160, 0x37e11d600, 0x37e11d600, 0x0, 0x0, 0x0, 0xc06371b726c9d9c7, 0x1d67c07c3, 0x3af9060, 0x0, ...)
2021-12-05_23:46:36.94696 	/var/cache/omnibus/src/prometheus/src/github.com/prometheus/prometheus/scrape/scrape.go:1153 +0x7a4 fp=0x4001113e00 sp=0x4001113b90 pc=0x1940524
2021-12-05_23:46:36.94696 github.com/prometheus/prometheus/scrape.(*scrapeLoop).run(0x4001ea8160, 0x37e11d600, 0x37e11d600, 0x0)
2021-12-05_23:46:36.94696 	/var/cache/omnibus/src/prometheus/src/github.com/prometheus/prometheus/scrape/scrape.go:1039 +0x268 fp=0x4001113fb0 sp=0x4001113e00 pc=0x193fa68
2021-12-05_23:46:36.94697 runtime.goexit()
2021-12-05_23:46:36.94697 	/usr/local/go/src/runtime/asm_arm64.s:1130 +0x4 fp=0x4001113fb0 sp=0x4001113fb0 pc=0x7d164
2021-12-05_23:46:36.94697 created by github.com/prometheus/prometheus/scrape.(*scrapePool).sync
2021-12-05_23:46:36.94697 	/var/cache/omnibus/src/prometheus/src/github.com/prometheus/prometheus/scrape/scrape.go:510 +0x790
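The trace above places the fault in `github.com/golang/snappy` at `encode_arm64.s`, reached from the WAL write path. To confirm the same signature on another host, a sketch like the following pulls the crash count and the snappy module version out of the Prometheus log. The function names are hypothetical, and `/var/log/gitlab/prometheus/current` is assumed to be the default Omnibus log location.

```shell
# Count SIGSEGV crash lines in a Prometheus log supplied on stdin.
count_segfaults() {
  grep -c 'signal SIGSEGV'
}

# Print each distinct snappy module version that appears in stack-trace
# paths of the form ".../github.com/golang/snappy@vX.Y.Z/...".
snappy_versions_in_trace() {
  sed -n 's|.*github\.com/golang/snappy@\(v[0-9.]*\)/.*|\1|p' | sort -u
}

# Example usage (assumed default log path, not run here):
#   sudo cat /var/log/gitlab/prometheus/current | count_segfaults
#   sudo cat /var/log/gitlab/prometheus/current | snappy_versions_in_trace
```

Run against the log excerpt above, the second function reports `v0.0.2`.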

Details of package version

Provide the package version installation details
gitlab-ee-14.5.1-ee.0.sles15.aarch64

Environment details

  • Operating System: openSUSE 15.3
  • Installation Target:
    • VM: AWS
  • Installation Type:
    • New Installation
  • Is there any other software running on the machine: (no other software)
  • Is this a single or multiple node installation? Single node
  • Resources
    • CPU: aarch64
    • Memory total: 7.5Gi

Configuration details

Provide the relevant sections of `/etc/gitlab/gitlab.rb`
external_url 'http://ec2-54-206-79-249.ap-southeast-2.compute.amazonaws.com'

Edited by Mike Lockhart | GitLab