Perform an automated, scheduled thread and task dump from the ES logging production cluster
refs: https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/10094
At the moment we have no observability into what the cluster is doing. The purpose of this issue is to create automation that dumps tasks and threads, so that we have diagnostics we can use for troubleshooting.
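A minimal sketch of what such a dump could look like, assuming the cluster is reachable over HTTPS with basic auth. The `_tasks` and `_nodes/hot_threads` endpoints are standard Elasticsearch APIs; the function names, the object naming layout, and the base URL are hypothetical and not taken from the actual runbooks scripts.

```python
import base64
import datetime as dt
import urllib.request

# Standard Elasticsearch diagnostic endpoints.
# Everything else in this sketch (names, layout) is an assumption.
ENDPOINTS = {
    "tasks": "/_tasks?detailed=true",
    "hot_threads": "/_nodes/hot_threads",
}


def dump_name(kind: str, now: dt.datetime) -> str:
    """Build a sortable, timestamped object name (hypothetical layout)."""
    return f"{kind}/{now:%Y-%m-%dT%H-%M-%SZ}.txt"


def build_request(base_url: str, kind: str, user: str, password: str) -> urllib.request.Request:
    """Prepare an authenticated GET request for one diagnostic endpoint."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        base_url.rstrip("/") + ENDPOINTS[kind],
        headers={"Authorization": f"Basic {token}"},
    )


def fetch_dump(base_url: str, kind: str, user: str, password: str, timeout: int = 30) -> bytes:
    """Fetch the raw dump body; the caller decides where to store it."""
    request = build_request(base_url, kind, user, password)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

A scheduled job could call `fetch_dump` for each key in `ENDPOINTS`, write the result to a file named by `dump_name`, and then copy that file to the bucket with `gsutil cp`.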
- account on ES
- GCS bucket: https://ops.gitlab.net/gitlab-com/gl-infra/terraform-modules/google/storage-buckets/-/merge_requests/25
- tf repo change to pick up the new version of the module: https://ops.gitlab.net/gitlab-com/gitlab-com-infrastructure/-/merge_requests/1753
- fix sa reference: https://ops.gitlab.net/gitlab-com/gl-infra/terraform-modules/google/storage-buckets/-/merge_requests/26
- tf repo change to pick up fixed module: https://ops.gitlab.net/gitlab-com/gitlab-com-infrastructure/-/merge_requests/1754
- runbooks script: gitlab-com/runbooks!2239 (merged)
- flamegraph script: gitlab-com/runbooks!2240 (merged)
- move all scripts to one dir: gitlab-com/runbooks!2248 (merged)
- docker image: ci-images!17 (merged)
- authenticate gsutil: gitlab-com/runbooks!2251 (merged)
- CronJob
- helm release: gitlab-com/gl-infra/k8s-workloads/gitlab-helmfiles!37 (merged)
- switch to using .com registry: gitlab-com/gl-infra/k8s-workloads/gitlab-helmfiles!49 (closed)
- discussion about which registry to use, where to build images and how to deploy them: https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/10295
- discussion about how to structure simpleapp: https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/10433
- using pullImageSecret: gitlab-com/gl-infra/k8s-workloads/gitlab-helmfiles!62 (merged)
- Cloud NAT in the ops cluster: https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/10511
- fix env var test: es-diagnostics!2 (merged)
- deploy a new image: gitlab-com/gl-infra/k8s-workloads/gitlab-helmfiles!68 (merged)
- trigger a helm release: gitlab-com/gl-infra/k8s-workloads/gitlab-helmfiles!69 (merged)
- revert the annotation: gitlab-com/gl-infra/k8s-workloads/gitlab-helmfiles!70 (merged)
- binding for nonprod account: https://ops.gitlab.net/gitlab-com/gitlab-com-infrastructure/-/merge_requests/1831
- annotations for the service account, required for Workload Identity: gitlab-com/gl-infra/k8s-workloads/gitlab-helmfiles!71 (diffs)
- fix service account name used by the metadata server: gitlab-com/gl-infra/k8s-workloads/gitlab-helmfiles!72 (merged)
- switch to multiple buckets in ops env:
  - tf module: https://ops.gitlab.net/gitlab-com/gl-infra/terraform-modules/google/storage-buckets/-/merge_requests/28
  - tf repo: https://ops.gitlab.net/gitlab-com/gitlab-com-infrastructure/-/merge_requests/1777
- switch to a dedicated project:
  - runbooks: gitlab-com/runbooks!2306 (merged)
  - es-diagnostics: es-diagnostics!1 (merged)
  - ci-images: ci-images!21 (merged)
- k8s docs: gitlab-com/gl-infra/k8s-workloads/gitlab-helmfiles!41 (merged)
- adjust permissions for the ES account
- put an ops-gitlab-net token for downloading images from the ops registry (a token was generated for the ops-gitlab-net account in the ops.gitlab.net instance)
Edited by Michal Wasilewski