Commit 4eb1df02 authored by Mitchell Nielsen, committed by Jason Plum

Add triage script for active Helm Chart installs

Adds a script to help triage active Helm Chart installations.
parent 5e5074fc
1 merge request: !3626 CI: Add triage script for active Helm Chart installs
@@ -69,3 +69,24 @@ Certain jobs in CI use a backup of GitLab during testing. Complete the steps bel
1. Finally, update `.variables.TEST_BACKUP_PREFIX` in `.gitlab-ci.yml` to the new version of the backup.
Future pipelines will now use the new backup artifact during testing.
## CI clusters are low on available resources
You may notice that one or more CI clusters are running low on available
resources such as CPU and memory. The clusters are configured to scale their
nodes automatically, but they sometimes reach the upper node limit, after which
no more nodes can be created. In that case, a good first step is to check
whether any installations of the GitLab Helm Charts in the clusters can be removed.
Installations are usually cleaned up automatically by the Review Apps logic in
the pipeline, but this can fail for various reasons. See the following issues
for more details:
- [What can we do about cleaning up failed deploys in CI?](https://gitlab.com/gitlab-org/charts/gitlab/-/issues/2076)
- [https://gitlab.com/gitlab-org/charts/gitlab/-/issues/5338](https://gitlab.com/gitlab-org/charts/gitlab/-/issues/5338)
As a workaround, these installations can be manually deleted by running the associated
`stop_review` job(s) in CI. To make this easier, use the
[`helm_ci_triage.sh`](https://gitlab.com/gitlab-org/charts/gitlab/blob/master/scripts/ci/helm_ci_triage.sh)
script to get a list of running installations and open the associated pipeline to run
the `stop_review` job(s). Further usage details are available in the script.
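
Before running the script, it can help to confirm that the tools it calls are
installed. The following is a minimal sketch, not part of `helm_ci_triage.sh`;
the `missing_tools` helper is a hypothetical name, and the tool list is taken
from the script's dependency header plus the `kubectl` and `awk` calls in its body.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helper (not part of helm_ci_triage.sh):
# prints the name of each given tool that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "${tool}" >/dev/null 2>&1 || printf '%s\n' "${tool}"
  done
}

# Tools the triage script relies on, per its header and body.
missing="$(missing_tools kubectl helm fzf yq column awk)"
if [ -n "${missing}" ]; then
  printf 'Install these before running the triage script:\n%s\n' "${missing}" >&2
else
  echo 'All tools found; run ./scripts/ci/helm_ci_triage.sh'
fi
```

The check only reports what is missing and does not exit, so it is safe to
paste into an interactive shell.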
#!/usr/bin/env bash
# Prints active Helm releases in a given cluster namespace and opens
# the parent pipeline in the browser.
#
# Used to triage installations, specifically those that have not been
# automatically uninstalled after the Review App deadline for various
# reasons (job failure, manual retry, etc.).
#
# Dependencies:
# - kubectl: https://kubernetes.io/docs/reference/kubectl/
# - helm: https://helm.sh
# - fzf: https://github.com/junegunn/fzf
# - yq: https://github.com/mikefarah/yq
# - column: https://linux.die.net/man/1/column
#
# Usage:
# 1. Connect to a Kubernetes cluster in your terminal session.
# 2. Connect to the namespace you want to check (using `kubectl config set-context` or `kubens`, for example).
# Alternatively, you can pass an environment variable `NAMESPACE` to override the namespace setting.
# 3. Run this script: `./scripts/ci/helm_ci_triage.sh`
# 4. Use the up/down arrows to select a release. You can also type to filter the results.
# 5. Press 'enter' to select the release, which prints out the URL of the associated pipeline.
# Most terminal emulators support opening the link from the output, sometimes while holding a modifier.
# 6. Run the relevant `stop_review` job(s) from the CI pipeline page.
set -e
ns="$(kubectl config view --minify -o jsonpath='{..namespace}')"
if [ -n "${NAMESPACE}" ]; then
ns="${NAMESPACE}"
fi
releases=$(helm ls --no-headers --date --namespace="${ns}" \
| awk '{print $1 " " $4 " " $5}' \
| column -t)
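# Pass the release list as an argument, not as the printf format string,
# so that any '%' or '\' in the data is printed literally.
release=$(printf 'quit\n%s' "${releases}" \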
| fzf --header='Active Helm releases (type or select then press "enter" to open parent pipeline)' \
| awk '{print $1}')
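# An empty selection (for example, fzf aborted with Esc) is treated like "quit".
if [ -z "${release}" ] || [ "${release}" = "quit" ]; then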
echo 'No release selected, exiting...'
exit 1
fi
url=$(helm get values --namespace="${ns}" "${release}" | yq .ci.pipeline.url)
printf "Pipeline URL for %s:\n %s\n" "${release}" "${url}"
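
The column selection in the `helm ls ... | awk` step above can be tried without
a cluster. This is a rough sketch under stated assumptions: `release_summary` is
a hypothetical helper name, and the sample line is made-up data shaped like
`helm ls --no-headers` output (name, namespace, revision, then an `UPDATED`
timestamp whose date and time are separated by a space). With whitespace
splitting, fields 1, 4 and 5 are the release name, the update date and the
update time.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the awk step of the script:
# reads `helm ls --no-headers` style lines on stdin and keeps
# the release name ($1), update date ($4) and update time ($5).
release_summary() {
  awk '{print $1 " " $4 " " $5}'
}

# Made-up sample line; real output comes from `helm ls --no-headers`.
sample='rvw-abc	ns1	1	2024-01-02 03:04:05.678 +0000 UTC	deployed	gitlab-7.0.0	v16.0.0'
printf '%s\n' "${sample}" | release_summary
```

The script additionally pipes this through `column -t`, which only aligns the
columns for display in `fzf` and does not change which fields are kept.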