Commit 95ecbea2 authored by Craig Miskell

More docs on user auth troubleshooting (thanks Amar) and fix some whitespacing

parent cf9b50f4
groups:
- name: user-auth-events.rules
  rules:
  - alert: BlockedUserAttemptsIsHigh
    expr: sum(rate(gitlab_auth_user_blocked_total{environment="gprd"}[5m]))/sum(rate(gitlab_auth_user_authenticated_total{environment="gprd"}[5m])) > 0.01
    for: 15m
    labels:
      severity: warn
    annotations:
      description: Higher than expected rate of login attempts for blocked users
      runbook: troubleshooting/blocked-user-logins.md
      title: High rate of blocked user logins
  - alert: NoSuccessfulLogins
    expr: sum(rate(gitlab_auth_user_authenticated_total{environment="gprd"}[5m])) == 0
    for: 5m
......
......@@ -20,11 +20,20 @@ The metric being higher than expected (an arbitrary threshold set by hand) for a
1. An abuse blocking operation has caught too many users
1. A bug in a release has caused a large number of users to be blocked, or to be interpreted as blocked
1. Some sort of weirdness with an OAuth partner
Check #abuse (mostly automated notifications) and #security (@abuse-team) for possible abuse-related issues.
Check with #releases if it looks like a release-related issue.
An active release should show up in the dashboard as an annotation, and in #announcements from the deployment tasks. If it looks possibly related to a release, check with the people in #releases about details, rollback, and other options.
Other debugging ideas that may provide useful clues:
* Check whether you can log in yourself, as your normal account and/or as your high-privilege admin account.
* See if the problem is specific to password, password + 2FA, or OAuth logins.
* Confirm whether this affects just production, or potentially staging + ops as well (the latter suggesting some possible external trigger)
* Use the 'type' variable on the dashboard to see if this is specific to a type of backend (git, web, api)
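When comparing environments or backend types by hand, it can help to generate the same ratio the alert uses as an ad-hoc query. The following is a minimal sketch, not part of the runbook: the helper name `blocked_ratio_query` is hypothetical, the environment values other than `gprd` are assumed, and it assumes the auth metrics carry a `type` label matching the dashboard variable.

```python
from typing import Optional

# Metric names taken from the alert rule; everything else here is illustrative.
BLOCKED = "gitlab_auth_user_blocked_total"
AUTHENTICATED = "gitlab_auth_user_authenticated_total"


def blocked_ratio_query(environment: str, window: str = "5m", by: Optional[str] = None) -> str:
    """Build the blocked/authenticated ratio expression for one environment.

    If `by` is given, both sides are grouped by that label (assumed to exist
    on both metrics), so the ratio is reported per label value.
    """
    selector = f'{{environment="{environment}"}}[{window}]'
    agg = f"sum by ({by}) " if by else "sum"
    return f"{agg}(rate({BLOCKED}{selector})) / {agg}(rate({AUTHENTICATED}{selector}))"


# "gprd" comes from the alert rule; "gstg" and "ops" are assumed label values.
for env in ("gprd", "gstg", "ops"):
    print(blocked_ratio_query(env))

# Split production by backend type (assumes a `type` label on the metrics).
print(blocked_ratio_query("gprd", by="type"))
```

If the ratio is elevated in more than one environment, that points towards an external trigger rather than a production-only change.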
And as always the [Triage dashboard](https://dashboards.gitlab.net/d/RZmbBr7mk/gitlab-triage?orgId=1) is an excellent place to look.
There are unlikely to be any direct and immediate technical resolution steps that the on-call SRE can take here; mostly it will be alerting and then supporting other teams in diagnosing what's going on.
This is still a somewhat experimental alert; please feel free to reconsider/discuss both the threshold value and the 'for' interval, particularly if this proves to be overly sensitive; the intention is that this should alert only in extreme and surprising situations.
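To reason about how the threshold and the 'for' interval interact, here is a rough sketch of the alert condition in Python. It is a simplification, not how Prometheus is implemented: it assumes one rule evaluation per minute and treats 'for: 15m' as 15 consecutive evaluations above the threshold. The function names and sample rates are hypothetical.

```python
import math
from typing import Sequence


def blocked_ratio(blocked_rate: float, authenticated_rate: float) -> float:
    """Ratio of blocked to authenticated login rates, mirroring PromQL float division."""
    if authenticated_rate == 0:
        # PromQL yields NaN for 0/0 and +Inf for x/0 with x > 0.
        return math.nan if blocked_rate == 0 else math.inf
    return blocked_rate / authenticated_rate


def alert_fires(ratios: Sequence[float], threshold: float = 0.01, for_evaluations: int = 15) -> bool:
    """True if the most recent `for_evaluations` ratios all exceed `threshold`."""
    if len(ratios) < for_evaluations:
        return False
    recent = ratios[-for_evaluations:]
    # NaN compares False against any threshold, so it never satisfies the condition.
    return all(r > threshold for r in recent)


# Hypothetical per-minute (blocked, authenticated) rates, for illustration only.
quiet = [blocked_ratio(0.5, 100.0)] * 15   # 0.5% of logins blocked
noisy = [blocked_ratio(2.0, 100.0)] * 15   # 2% of logins blocked
brief_spike = quiet[:10] + noisy[:5]       # only the last 5 minutes are above 1%

print(alert_fires(quiet))                  # False
print(alert_fires(noisy))                  # True
print(alert_fires(brief_spike))            # False
print(alert_fires(noisy, threshold=0.05))  # False
```

Raising the threshold or lengthening the 'for' interval both make the alert less sensitive; the brief spike above never fires because it does not persist for the whole interval.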
......@@ -21,4 +21,10 @@ There are two broad scenarios where this could alert:
In the latter case, we expect many other alerts to be going off and the root cause to be clear; this alert is largely for the former case.
If the site is up and only logins are failing, check for activity in #releases or with #security (@abuse-team). In particular, if the number of blocked user login attempts is large, treat this as though [BlockedUserAttemptsIsHigh](blocked-user-logins.md) were firing.
If the site is up and only logins are failing, check for activity in #announcements, #releases, or with #security (@abuse-team). In particular, if the number of blocked user login attempts is large, treat this as though [BlockedUserAttemptsIsHigh](blocked-user-logins.md) were firing.
Other debugging ideas that may provide useful clues:
* Check whether you can log in yourself, as your normal account and as your high-privilege admin account.
* Confirm whether this affects just production, or potentially staging + ops as well (the latter suggesting some possible external trigger)
And as always the [Triage dashboard](https://dashboards.gitlab.net/d/RZmbBr7mk/gitlab-triage?orgId=1) is an excellent place to look.