Issue creation rate limit increments without expiring in Redis

Summary

The issue creation rate limit counter never expires for one user who creates at most a few issues per minute, so the user hits the limit despite being well below it.

Steps to reproduce

I'm not able to reproduce this, but it would be reproducible if we could set the TTL to -1 for the issue creation rate limit key.

Witnessed firsthand on customer's production system. (See notes & observations)

Customer support ticket opened and call scheduled to find a path to resolution for an ongoing problem where the rate limit count for issue creation was incrementing but not expiring.

The issue only affected one service account user that creates GitLab issues via the API (hereafter referred to as user:1).

/opt/gitlab/embedded/bin/redis-cli -s /var/opt/gitlab/redis/redis.socket GET application_rate_limiter:issues_create:user:1:issues_create

This returned over 2550 (the limit is currently set to 2700).

Checking TTL for rate limiter showed -1, indicating that "key exists but has no associated expire".

/opt/gitlab/embedded/bin/redis-cli -s /var/opt/gitlab/redis/redis.socket ttl application_rate_limiter:issues_create:user:1:issues_create

Manually expiring the rate limit for application_rate_limiter:issues_create:user:1:issues_create reset the counter to (nil)

/opt/gitlab/embedded/bin/redis-cli -s /var/opt/gitlab/redis/redis.socket expire application_rate_limiter:issues_create:user:1:issues_create 60

Checking with redis-cli, the TTL of this key was stuck at -1. Sixty seconds after performing an EXPIRE on the same key, GET no longer returned the high value we were seeing before but rather (nil).

After expiring the key manually, the limit now expires automatically every 60 seconds as expected.

What is the current bug behavior?

Issue creation rate limits are consistently being hit because the application_rate_limiter:issues_create:user:1:issues_create counter never expires.

In reality, this user is making only a few requests per project per minute.

The issue creation rate limit is incremented by one each time there's a request from user:1, but fails to expire after 60 seconds.

With no expiration, the user hits the limit of 300 after 300 requests, and subsequent requests result in HTTP 429 responses in api_json.log, auth.log and nginx/gitlab_access.log.
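To make the failure mode concrete, here is a minimal Python sketch. It assumes, as the monitor output later in this report suggests but does not confirm, that the limiter sends EXPIRE only when INCR returns 1. `FakeRedis`, `throttled`, and `simulate` are hypothetical names for illustration, and the "lost first EXPIRE" is an assumed trigger; how the real key ended up without a TTL is not established in this report.

```python
class FakeRedis:
    """Tiny in-memory stand-in for the three Redis commands used below."""

    def __init__(self):
        self.values = {}
        self.expires_at = {}
        self.now = 0  # simulated clock, in seconds

    def _purge(self, key):
        # Drop the key if its deadline has passed, like Redis expiry.
        deadline = self.expires_at.get(key)
        if deadline is not None and self.now >= deadline:
            self.values.pop(key, None)
            self.expires_at.pop(key, None)

    def incr(self, key):
        self._purge(key)
        self.values[key] = self.values.get(key, 0) + 1
        return self.values[key]

    def expire(self, key, seconds):
        self._purge(key)
        if key not in self.values:
            return 0
        self.expires_at[key] = self.now + seconds
        return 1

    def ttl(self, key):
        self._purge(key)
        if key not in self.values:
            return -2  # key does not exist
        if key not in self.expires_at:
            return -1  # key exists but has no associated expire
        return self.expires_at[key] - self.now

    def advance(self, seconds):
        self.now += seconds


def throttled(redis, key, threshold=300, interval=60, expire_succeeds=True):
    """Assumed limiter pattern: EXPIRE is sent only when INCR returns 1."""
    value = redis.incr(key)
    if value == 1 and expire_succeeds:
        redis.expire(key, interval)
    return value > threshold


KEY = "application_rate_limiter:issues_create:user:1:issues_create"


def simulate(requests, gap_seconds, first_expire_succeeds):
    """Return (final TTL, number of rejected requests)."""
    redis = FakeRedis()
    rejected = 0
    for i in range(requests):
        # Only the very first EXPIRE can be lost in this simulation.
        ok = first_expire_succeeds or i > 0
        if throttled(redis, KEY, expire_succeeds=ok):
            rejected += 1
        redis.advance(gap_seconds)
    return redis.ttl(KEY), rejected


if __name__ == "__main__":
    print(simulate(400, 120, first_expire_succeeds=True))
    print(simulate(400, 120, first_expire_succeeds=False))
```

With one request every two minutes, the healthy run rejects nothing because the counter restarts each time. If the first EXPIRE is lost, the counter never returns to 1, so no later request sets a TTL: the TTL stays at -1 and every request past the 300th is rejected.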

Raising the limit only solves the problem temporarily, until the counter reaches the new limit.

What is the expected correct behavior?

Issue creation rate limit count expires after 60 seconds for all users/requests.

Relevant logs and/or screenshots

Running /opt/gitlab/embedded/bin/redis-cli -s /var/opt/gitlab/redis/redis.socket monitor | grep rate_limit outputs:

08:41:14 AM  "incr" "application_rate_limiter:issues_create:user:1:issues_create"
08:51:12 AM  "incr" "application_rate_limiter:issues_create:user:1:issues_create"
08:53:50 AM  "incr" "application_rate_limiter:issues_create:user:1:issues_create"
09:00:31 AM  "incr" "application_rate_limiter:issues_create:user:1:issues_create"
09:00:41 AM  "incr" "application_rate_limiter:issues_create:user:1:issues_create"
09:02:59 AM  "incr" "application_rate_limiter:issues_create:user:1:issues_create"
09:05:15 AM  "incr" "application_rate_limiter:issues_create:user:1:issues_create"
09:07:24 AM  "incr" "application_rate_limiter:issues_create:user:1:issues_create"
09:09:33 AM  "incr" "application_rate_limiter:issues_create:user:1:issues_create"
09:36:46 AM  "incr" "application_rate_limiter:issues_create:project:1:user:2"
09:36:46 AM  "expire" "application_rate_limiter:issues_create:project:1:user:2" "60"
09:39:59 AM  "incr" "application_rate_limiter:issues_create:project:2:user:3"
09:39:59 AM  "expire" "application_rate_limiter:issues_create:project:2:user:3" "60"
09:46:46 AM  "incr" "application_rate_limiter:issues_create:user:1:issues_create"
09:49:21 AM  "incr" "application_rate_limiter:issues_create:user:1:issues_create"
09:56:25 AM  "incr" "application_rate_limiter:issues_create:user:1:issues_create"
10:04:12 AM  "incr" "application_rate_limiter:issues_create:user:1:issues_create"
10:12:03 AM  "incr" "application_rate_limiter:issues_create:user:1:issues_create"
10:18:50 AM  "incr" "application_rate_limiter:issues_create:user:1:issues_create"
10:25:09 AM  "incr" "application_rate_limiter:issues_create:project:3:user:4"
10:25:09 AM  "expire" "application_rate_limiter:issues_create:project:3:user:4" "60"
10:28:23 AM  "incr" "application_rate_limiter:issues_create:user:1:issues_create"
10:32:31 AM  "incr" "application_rate_limiter:issues_create:user:1:issues_create"
11:13:24 AM  "incr" "application_rate_limiter:issues_create:user:1:issues_create"
11:17:43 AM  "incr" "application_rate_limiter:issues_create:user:1:issues_create"
11:21:52 AM  "incr" "application_rate_limiter:issues_create:project:4:user:5"
11:21:52 AM  "expire" "application_rate_limiter:issues_create:project:4:user:5" "60"

Note that "incr" on application_rate_limiter:issues_create:user:1:issues_create is never followed by an "expire", unlike the other keys, so this counter does not expire.
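For longer captures, the same check can be scripted. The following Python sketch counts "incr" and "expire" commands per key and lists keys that were incremented but never given an expiry within the capture. It assumes lines shaped like the excerpt above (a timestamp followed by quoted command and key); `summarize` and `keys_without_expire` are hypothetical helper names.

```python
import re
from collections import Counter

# Matches each double-quoted token on a monitor line.
QUOTED = re.compile(r'"([^"]*)"')


def summarize(lines):
    """Count incr and expire commands per key."""
    incrs = Counter()
    expires = Counter()
    for line in lines:
        tokens = QUOTED.findall(line)
        if len(tokens) < 2:
            continue  # not a command line we understand
        command, key = tokens[0].lower(), tokens[1]
        if command == "incr":
            incrs[key] += 1
        elif command == "expire":
            expires[key] += 1
    return incrs, expires


def keys_without_expire(lines):
    """Keys that were incremented but never received an expire."""
    incrs, expires = summarize(lines)
    return sorted(key for key in incrs if expires[key] == 0)


SAMPLE = '''
09:09:33 AM  "incr" "application_rate_limiter:issues_create:user:1:issues_create"
09:36:46 AM  "incr" "application_rate_limiter:issues_create:project:1:user:2"
09:36:46 AM  "expire" "application_rate_limiter:issues_create:project:1:user:2" "60"
09:46:46 AM  "incr" "application_rate_limiter:issues_create:user:1:issues_create"
'''.strip().splitlines()

if __name__ == "__main__":
    for key in keys_without_expire(SAMPLE):
        print(key)
```

A missing "expire" in a short capture is only a hint: a healthy key incremented several times within its 60-second window would also show no "expire" after the first increment. Here the increments are minutes apart, which is what makes the absence suspicious; TTL on the key remains the definitive check.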

Querying Redis with the following command returned ~2600 requests, with the limit set to ~2700 (default 300):

/opt/gitlab/embedded/bin/redis-cli -s /var/opt/gitlab/redis/redis.socket GET application_rate_limiter:issues_create:user:1:issues_create

Checking TTL for this rate limit returned -1.

/opt/gitlab/embedded/bin/redis-cli -s /var/opt/gitlab/redis/redis.socket ttl application_rate_limiter:issues_create:user:1:issues_create
(integer) -1

Used the following command to manually expire this rate limit counter:

/opt/gitlab/embedded/bin/redis-cli -s /var/opt/gitlab/redis/redis.socket expire application_rate_limiter:issues_create:user:1:issues_create 60

Manually expiring the rate limit with this command seems to have made it start working as intended, and the TTL value changed from -1 to -2 (in Redis, -2 means the key no longer exists). From that point on, the 60-second auto-expire appears to be working as expected.

GitLab environment info

GitLab 13.1.5

Single-node Omnibus (Self-managed)

Possible fixes

  • Include more logic in GitLab to detect and fix Application Rate Limits that have no expiration set.
  • Include expiration of all application rate limit keys as part of gitlab-rake cache:clear, as a manual workaround instead of manually expiring the limit in Redis.
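As a rough illustration of the first idea, the Python sketch below scans for rate limit keys with a TTL of -1 and gives them an expiry. It is not GitLab code: `repair_missing_ttls` and `StubClient` are hypothetical names, the key pattern is inferred from the keys shown in this issue, and applying a single 60-second interval to every matching key is a simplifying assumption. The function only relies on `scan_iter`, `ttl`, and `expire`, which a redis-py client also provides.

```python
def repair_missing_ttls(client, pattern="application_rate_limiter:*", interval=60):
    """Set an expiry on matching keys that exist but have no TTL.

    Returns the list of keys that were repaired.
    """
    repaired = []
    for key in client.scan_iter(match=pattern):
        # TTL -1 means "key exists but has no associated expire".
        if client.ttl(key) == -1:
            client.expire(key, interval)
            repaired.append(key)
    return repaired


class StubClient:
    """In-memory stand-in so the sketch runs without a Redis server."""

    def __init__(self, ttls):
        self.ttls = dict(ttls)  # key -> TTL in seconds, -1 means no expiry

    def scan_iter(self, match=None):
        # Only supports a trailing "*" wildcard, which is all we need here.
        prefix = match.rstrip("*") if match else ""
        return [key for key in self.ttls if key.startswith(prefix)]

    def ttl(self, key):
        return self.ttls.get(key, -2)

    def expire(self, key, seconds):
        if key not in self.ttls:
            return False
        self.ttls[key] = seconds
        return True


if __name__ == "__main__":
    client = StubClient({
        "application_rate_limiter:issues_create:user:1:issues_create": -1,
        "application_rate_limiter:issues_create:project:1:user:2": 42,
    })
    print(repair_missing_ttls(client))
```

Against a real instance the stub would be replaced by an actual client connected to the same socket used in the commands above. Setting an expiry rather than deleting the key mirrors the manual workaround described in this issue.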

Other proposals

  • Add "troubleshooting rate limit" instructions and commands to the Redis troubleshooting guide
  • Add Redis rate limits to GitLab monitoring stats (Grafana)

Relevant code

https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/application_rate_limiter.rb#L72-76

@cmiskell

Edited by Greg Myers