
Accumulate usage in Redis and flush to database to prevent database update contention

What does this MR do and why?

Solves a problem where 90% of CI activity hits the same namespace record, causing database lock contention and saturating Sidekiq workers with long-running connections.

This change introduces a Redis-based batching system for CI minutes usage tracking to improve database performance. Instead of immediately writing every usage update to the database (which can cause contention under high load), the system now temporarily stores usage increments in Redis and periodically flushes them to the database in batches.

The implementation adds a new Redis storage layer that accumulates CI minutes usage data, a background worker that runs every 15 minutes to move the accumulated data from Redis to the database, and fallback logic that writes directly to the database if Redis is unavailable. The existing usage tracking methods are updated to first try storing in Redis, and the total usage calculations now combine both the database values and any pending Redis values to provide accurate real-time totals.

This approach reduces database write frequency from one UPDATE per job completion to one batched UPDATE per record per flush interval.
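
As a rough sketch of the accumulate-with-fallback flow and the combined real-time total (class names, method names, and the key format here are illustrative assumptions, not the exact code in this MR):

module Ci
  module Minutes
    class UsageAccumulator
      KEY_PREFIX = 'minutes_batch'

      # Buffer the increment in Redis; fall back to the existing direct
      # database write if Redis is unavailable.
      def increment(namespace_id, amount_used:, shared_runners_duration:)
        Gitlab::Redis::SharedState.with do |redis|
          key = "#{KEY_PREFIX}:namespace:#{namespace_id}"
          redis.hincrbyfloat(key, 'amount_used', amount_used)
          redis.hincrbyfloat(key, 'shared_runners_duration', shared_runners_duration)
        end
      rescue ::Redis::BaseError
        # Hypothetical existing DB path: one UPDATE per call, as before.
        Ci::Minutes::NamespaceMonthlyUsage.increment_usage(namespace_id, amount_used, shared_runners_duration)
      end

      # Real-time total = persisted DB value + pending (not-yet-flushed) Redis value.
      def total_amount_used(namespace_id)
        pending = Gitlab::Redis::SharedState.with do |redis|
          redis.hget("#{KEY_PREFIX}:namespace:#{namespace_id}", 'amount_used').to_f
        end

        Ci::Minutes::NamespaceMonthlyUsage.amount_used_for(namespace_id) + pending
      end
    end
  end
end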

Redis SQL Details

Instead of individual UPDATE statements for each CI job completion:

UPDATE ci_namespace_monthly_usages SET amount_used = amount_used + 5 WHERE namespace_id = 123;
UPDATE ci_namespace_monthly_usages SET amount_used = amount_used + 8 WHERE namespace_id = 123;
UPDATE ci_namespace_monthly_usages SET amount_used = amount_used + 3 WHERE namespace_id = 123;

We batch the usage and flush periodically:

UPDATE ci_namespace_monthly_usages SET amount_used = amount_used + 16 WHERE namespace_id = 123;

The implementation uses Redis for temporary storage. The pending usage for a record is kept in a Redis hash, e.g. {"amount_used" => 45.5, "shared_runners_duration" => 23.2}.

For batching we use:

HINCRBYFLOAT minutes_batch:project:123 "shared_runners_duration" 5.2
HINCRBYFLOAT minutes_batch:project:123 "amount_used" 10.5

to accumulate the increments into that hash.
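
On the flush side, a cron worker (every 15 minutes, per the description above) could drain each hash and apply a single batched UPDATE per record. This is a minimal sketch under assumptions; the worker name, model method, and key pattern are placeholders rather than the MR's actual identifiers:

module Ci
  module Minutes
    class FlushUsageWorker
      include ApplicationWorker

      def perform
        Gitlab::Redis::SharedState.with do |redis|
          redis.scan_each(match: 'minutes_batch:project:*') do |key|
            project_id = key.split(':').last.to_i

            # Read and delete atomically so increments that arrive mid-flush
            # start a fresh hash for the next run.
            usage, _deleted = redis.multi do |multi|
              multi.hgetall(key)
              multi.del(key)
            end

            next if usage.blank?

            # One batched UPDATE per record, like the SQL example above.
            # The increment_usage helper here is hypothetical.
            Ci::Minutes::ProjectMonthlyUsage.increment_usage(
              project_id,
              amount_used: usage['amount_used'].to_f,
              shared_runners_duration: usage['shared_runners_duration'].to_f
            )
          end
        end
      end
    end
  end
end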

References

We currently run about 15 workers per second that update usage on .com. Redis can handle roughly 10K-50K updates per second against a single key, so it can comfortably absorb today's load even with all updates going to one key. This solution should meet current and near-term scaling needs, and there is no reason to store with more fidelity in Redis. (If we later decided to fan out to more keys, Redis offers even higher throughput, 100K-1M+ operations per second.)

https://log.gprd.gitlab.net/app/r/s/EdAMB

(Screenshot attached: Screenshot_2025-06-12_at_11.56.21_AM)

https://redis.io/docs/latest/operate/oss_and_stack/management/optimization/benchmarks/

One concern: the memory footprint of holding this many keys in Redis.

Related to #490968

