Add circuit breaker logic to control the number of outbound requests to Topology Service (!218443)

What does this MR do and why?

This MR implements circuit breaker logic to ensure that, at any given time, there are no more than N active requests from Rails to the Topology Service (TS).

The concurrent request limit is configurable as an application setting and can be raised or lowered based on the Cell size and the size of the connection pool available in the Cell.

The primary reason for the circuit breaker is that every request to claim a resource runs inside a DB transaction. A bug in the TS or the claiming framework must not be able to tie up every connection in the application's pool and choke the application.
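To make the fail-fast behaviour concrete, here is a minimal in-process sketch of a concurrency limiter. It is not the code from this MR: the class name `ConcurrencyLimiter` and its `LimitExceeded` error are hypothetical, and it assumes a single process, whereas a limit shared across several Rails processes would need shared state. The real change reports rejections as `GRPC::ResourceExhausted`; a plain Ruby error stands in for it so the sketch has no gem dependencies.

```ruby
# Hypothetical in-process limiter; illustrative only, not the MR's implementation.
class ConcurrencyLimiter
  # Stand-in for GRPC::ResourceExhausted so the sketch needs no gems.
  LimitExceeded = Class.new(StandardError)

  attr_reader :limit

  def initialize(limit)
    @limit = limit
    @active = 0
    @mutex = Mutex.new
  end

  # Runs the block if a slot is free; otherwise raises immediately
  # instead of waiting, so the caller's DB transaction is not held open.
  def run
    acquire!
    begin
      yield
    ensure
      release
    end
  end

  # Number of calls currently in flight.
  def active
    @mutex.synchronize { @active }
  end

  private

  def acquire!
    @mutex.synchronize do
      raise LimitExceeded, "concurrency limit of #{@limit} exceeded" if @active >= @limit

      @active += 1
    end
  end

  def release
    @mutex.synchronize { @active -= 1 }
  end
end

# Usage: with a limit of 1, a second call made while the first is still
# in flight is rejected rather than queued.
limiter = ConcurrencyLimiter.new(1)
begin
  limiter.run { limiter.run { :never_reached } }
rescue ConcurrencyLimiter::LimitExceeded => e
  puts "rejected: #{e.message}"
end
```

Rejecting instead of queueing is the point of the design: a waiting caller would keep its DB connection checked out for the whole wait, which is exactly the pool exhaustion the limit is meant to prevent.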

I set DEFAULT_LIMIT to 200, which is well under what we can currently support (290): gitlab-com/gl-infra/tenant-scale/cells-infrastructure/team#488 (comment 2773726223)

References

How to set up and validate locally

Configure GDK as a Cell
In the Rails console, enable the claiming and TS limit feature flags:

Feature.enable(:cells_unique_claims)

Feature.enable(:topology_service_concurrency_limit)

Set the concurrency limit to 1: Gitlab::CurrentSettings.update!(topology_service_concurrency_limit: 1) (the local TS responds very quickly, so the limit has to be at its minimum to reproduce the error).
Use the k6 script below to generate load:

k6 run ./k6.js

Warning

The script creates a lot of groups in the GDK.

k6.js

import http from 'k6/http';
import { check } from 'k6';

export let options = {
  stages: [
    { duration: '10s', target: 20 }, // Ramp up to 20 concurrent users
    { duration: '20s', target: 20 }, // Stay at 20 concurrent users
    { duration: '5s', target: 0 },   // Ramp down to 0
  ],
};

export default function () {
  const token = 'YOUR_API_TOKEN'; // token needs the api scope to create groups
  const groupName = `test-group-${Date.now()}-${Math.random()}`;

  const payload = JSON.stringify({
    name: groupName,
    path: groupName,
    visibility: 'private',
    description: 'Test group for concurrency limit'
  });

  const params = {
    headers: {
      'PRIVATE-TOKEN': token,
      'Content-Type': 'application/json',
    },
  };

  const response = http.post('http://gdk.test:3000/api/v4/groups', payload, params);

  check(response, {
    'status is 201 or 500': (r) => r.status === 201 || r.status === 500,
    'status is 201 (success)': (r) => r.status === 201,
    'status is 500 (rejected)': (r) => r.status === 500,
  });
}

Verify that the logs are flooded with GRPC::ResourceExhausted errors:

cat log/development.log | rg ResourceExhausted

GRPC::ResourceExhausted (8:Topology Service concurrency limit exceeded for gitlab.cells.topology_service.claims.v1.ClaimService/BeginUpdate):
GRPC::ResourceExhausted (8:Topology Service concurrency limit exceeded for gitlab.cells.topology_service.claims.v1.ClaimService/BeginUpdate):
GRPC::ResourceExhausted (8:Topology Service concurrency limit exceeded for gitlab.cells.topology_service.claims.v1.ClaimService/BeginUpdate):
GRPC::ResourceExhausted (8:Topology Service concurrency limit exceeded for gitlab.cells.topology_service.claims.v1.ClaimService/BeginUpdate):
GRPC::ResourceExhausted (8:Topology Service concurrency limit exceeded for gitlab.cells.topology_service.claims.v1.ClaimService/BeginUpdate):
GRPC::ResourceExhausted (8:Topology Service concurrency limit exceeded for gitlab.cells.topology_service.claims.v1.ClaimService/BeginUpdate):
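If a count is more useful than a wall of matching lines, the rejections can be tallied per RPC. This is a small sketch rather than part of the MR: the helper name `count_rejections` is hypothetical, and the pattern assumes the log lines have exactly the shape shown above.

```ruby
# Matches the rejection lines shown above and captures the RPC name.
REJECTION_PATTERN = /GRPC::ResourceExhausted \(8:Topology Service concurrency limit exceeded for (?<rpc>[^)]+)\)/

# Hypothetical helper: returns a hash of RPC name => number of rejections.
# Accepts any enumerable of lines (an array, or File.foreach(path)).
def count_rejections(lines)
  lines.each_with_object(Hash.new(0)) do |line, counts|
    match = REJECTION_PATTERN.match(line)
    counts[match[:rpc]] += 1 if match
  end
end

# Sample input: one real-shaped rejection line and one made-up unrelated line.
sample = [
  'GRPC::ResourceExhausted (8:Topology Service concurrency limit exceeded for gitlab.cells.topology_service.claims.v1.ClaimService/BeginUpdate):',
  'an unrelated log line'
]
p count_rejections(sample)
```

Against the real log, the same helper can be called as `count_rejections(File.foreach('log/development.log'))`.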

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Edited Jan 12, 2026 by Tarun Khandelwal
