Skip to content

Estimate database rate requirements

Context

This is a followup from gitlab#207147 (comment 310427052).

As we are working on switching the Container Registry API metadata write and read operations from the filesystem into a PostgreSQL database, we need to estimate the rate requirements. Results should be reviewed and discussed with the Database team.

Current API Request Rate

The average request rate of the Container Registry HTTP API will serve as a baseline to extrapolate the rate requirements for the upcoming metadata database. Once determined how many requests over a given time frame the API is serving, we can extrapolate how many SQL operations would those requests require.

Methodology

  • Per-second rate metrics collected twice (averaged) at 2020/06/19 18:00 UTC and 2020/06/25 09:00 UTC from Prometheus, looking at the previous 48 hours (business days only, for a worst-case estimate);

  • Per-minute, hourly, daily, monthly and yearly rates calculated by applying a multiplication factor to the per-second rate;

  • Metrics grouped by API handler (e.g. manifests, tags, etc.) and HTTP method;

  • Handler/method pairs that do not imply any database operations were ignored;

  • Rates rounded to the nearest integer, unless less than zero, in which case two decimal places were used;

  • Requests that ended with a 500 status code (Internal Server Error) were ignored. These are negligible (~0.03% of the eligible requests), and we have no way to know if they would imply any database operations (we do not know at which point they failed and why);

  • Requests that ended with a 401 status code (Unauthorized) were ignored (~16% of the eligible requests), as they are promptly rejected and do not require any database operations;

  • Requests that ended with a 404 status code (Not Found) were accounted for as successful requests. Similar to 500, we have no way to determine how many database operations would be required for these, but they are not negligible (~2% of the eligible requests). This is a worst-case estimate.

  • PromQL query:

    sort_desc(
      sum by(handler, method)(
        rate(
          registry_http_requests_total{
            cluster="gprd-gitlab-gke",
            namespace="gitlab",
            code!="500",
            code!="401"
          }[48h]
        )))

Results

Handler Method Description Per-second Per-minute Hourly Daily Weekly Monthly Yearly
Manifest GET Download manifest. 141 8,469 508,112 12,194,681 85,362,768 2,560,883,042 30,730,596,502
Blob GET Download blob. 74 4,411 264,664 6,351,934 44,463,536 1,333,906,094 16,006,873,132
Tags GET List tags. 52 3,133 187,982 4,511,557 31,580,898 947,426,931 11,369,123,175
Blob HEAD Check existence of blob. 21 1,282 76,914 1,845,927 12,921,488 387,644,649 4,651,735,785
Manifest HEAD Check existence of manifest. 8 473 28,404 681,687 4,771,808 143,154,255 1,717,851,056
Blob Upload PUT Complete blob upload. 5 295 17,712 425,091 2,975,638 89,269,140 1,071,229,676
Manifest PUT Upload manifest. 2 102 6,118 146,824 1,027,767 30,832,997 369,995,959
Manifest DELETE Delete manifest. 0.03 2 114 2,741 19,184 575,521 6,906,256
Tag DELETE Delete tag. 0.03 2 113 2,705 18,934 568,019 6,816,234
Total 303 18,169 1,090,131 26,163,146 183,142,022 5,494,260,648 65,931,127,774

Database Operations per API Request

To estimate the database operation rate, first, we need to determine the type (SELECT, INSERT, UPDATE, DELETE) and the number of database operations required to fulfil each API handler/method request.

Methodology

The worst-case scenario for each handler/method pair was assumed; in other words, it is assumed that we would need to perform the highest possible number of database operations for each request:

  • Even though some write requests may target one or more existing resources, in which case an INSERT to create those would not be needed, we assume that we will always need to create them;
  • Even though some read requests may target a non-existing resource, in which case the operation would not need to perform all possible database operations, we assume that all resources exist.

Results

Handler Method SELECT INSERT UPDATE DELETE Execution Flow
Manifest GET 3 0 0 0 1. Find repository (1 SELECT);
2. Find tag in repository (1 SELECT);
3. Find manifest associated with tag (1 SELECT).
Blob GET 2 0 0 0 1. Find repository (1 SELECT);
2. Check if blob is linked in repository (1 SELECT);
Tags GET 2 0 0 0 1. Find repository (1 SELECT);
2. Find repository tags (1 SELECT).
Blob HEAD 2 0 0 0 1. Find repository (1 SELECT);
2. Check if blob is linked in repository (1 SELECT);
Manifest HEAD 2 0 0 0 1. Find repository (1 SELECT);
2. Check if manifest is linked in repository (1 SELECT);
Blob Upload PUT 1 3 0 0 1. Create repository. Assuming it already exists (worst-case), the insert fails, and we need to find it (1 INSERT, 1 SELECT);
2. Create blob (1 INSERT);
3. Link blob in repository (1 INSERT).
Manifest PUT 1 4 0 0 1. Create repository. Assuming it already exists (worst-case), the insert fails, and we need to find it (1 INSERT, 1 SELECT);
2. Create blob (1 INSERT);
3. Link blob in repository (1 INSERT).
4. Tag manifest (1 INSERT).
Catalog GET 1 0 0 0 1. Find repositories (1 SELECT).
Manifest DELETE 1 0 0 1 1. Find target repository: 1 SELECT; 2. Unlink manifest from repository: 1 DELETE.
Tag DELETE 1 0 0 1 1. Find repository (1 SELECT);
2. Untag manifest (1 DELETE).

Estimated Database Rate

After collecting the current production API request rate and estimating the required database operations per request, we can finally estimate the database operation rate requirements.

Methodology

The API request rate was multiplied by the estimated number of database operations for each handler/method pair, across all ranges.

Results

Detailed Breakdown

Per-second
Handler Method SELECT INSERT UPDATE DELETE
Manifest GET 423 0 0 0
Blob GET 147 0 0 0
Tags GET 104 0 0 0
Blob HEAD 43 0 0 0
Manifest HEAD 16 0 0 0
Blob Upload PUT 5 15 0 0
Manifest PUT 2 7 0 0
Catalog GET 0 0 0 0
Manifest DELETE 0 0 0 0
Tag DELETE 0 0 0 0
Total 740 22 0 0
Per-minute
Handler Method SELECT INSERT UPDATE DELETE
Manifest GET 25,406 0 0 0
Blob GET 8,822 0 0 0
Tags GET 6,266 0 0 0
Blob HEAD 2,564 0 0 0
Manifest HEAD 947 0 0 0
Blob Upload PUT 295 886 0 0
Manifest PUT 102 408 0 0
Catalog GET 0 0 0 0
Manifest DELETE 2 0 0 2
Tag DELETE 2 0 0 2
Total 44,405 1,293 0 4
Hourly
Handler Method SELECT INSERT UPDATE DELETE
Manifest GET 1,524,335 0 0 0
Blob GET 529,328 0 0 0
Tags GET 375,963 0 0 0
Blob HEAD 153,827 0 0 0
Manifest HEAD 56,807 0 0 0
Blob Upload PUT 17,712 53,136 0 0
Manifest PUT 6,118 24,471 0 0
Catalog GET 0 0 0 0
Manifest DELETE 114 0 0 114
Tag DELETE 113 0 0 113
Total 2,664,317 77,607 0 227
Daily
Handler Method SELECT INSERT UPDATE DELETE
Manifest GET 36,584,043 0 0 0
Blob GET 12,703,868 0 0 0
Tags GET 9,023,114 0 0 0
Blob HEAD 3,691,854 0 0 0
Manifest HEAD 1,363,374 0 0 0
Blob Upload PUT 425,091 1,275,273 0 0
Manifest PUT 146,824 587,295 0 0
Catalog GET 0 0 0 0
Manifest DELETE 2,741 0 0 2,741
Tag DELETE 2,705 0 0 2,705
Total 63,943,613 1,862,569 0 5,445
Weekly
Handler Method SELECT INSERT UPDATE DELETE
Manifest GET 256,088,304 0 0 0
Blob GET 88,927,073 0 0 0
Tags GET 63,161,795 0 0 0
Blob HEAD 25,842,977 0 0 0
Manifest HEAD 9,543,617 0 0 0
Blob Upload PUT 2,975,638 8,926,914 0 0
Manifest PUT 1,027,767 4,111,066 0 0
Catalog GET 0 0 0 0
Manifest DELETE 19,184 0 0 19,184
Tag DELETE 18,934 0 0 18,934
Total 447,605,289 13,037,980 0 38,118
Monthly
Handler Method SELECT INSERT UPDATE DELETE
Manifest GET 7,682,649,125 0 0 0
Blob GET 2,667,812,189 0 0 0
Tags GET 1,894,853,862 0 0 0
Blob HEAD 775,289,298 0 0 0
Manifest HEAD 286,308,509 0 0 0
Blob Upload PUT 89,269,140 267,807,419 0 0
Manifest PUT 30,832,997 123,331,986 0 0
Catalog GET 0 0 0 0
Manifest DELETE 575,521 0 0 575,521
Tag DELETE 568,019 0 0 568,019
Total 13,428,158,661 391,139,405 0 1,143,541
Yearly
Handler Method SELECT INSERT UPDATE DELETE
Manifest GET 92,191,789,505 0 0 0
Blob GET 32,013,746,263 0 0 0
Tags GET 22,738,246,350 0 0 0
Blob HEAD 9,303,471,570 0 0 0
Manifest HEAD 3,435,702,113 0 0 0
Blob Upload PUT 1,071,229,676 3,213,689,027 0 0
Manifest PUT 369,995,959 1,479,983,836 0 0
Catalog GET 0 0 0 0
Manifest DELETE 6,906,256 0 0 6,906,256
Tag DELETE 6,816,234 0 0 6,816,234
Total 161,137,903,926 4,693,672,863 0 13,722,490

Aggregated Total

Period SELECT INSERT UPDATE DELETE Total
Per-second 740 22 0 0 762
Per-minute 44,405 1,293 0 4 45,703
Hourly 2,664,317 77,607 0 227 2,742,151
Daily 63,943,613 1,862,569 0 5,445 65,811,627
Weekly 447,605,289 13,037,980 0 38,118 460,681,387
Monthly 13,428,158,661 391,139,405 0 1,143,541 13,820,441,607
Yearly 161,137,903,926 4,693,672,863 0 13,722,490 165,845,299,279

Notes

  • We only accounted for API requests in this estimate. Apart from the API, the garbage collector will also need to talk with the database. Although it's difficult to quantify the rate arising from that interaction at this point, it should be negligible when compared with the API requests. Given this is a relatively pessimistic worst-case estimate, it should be more than enough to account for the requirements of any other background job.

  • The writes workload is very light when compared to reads. We thought we could be missing something here, but results were consistent across runs, and it matches what we can see in the Grafana dashboard. This may show how much users rely on the registry to store base images that are heavily reused (read) across CI runs.

  • Although the DELETE load is low at the moment, we expect this to increase substantially once the tag expiration policy feature becomes available for all existing projects on GitLab.com (gitlab#196124 (closed)).

Edited by João Pereira