Skip to content

Optimize the non-distinct ci_build counts

What does this MR do?

Optimize query produced by https://gitlab.com/gitlab-org/gitlab/-/blob/908902d1d4f6fbcd7780150c780837fd4fa8b301/lib/gitlab/usage_data.rb#L48

query with batching
SELECT
    COUNT("ci_builds"."id")
FROM
    "ci_builds"
WHERE
    "ci_builds"."type" = 'Ci::Build'
    AND "ci_builds"."id" BETWEEN 810000 AND 811250;

explain data from database-lab

  • id between 810_000 AND 811_250

Before

Observations

  1. Filter being applied on type
Index Cond: ((ci_builds.id >= 810000) AND (ci_builds.id <= 811250))
Filter: ((ci_builds.type)::text = 'Ci::Build'::text)
  • combat this with adding index
CREATE INDEX CONCURRENTLY index_ci_builds_on_id_ci_builds ON ci_builds USING btree (id)
WHERE
    TYPE = 'Ci::Build';
  • explain now shows index only condition hit for that part *
  1. MAX/MIN calculations
  • Query
  • Explain results -
    • After adding indexes for count itself -

Plan

  • Add these indexes
CREATE INDEX CONCURRENTLY index_ci_builds_on_id_ci_builds ON ci_builds USING btree (id)
WHERE
    TYPE = 'Ci::Build';

After only index conditions are hit

Timing

After the index for batch counting takes xxx seconds super pessimistic using database-lab

  • 5.5 million users,
  • with 1_250 batch sizes
  • 5.5M/1_250 = 4_400 loops
  • Time: < xxx ( cold cache )

Migration output

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team

Related to #208884 (closed)

Edited by Alper Akgun

Merge request reports