Skip to content

WIP: Resolve "Optimize project counters related to the ::Project"

What does this MR do?

Optimize queries produced by:

protected_branches: distinct_count(::Project.with_protected_branches.where(time_period), :creator_id)
query with time constraint and batching
SELECT
    COUNT(DISTINCT "projects"."creator_id")
FROM
    "projects"
    INNER JOIN "protected_branches" ON "protected_branches"."project_id" = "projects"."id"
WHERE
    "projects"."created_at" BETWEEN '2020-02-10 16:04:08.028772'
    AND '2020-03-09 16:04:08.028850'
    AND "projects"."creator_id" BETWEEN 810000 AND 811250;
query with no time constraint and batching
SELECT
    COUNT(DISTINCT "projects"."creator_id")
FROM
    "projects"
    INNER JOIN "protected_branches" ON "protected_branches"."project_id" = "projects"."id"
WHERE
    "projects"."creator_id" BETWEEN 810000 AND 811250;

explain data from database-lab

  • creator_id between 810_000 AND 811_250

Before

Observations

  1. filter applied when time condition with creator_id was performed (no index)
Index Cond: ((projects.creator_id >= 810000) AND (projects.creator_id <= 811250))
Filter: ((projects.created_at >= '2020-02-10 16:04:08.028772+00'::timestamp with time zone) AND (projects.created_at <= '2020-03-09 16:04:08.02885+00'::timestamp with time zone))
  • combat this with adding index
  • explain now shows index only condition hit(improvement)
    • time constraint -
    • no time constraint -
  1. MAX/MIN calculations
query
SELECT
    MAX("projects"."creator_id")
FROM
    "projects"
    INNER JOIN "protected_branches" ON "protected_branches"."project_id" = "projects"."id"
WHERE
    "projects"."created_at" BETWEEN '2020-02-13 14:57:26.678272'
    AND '2020-03-12 14:57:26.678362'
  • Explain results(within thresholds: xxx) -

Plan

  • Add these indexes

After only index conditions are hit

  • with time constraint -
  • no time constraint -

Timing

After the index for batch counting takes xxx seconds pessimistic

  • 5.5 million users,
  • with 1_250 batch sizes
  • 5.5M/1_250 = 4_400 loops
  • Time: < xxx ( cold cache with no time constraint )

Migration output

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team

Related to #208887 (closed)

Edited by Doug Stull

Merge request reports