Skip to content

Fix count_distinct_merge_request_id_from_scan_finding_approval_merge_request_rules metric

What does this MR do and why?

In GitLab::Database::BatchCounter we have unwanted_configuration? method that is checking if parameters configured for given Instrumentation class are valid, in other case value for given instrumentation class will be invalid: -1:

def unwanted_configuration?(finish, batch_size, start)
  (@operation == :count && batch_size <= MIN_REQUIRED_BATCH_SIZE) ||
    (@operation == :sum && batch_size < DEFAULT_SUM_BATCH_SIZE) ||
    (finish - start) / batch_size >= MAX_ALLOWED_LOOPS ||
    start >= finish
end

Let's see if current parameters are violating this check:

MIN_REQUIRED_BATCH_SIZE = 1_250
DEFAULT_SUM_BATCH_SIZE = 1_000
MAX_ALLOWED_LOOPS = 10_000

@operation = :count
batch_size = 10_000
start = ::ApprovalMergeRequestRule.minimum(:merge_request_id) # => 1
finish = ::ApprovalMergeRequestRule.maximum(:merge_request_id) # => ~190000000

(@operation == :count && batch_size <= MIN_REQUIRED_BATCH_SIZE) # => false
(@operation == :sum && batch_size < DEFAULT_SUM_BATCH_SIZE) # => false
start >= finish # => false
(finish - start) / batch_size >= MAX_ALLOWED_LOOPS # => (190_000_000 - 0) / 10_000 >= 10_000 => true 

To fix this we need to change batch_size parameter to make sure the last condition is not met, in this MR we are changing this to 50_000.

explain SELECT COUNT(DISTINCT "approval_merge_request_rules"."merge_request_id") FROM "approval_merge_request_rules" WHERE "approval_merge_request_rules"."report_type" = 4 AND "approval_merge_request_rules"."merge_request_id" BETWEEN 188900000 AND 188950000;

With https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/13333/commands/46799 we have checked that it performs without any issues.

We need to check if this will perform well for monthly metric as well:

explain SELECT COUNT(DISTINCT "approval_merge_request_rules"."merge_request_id") FROM "approval_merge_request_rules" WHERE "approval_merge_request_rules"."report_type" = 4 AND "approval_merge_request_rules"."merge_request_id" BETWEEN 188100000 AND 188150000 AND "approval_merge_request_rules"."created_at" BETWEEN '2022-10-16 01:10:49.374014' AND '2022-11-13 01:10:49.374149';

https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/13333/commands/46800

Although it performs well we need to add index there to improve the query.

exec CREATE INDEX scan_finding_approval_mr_rule_index_merge_request_id_and_created_at ON approval_merge_request_rules USING btree (merge_request_id, created_at) WHERE (report_type = 4);

The query has been executed. Duration: 3.922 min (edited) 

After this same query performs flawlessly: https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/13333/commands/46802

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #377717 (closed)

Merge request reports