Skip to content

Draft: Add upvotes_count to merge_requests table

What does this MR do?

Related to #263365

This MR adds a new column upvotes_count to the merge_requests table. This will be used to perform sorting by popularity on Merge Requests in Basic and Advanced Search (future MRs will implement the logic). Additionally the new column can be used to improve performance on the merge requests list (which sorts by popularity doing a join currently).

Add a temporary index in a post migration to improve select and update SQL performance. This will be removed in the next milestone per #340299 (closed)

The migration is done in post migration using a background migration to perform the backfill of data.

Screenshots or Screencasts (strongly suggested)

How to setup and validate locally (strongly suggested)

Database

Background Migration Details:

3466338 items to update
batch size = 5000
3466338 / 5000 = 694 batches

Estimated times per batch:
- 12ms for select statement with 5000 items (see linked explain plan)
- 6ms for update statement with 500 items (see linked explain plan)
Total: ~0.018 sec per batch

2 mins delay per batch (safe for the given total time per batch)

694 batches * 2 min per batch = 1388 mins to run all the scheduled jobs

Select statement plan

Plan: https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/6310/commands/21505

SELECT
    award_emoji.*
FROM
    award_emoji
WHERE
    award_emoji.awardable_type = 'MergeRequest'
    AND award_emoji.name = 'thumbsup'
ORDER BY
    award_emoji.id
LIMIT 5000 offset 1

Update statement plan

Note: I used the ids (min 8, max 21623) from the select above in a database-lab clone to get the min/max ids

Plan: https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/6448/commands/22097

WITH batched_relation AS MATERIALIZED (
    SELECT
        award_emoji.awardable_id
    FROM
        award_emoji
    WHERE
        award_emoji.id BETWEEN 21425 AND 21925
        AND award_emoji.name = 'thumbsup'
        AND award_emoji.awardable_type = 'MergeRequest'
        AND award_emoji.id >= 21425
        AND award_emoji.id < 21925
    LIMIT 500)
UPDATE
    merge_requests
SET
    upvotes_count = sub_q.count_all
FROM (
    SELECT
        COUNT(*) AS count_all,
        e.awardable_id AS merge_request_id
    FROM
        batched_relation AS e
    GROUP BY
        merge_request_id) AS sub_q
WHERE
    sub_q.merge_request_id = merge_requests.id;

up

be rails db:migrate:up VERSION=20210802144035
== 20210802144035 AddUpvotesCountToMergeRequests: migrating ===================
-- add_column(:merge_requests, :upvotes_count, :integer, {:default=>0, :null=>false})
   -> 0.0040s
== 20210802144035 AddUpvotesCountToMergeRequests: migrated (0.0129s) ==========
be rails db:migrate:up VERSION=20210825154031
== 20210825154031 AddTemporaryIndexOnAwardEmojiForMergeRequestsWithThumbsups: migrating
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:award_emoji, [:id, :awardable_id], {:where=>"awardable_type = 'MergeRequest' AND name = 'thumbsup'", :name=>"tmp_idx_award_emoji_on_merge_requests_with_thumbsup", :algorithm=>:concurrently})
   -> 0.0029s
-- execute("SET statement_timeout TO 0")
   -> 0.0006s
-- add_index(:award_emoji, [:id, :awardable_id], {:where=>"awardable_type = 'MergeRequest' AND name = 'thumbsup'", :name=>"tmp_idx_award_emoji_on_merge_requests_with_thumbsup", :algorithm=>:concurrently})
   -> 0.0051s
-- execute("RESET statement_timeout")
   -> 0.0006s
== 20210825154031 AddTemporaryIndexOnAwardEmojiForMergeRequestsWithThumbsups: migrated (0.0114s)
➜ be rails db:migrate:up VERSION=20210826151803
== 20210826151803 BackfillMergeRequestsUpvotesCount: migrating ================
-- Scheduled 1 BackfillUpvotesCountOnMergeRequests jobs with a maximum of 5000 records per batch and an interval of 120 seconds.

The migration is expected to take at least 120 seconds. Expect all jobs to have completed after 2021-09-01 20:28:33 UTC."
== 20210826151803 BackfillMergeRequestsUpvotesCount: migrated (0.0279s) =======

down

be rails db:migrate:down VERSION=20210802144035
== 20210802144035 AddUpvotesCountToMergeRequests: reverting ===================
-- remove_column(:merge_requests, :upvotes_count)
   -> 0.0037s
== 20210802144035 AddUpvotesCountToMergeRequests: reverted (0.0037s) ==========
bundle exec rails db:migrate:down VERSION=20210802151803
== 20210802151803 BackfillMergeRequestsUpvotesCount: reverting ================
== 20210802151803 BackfillMergeRequestsUpvotesCount: reverted (0.0000s) =======
be rails db:migrate:down VERSION=20210825154031
== 20210825154031 AddTemporaryIndexOnAwardEmojiForMergeRequestsWithThumbsups: reverting
-- transaction_open?()
   -> 0.0000s
-- indexes(:award_emoji)
   -> 0.0042s
-- execute("SET statement_timeout TO 0")
   -> 0.0006s
-- remove_index(:award_emoji, {:algorithm=>:concurrently, :name=>"tmp_idx_award_emoji_on_merge_requests_with_thumbsup"})
   -> 0.0036s
-- execute("RESET statement_timeout")
   -> 0.0006s
== 20210825154031 AddTemporaryIndexOnAwardEmojiForMergeRequestsWithThumbsups: reverted (0.0156s)
❯ be rails db:migrate:down VERSION=20210826151803
== 20210826151803 BackfillMergeRequestsUpvotesCount: reverting ================
== 20210826151803 BackfillMergeRequestsUpvotesCount: reverted (0.0000s) =======

Timings from database-lab

image

image

Setup in database-lab

exec ALTER TABLE merge_requests ADD COLUMN upvotes_count INTEGER NOT NULL DEFAULT 0

exec CREATE INDEX CONCURRENTLY tmp_idx_award_emoji_on_merge_requests_with_thumbsup ON award_emoji (id, awardable_id) WHERE awardable_type = 'MergeRequest' AND name = 'thumbsup';

exec UPDATE merge_requests SET upvotes_count = floor(random() * (1000+1))::int WHERE merge_requests.id IN (SELECT merge_requests.id FROM merge_requests WHERE target_project_id = 278964 AND (random() < 0.01) LIMIT 1500);

Note: I tried adding an index for this in database-lab, but the index wasn't used by the planner so I decided not to include it.

exec CREATE INDEX CONCURRENTLY index_merge_requests_on_target_project_id_and_upvotes_count ON merge_requests(target_project_id, upvotes_count);

Before

Explain: https://explain.depesz.com/s/GuOC

SQL
SELECT
    "merge_requests".*
FROM
    "merge_requests"
    LEFT OUTER JOIN "award_emoji" ON "award_emoji"."awardable_id" = "merge_requests"."id"
    AND "award_emoji"."awardable_type" = 'MergeRequest'
    AND "award_emoji"."name" = 'thumbsup'
WHERE
    "merge_requests"."target_project_id" = 278964
    AND ("merge_requests"."state_id" IN (1))
GROUP BY
    "merge_requests"."id"
ORDER BY
    COUNT(award_emoji.id) DESC,
    "merge_requests"."id" DESC
LIMIT 20 OFFSET 0

Cold cache

Time: 1.632 s
  - planning: 6.462 ms
  - execution: 1.626 s
    - I/O read: 2.943 s
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 3919 (~30.60 MiB) from the buffer pool
  - reads: 2922 (~22.80 MiB) from the OS file cache, including disk I/O
  - dirtied: 400 (~3.10 MiB)
  - writes: 0

Warm cache

Time: 25.613 ms
  - planning: 1.076 ms
  - execution: 24.537 ms
    - I/O read: 0.000 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 6841 (~53.40 MiB) from the buffer pool
  - reads: 0 from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

After

Explain: https://explain.depesz.com/s/7Upz

SQL
SELECT
    "merge_requests".*
FROM
    "merge_requests"
WHERE
    "merge_requests"."target_project_id" = 278964
    AND ("merge_requests"."state_id" IN (1))
GROUP BY
    "merge_requests"."id"
ORDER BY
    COUNT(upvotes_count) DESC,
    "merge_requests"."id" DESC
LIMIT 20 OFFSET 0;

Cold cache

Time: 50.547 ms
  - planning: 6.406 ms
  - execution: 44.141 ms
    - I/O read: 32.190 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 1031 (~8.10 MiB) from the buffer pool
  - reads: 614 (~4.80 MiB) from the OS file cache, including disk I/O
  - dirtied: 134 (~1.00 MiB)
  - writes: 0

Warm cache

Time: 10.663 ms
  - planning: 0.613 ms
  - execution: 10.050 ms
    - I/O read: 0.000 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 1440 (~11.30 MiB) from the buffer pool
  - reads: 0 from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

Does this MR contain changes to processing or storing of credentials or tokens, authorization and authentication methods or other items described in the security review guidelines? If not, then delete this Security section.

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team
Edited by Terri Chu

Merge request reports