Draft: Add upvotes_count to merge_requests table
What does this MR do?
Related to #263365
This MR adds a new column upvotes_count
to the merge_requests table. This will be used to perform sorting by popularity on Merge Requests in Basic and Advanced Search (future MRs will implement the logic). Additionally the new column can be used to improve performance on the merge requests list (which sorts by popularity doing a join currently).
Add a temporary index in a post migration to improve select and update SQL performance. This will be removed in the next milestone per #340299 (closed)
The migration is done in post migration using a background migration to perform the backfill of data.
Screenshots or Screencasts (strongly suggested)
How to setup and validate locally (strongly suggested)
Database
Background Migration Details:
3466338 items to update
batch size = 5000
3466338 / 5000 = 694 batches
Estimated times per batch:
- 12ms for select statement with 5000 items (see linked explain plan)
- 6ms for update statement with 500 items (see linked explain plan)
Total: ~0.018 sec per batch
2 mins delay per batch (safe for the given total time per batch)
694 batches * 2 min per batch = 1388 mins to run all the scheduled jobs
Select statement plan
Plan: https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/6310/commands/21505
SELECT
award_emoji.*
FROM
award_emoji
WHERE
award_emoji.awardable_type = 'MergeRequest'
AND award_emoji.name = 'thumbsup'
ORDER BY
award_emoji.id
LIMIT 5000 offset 1
Update statement plan
Note: I used the ids (min 8, max 21623) from the select above in a database-lab clone to get the min/max ids
Plan: https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/6448/commands/22097
WITH batched_relation AS MATERIALIZED (
SELECT
award_emoji.awardable_id
FROM
award_emoji
WHERE
award_emoji.id BETWEEN 21425 AND 21925
AND award_emoji.name = 'thumbsup'
AND award_emoji.awardable_type = 'MergeRequest'
AND award_emoji.id >= 21425
AND award_emoji.id < 21925
LIMIT 500)
UPDATE
merge_requests
SET
upvotes_count = sub_q.count_all
FROM (
SELECT
COUNT(*) AS count_all,
e.awardable_id AS merge_request_id
FROM
batched_relation AS e
GROUP BY
merge_request_id) AS sub_q
WHERE
sub_q.merge_request_id = merge_requests.id;
up
be rails db:migrate:up VERSION=20210802144035
== 20210802144035 AddUpvotesCountToMergeRequests: migrating ===================
-- add_column(:merge_requests, :upvotes_count, :integer, {:default=>0, :null=>false})
-> 0.0040s
== 20210802144035 AddUpvotesCountToMergeRequests: migrated (0.0129s) ==========
be rails db:migrate:up VERSION=20210825154031
== 20210825154031 AddTemporaryIndexOnAwardEmojiForMergeRequestsWithThumbsups: migrating
-- transaction_open?()
-> 0.0000s
-- index_exists?(:award_emoji, [:id, :awardable_id], {:where=>"awardable_type = 'MergeRequest' AND name = 'thumbsup'", :name=>"tmp_idx_award_emoji_on_merge_requests_with_thumbsup", :algorithm=>:concurrently})
-> 0.0029s
-- execute("SET statement_timeout TO 0")
-> 0.0006s
-- add_index(:award_emoji, [:id, :awardable_id], {:where=>"awardable_type = 'MergeRequest' AND name = 'thumbsup'", :name=>"tmp_idx_award_emoji_on_merge_requests_with_thumbsup", :algorithm=>:concurrently})
-> 0.0051s
-- execute("RESET statement_timeout")
-> 0.0006s
== 20210825154031 AddTemporaryIndexOnAwardEmojiForMergeRequestsWithThumbsups: migrated (0.0114s)
➜ be rails db:migrate:up VERSION=20210826151803
== 20210826151803 BackfillMergeRequestsUpvotesCount: migrating ================
-- Scheduled 1 BackfillUpvotesCountOnMergeRequests jobs with a maximum of 5000 records per batch and an interval of 120 seconds.
The migration is expected to take at least 120 seconds. Expect all jobs to have completed after 2021-09-01 20:28:33 UTC."
== 20210826151803 BackfillMergeRequestsUpvotesCount: migrated (0.0279s) =======
down
be rails db:migrate:down VERSION=20210802144035
== 20210802144035 AddUpvotesCountToMergeRequests: reverting ===================
-- remove_column(:merge_requests, :upvotes_count)
-> 0.0037s
== 20210802144035 AddUpvotesCountToMergeRequests: reverted (0.0037s) ==========
bundle exec rails db:migrate:down VERSION=20210802151803
== 20210802151803 BackfillMergeRequestsUpvotesCount: reverting ================
== 20210802151803 BackfillMergeRequestsUpvotesCount: reverted (0.0000s) =======
be rails db:migrate:down VERSION=20210825154031
== 20210825154031 AddTemporaryIndexOnAwardEmojiForMergeRequestsWithThumbsups: reverting
-- transaction_open?()
-> 0.0000s
-- indexes(:award_emoji)
-> 0.0042s
-- execute("SET statement_timeout TO 0")
-> 0.0006s
-- remove_index(:award_emoji, {:algorithm=>:concurrently, :name=>"tmp_idx_award_emoji_on_merge_requests_with_thumbsup"})
-> 0.0036s
-- execute("RESET statement_timeout")
-> 0.0006s
== 20210825154031 AddTemporaryIndexOnAwardEmojiForMergeRequestsWithThumbsups: reverted (0.0156s)
❯ be rails db:migrate:down VERSION=20210826151803
== 20210826151803 BackfillMergeRequestsUpvotesCount: reverting ================
== 20210826151803 BackfillMergeRequestsUpvotesCount: reverted (0.0000s) =======
Timings from database-lab
Setup in database-lab
exec ALTER TABLE merge_requests ADD COLUMN upvotes_count INTEGER NOT NULL DEFAULT 0
exec CREATE INDEX CONCURRENTLY tmp_idx_award_emoji_on_merge_requests_with_thumbsup ON award_emoji (id, awardable_id) WHERE awardable_type = 'MergeRequest' AND name = 'thumbsup';
exec UPDATE merge_requests SET upvotes_count = floor(random() * (1000+1))::int WHERE merge_requests.id IN (SELECT merge_requests.id FROM merge_requests WHERE target_project_id = 278964 AND (random() < 0.01) LIMIT 1500);
Note: I tried adding an index for this in database-lab, but the index wasn't used by the planner so I decided not to include it.
exec CREATE INDEX CONCURRENTLY index_merge_requests_on_target_project_id_and_upvotes_count ON merge_requests(target_project_id, upvotes_count);
Before
Explain: https://explain.depesz.com/s/GuOC
SQL
SELECT
"merge_requests".*
FROM
"merge_requests"
LEFT OUTER JOIN "award_emoji" ON "award_emoji"."awardable_id" = "merge_requests"."id"
AND "award_emoji"."awardable_type" = 'MergeRequest'
AND "award_emoji"."name" = 'thumbsup'
WHERE
"merge_requests"."target_project_id" = 278964
AND ("merge_requests"."state_id" IN (1))
GROUP BY
"merge_requests"."id"
ORDER BY
COUNT(award_emoji.id) DESC,
"merge_requests"."id" DESC
LIMIT 20 OFFSET 0
Cold cache
Time: 1.632 s
- planning: 6.462 ms
- execution: 1.626 s
- I/O read: 2.943 s
- I/O write: 0.000 ms
Shared buffers:
- hits: 3919 (~30.60 MiB) from the buffer pool
- reads: 2922 (~22.80 MiB) from the OS file cache, including disk I/O
- dirtied: 400 (~3.10 MiB)
- writes: 0
Warm cache
Time: 25.613 ms
- planning: 1.076 ms
- execution: 24.537 ms
- I/O read: 0.000 ms
- I/O write: 0.000 ms
Shared buffers:
- hits: 6841 (~53.40 MiB) from the buffer pool
- reads: 0 from the OS file cache, including disk I/O
- dirtied: 0
- writes: 0
After
Explain: https://explain.depesz.com/s/7Upz
SQL
SELECT
"merge_requests".*
FROM
"merge_requests"
WHERE
"merge_requests"."target_project_id" = 278964
AND ("merge_requests"."state_id" IN (1))
GROUP BY
"merge_requests"."id"
ORDER BY
COUNT(upvotes_count) DESC,
"merge_requests"."id" DESC
LIMIT 20 OFFSET 0;
Cold cache
Time: 50.547 ms
- planning: 6.406 ms
- execution: 44.141 ms
- I/O read: 32.190 ms
- I/O write: 0.000 ms
Shared buffers:
- hits: 1031 (~8.10 MiB) from the buffer pool
- reads: 614 (~4.80 MiB) from the OS file cache, including disk I/O
- dirtied: 134 (~1.00 MiB)
- writes: 0
Warm cache
Time: 10.663 ms
- planning: 0.613 ms
- execution: 10.050 ms
- I/O read: 0.000 ms
- I/O write: 0.000 ms
Shared buffers:
- hits: 1440 (~11.30 MiB) from the buffer pool
- reads: 0 from the OS file cache, including disk I/O
- dirtied: 0
- writes: 0
Does this MR meet the acceptance criteria?
Conformity
-
I have included changelog trailers, or none are needed. (Does this MR need a changelog?) -
I have added/updated documentation, or it's not needed. (Is documentation required?) -
I have properly separated EE content from FOSS, or this MR is FOSS only. (Where should EE code go?) -
I have added information for database reviewers in the MR description, or it's not needed. (Does this MR have database related changes?) -
I have self-reviewed this MR per code review guidelines. -
This MR does not harm performance, or I have asked a reviewer to help assess the performance impact. (Merge request performance guidelines) -
I have followed the style guides. -
This change is backwards compatible across updates, or this does not apply.
Availability and Testing
-
I have added/updated tests following the Testing Guide, or it's not needed. (Consider all test levels. See the Test Planning Process.) -
I have tested this MR in all supported browsers, or it's not needed. -
I have informed the Infrastructure department of a default or new setting change per definition of done, or it's not needed.
Security
Does this MR contain changes to processing or storing of credentials or tokens, authorization and authentication methods or other items described in the security review guidelines? If not, then delete this Security section.
-
Label as security and @ mention @gitlab-com/gl-security/appsec
-
The MR includes necessary changes to maintain consistency between UI, API, email, or other methods -
Security reports checked/validated by a reviewer from the AppSec team