Add issues.upvotes_count column
What does this MR do?
This MR introduces and backfills a new column issues.upvotes_count
. This column is required to efficiently sort issues search results by popularity (upvotes).
Migration output
click here
❯ bin/rake db:migrate
== 20210701111627 AddUpvotesCountToIssues: migrating ==========================
-- add_column(:issues, :upvotes_count, :integer, {:default=>0, :null=>false})
-> 0.0010s
== 20210701111627 AddUpvotesCountToIssues: migrated (0.0042s) =================
== 20210701111909 BackfillIssuesUpvotesCount: migrating =======================
-- Scheduled 1 BackfillUpvotesCountOnIssues jobs with a maximum of 10000 records per batch and an interval of 120 seconds.
The migration is expected to take at least 120 seconds. Expect all jobs to have completed after 2021-07-06 09:22:39 UTC."
== 20210701111909 BackfillIssuesUpvotesCount: migrated (0.0142s) ==============
❯ bin/rake db:rollback
== 20210701111909 BackfillIssuesUpvotesCount: reverting =======================
== 20210701111909 BackfillIssuesUpvotesCount: reverted (0.0000s) ==============
❯ bin/rake db:rollback
== 20210701111627 AddUpvotesCountToIssues: reverting ==========================
-- remove_column(:issues, :upvotes_count)
-> 0.0015s
== 20210701111627 AddUpvotesCountToIssues: reverted (0.0016s) =================
Background Migration Details
702585 items to update
batch size = 10000
702585 / 10000 = 70 batches
Estimated times per batch:
- 131ms for update statement with 1000 items
Total: ~1310ms per batch
2 min delay per batch (safe for the given total time per batch)
70 batches * 2 min per batch = 2 hours to run all the scheduled jobs
Execution plan (batch query)
https://gitlab.slack.com/archives/CLJMDRD8C/p1625648578224200
SELECT "issues"."id" FROM "issues" INNER JOIN award_emoji e ON e.awardable_id = issues.id AND e.awardable_type = 'Issue' AND e.name = 'thumbsup' WHERE "issues"."id" >= 506 ORDER BY "issues"."id" ASC LIMIT 1 OFFSET 5000;
Limit (cost=5374.72..5375.59 rows=1 width=4)
-> Gather Merge (cost=1001.15..553433.07 rows=631557 width=4)
Workers Planned: 2
-> Nested Loop (cost=1.12..479535.70 rows=263149 width=4)
-> Parallel Index Scan using index_award_emoji_on_awardable_type_and_awardable_id on award_emoji e (cost=0.56..119530.07 rows=263150 width=4)
Index Cond: ((awardable_type)::text = 'Issue'::text)
Filter: ((name)::text = 'thumbsup'::text)
-> Index Only Scan using issues_pkey on issues (cost=0.57..1.37 rows=1 width=4)
Index Cond: ((id = e.awardable_id) AND (id >= 506))
Execution plan (update query)
https://gitlab.slack.com/archives/CLJMDRD8C/p1625568347156000
EXPLAIN UPDATE issues
SET upvotes_count = sub_q.count_all
FROM (
SELECT COUNT(*) AS count_all, e.awardable_id AS issue_id
FROM award_emoji AS e
WHERE e.name = 'thumbsup' AND
e.awardable_type = 'Issue' AND
e.awardable_id BETWEEN 7620223 AND 7621223
GROUP BY issue_id
) AS sub_q
WHERE sub_q.issue_id = issues.id;
Update on issues (cost=1.12..18.11 rows=3 width=1384)
-> Nested Loop (cost=1.12..18.11 rows=3 width=1384)
-> Subquery Scan on sub_q (cost=0.56..7.35 rows=3 width=48)
-> GroupAggregate (cost=0.56..7.32 rows=3 width=12)
Group Key: e.awardable_id
-> Index Scan using index_award_emoji_on_awardable_type_and_awardable_id on award_emoji e (cost=0.56..7.28 rows=3 width=4)
Index Cond: (((awardable_type)::text = 'Issue'::text) AND (awardable_id >= 7620223) AND (awardable_id <= 7621223))
Filter: ((name)::text = 'thumbsup'::text)
-> Index Scan using issues_pkey on issues (cost=0.57..3.58 rows=1 width=1316)
Index Cond: (id = sub_q.issue_id)
Maximum number of upvotes
Even thought we only have sub 1000 upvotes maximum, I think that using smallint
might be too risky since it leaves much less headroom.
click here
gitlabhq_production=> SELECT COUNT(*) AS count_all, "award_emoji"."awardable_type" AS award_emoji_awardable_type, "award_emoji"."awardable_id" AS award_emoji_awardable_id FROM "award_emoji" WHERE "award_emoji"."name" = 'thumbsup' GROUP BY "award_emoji"."awardable_type", "award_emoji"."awardable_id" ORDER BY count_all DESC LIMIT 10;
count_all | award_emoji_awardable_type | award_emoji_awardable_id
-----------+----------------------------+--------------------------
722 | Issue | 24330117
692 | Issue | 23360255
613 | Issue | 2447341
609 | Issue | 835618
565 | Issue | 3338061
565 | Issue | 18164114
536 | Issue | 24660406
531 | Issue | 3579761
493 | Note | 11958714
489 | Issue | 24652824
(10 rows)
Screenshots (strongly suggested)
Does this MR meet the acceptance criteria?
Conformity
-
I have included changelog trailers, or none are needed. (Does this MR need a changelog?) -
I have added/updated documentation, or it's not needed. (Is documentation required?) -
I have properly separated EE content from FOSS, or this MR is FOSS only. (Where should EE code go?) -
I have added information for database reviewers in the MR description, or it's not needed. (Does this MR have database related changes?) -
I have self-reviewed this MR per code review guidelines. -
This MR does not harm performance, or I have asked a reviewer to help assess the performance impact. (Merge request performance guidelines) -
I have followed the style guides. -
This change is backwards compatible across updates, or this does not apply.
Availability and Testing
-
I have added/updated tests following the Testing Guide, or it's not needed. (Consider all test levels. See the Test Planning Process.) - [-] I have tested this MR in all supported browsers, or it's not needed.
- [-] I have informed the Infrastructure department of a default or new setting change per definition of done, or it's not needed.
Security
Does this MR contain changes to processing or storing of credentials or tokens, authorization and authentication methods or other items described in the security review guidelines? If not, then delete this Security section.
- [-] Label as security and @ mention
@gitlab-com/gl-security/appsec
- [-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
- [-] Security reports checked/validated by a reviewer from the AppSec team
Related to #263365
Edited by Dmitry Gruzd