
Add issues.upvotes_count column

Dmitry Gruzd requested to merge 263365-add-upvotes-column-to-issues into master

What does this MR do?

This MR introduces and backfills a new column issues.upvotes_count. This column is required to efficiently sort issues search results by popularity (upvotes).

#263365
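To illustrate why the counter helps: once the column is backfilled, a popularity sort can order on it directly instead of joining and aggregating award_emoji rows at query time. A hypothetical console snippet (not code from this MR):

# Hypothetical illustration: order issues by the denormalized counter
# rather than counting award_emoji per issue at query time.
Issue.reorder(upvotes_count: :desc).limit(20)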

Migration output

❯ bin/rake db:migrate
== 20210701111627 AddUpvotesCountToIssues: migrating ==========================
-- add_column(:issues, :upvotes_count, :integer, {:default=>0, :null=>false})
   -> 0.0010s
== 20210701111627 AddUpvotesCountToIssues: migrated (0.0042s) =================

== 20210701111909 BackfillIssuesUpvotesCount: migrating =======================
-- Scheduled 1 BackfillUpvotesCountOnIssues jobs with a maximum of 10000 records per batch and an interval of 120 seconds.

The migration is expected to take at least 120 seconds. Expect all jobs to have completed after 2021-07-06 09:22:39 UTC.
== 20210701111909 BackfillIssuesUpvotesCount: migrated (0.0142s) ==============

❯ bin/rake db:rollback
== 20210701111909 BackfillIssuesUpvotesCount: reverting =======================
== 20210701111909 BackfillIssuesUpvotesCount: reverted (0.0000s) ==============

❯ bin/rake db:rollback
== 20210701111627 AddUpvotesCountToIssues: reverting ==========================
-- remove_column(:issues, :upvotes_count)
   -> 0.0015s
== 20210701111627 AddUpvotesCountToIssues: reverted (0.0016s) =================
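
Based on the add_column call in the output above, the schema migration is presumably a small standard migration along these lines (a sketch; the actual file may differ, e.g. in the Rails version tag or lock-retry helpers):

# frozen_string_literal: true

# Sketch reconstructed from the add_column output above; the class layout
# and migration version are assumptions.
class AddUpvotesCountToIssues < ActiveRecord::Migration[6.1]
  def change
    add_column :issues, :upvotes_count, :integer, default: 0, null: false
  end
end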

Background Migration Details

702585 items to update
batch size = 10000
702585 / 10000 ≈ 71 batches

Estimated times per batch:

- 131ms per update statement covering 1000 items (10 statements per 10000-record batch)
Total: ~1310ms per batch

2 min delay between batches (a comfortable safety margin over the ~1.3s of work per batch)
71 batches * 2 min per batch ≈ 2.4 hours to run all the scheduled jobs
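
The "Scheduled 1 BackfillUpvotesCountOnIssues jobs with a maximum of 10000 records per batch and an interval of 120 seconds" line above matches GitLab's queue_background_migration_jobs_by_range_at_intervals helper, so the scheduling migration plausibly looks something like the following (a sketch; the inline model and the join string are assumptions based on the batch query below):

# frozen_string_literal: true

# Sketch of the scheduling post-deployment migration; the constants mirror
# the batch math above, the joined relation mirrors the batch query below.
class BackfillIssuesUpvotesCount < ActiveRecord::Migration[6.1]
  include Gitlab::Database::MigrationHelpers

  MIGRATION = 'BackfillUpvotesCountOnIssues'
  DELAY_INTERVAL = 2.minutes
  BATCH_SIZE = 10_000

  disable_ddl_transaction!

  class Issue < ActiveRecord::Base
    include EachBatch

    self.table_name = 'issues'
  end

  def up
    # Only issues that have at least one thumbsup award need updating.
    relation = Issue.joins(
      "INNER JOIN award_emoji e ON e.awardable_id = issues.id " \
      "AND e.awardable_type = 'Issue' AND e.name = 'thumbsup'"
    )

    queue_background_migration_jobs_by_range_at_intervals(
      relation,
      MIGRATION,
      DELAY_INTERVAL,
      batch_size: BATCH_SIZE
    )
  end

  def down
    # no-op: the backfill itself is not reverted
  end
end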

Execution plan (batch query)

https://gitlab.slack.com/archives/CLJMDRD8C/p1625648578224200

SELECT "issues"."id" FROM "issues" INNER JOIN award_emoji e ON e.awardable_id = issues.id AND e.awardable_type = 'Issue' AND e.name = 'thumbsup' WHERE "issues"."id" >= 506 ORDER BY "issues"."id" ASC LIMIT 1 OFFSET 5000;
Limit  (cost=5374.72..5375.59 rows=1 width=4)
  ->  Gather Merge  (cost=1001.15..553433.07 rows=631557 width=4)
        Workers Planned: 2
        ->  Nested Loop  (cost=1.12..479535.70 rows=263149 width=4)
              ->  Parallel Index Scan using index_award_emoji_on_awardable_type_and_awardable_id on award_emoji e  (cost=0.56..119530.07 rows=263150 width=4)
                    Index Cond: ((awardable_type)::text = 'Issue'::text)
                    Filter: ((name)::text = 'thumbsup'::text)
              ->  Index Only Scan using issues_pkey on issues  (cost=0.57..1.37 rows=1 width=4)
                    Index Cond: ((id = e.awardable_id) AND (id >= 506))

Execution plan (update query)

https://gitlab.slack.com/archives/CLJMDRD8C/p1625568347156000

EXPLAIN UPDATE issues
SET upvotes_count = sub_q.count_all
FROM (
  SELECT COUNT(*) AS count_all, e.awardable_id AS issue_id
  FROM award_emoji AS e
  WHERE e.name = 'thumbsup' AND
  e.awardable_type = 'Issue' AND
  e.awardable_id BETWEEN 7620223 AND 7621223
  GROUP BY issue_id
) AS sub_q
WHERE sub_q.issue_id = issues.id;
Update on issues  (cost=1.12..18.11 rows=3 width=1384)
  ->  Nested Loop  (cost=1.12..18.11 rows=3 width=1384)
        ->  Subquery Scan on sub_q  (cost=0.56..7.35 rows=3 width=48)
              ->  GroupAggregate  (cost=0.56..7.32 rows=3 width=12)
                    Group Key: e.awardable_id
                    ->  Index Scan using index_award_emoji_on_awardable_type_and_awardable_id on award_emoji e  (cost=0.56..7.28 rows=3 width=4)
                          Index Cond: (((awardable_type)::text = 'Issue'::text) AND (awardable_id >= 7620223) AND (awardable_id <= 7621223))
                          Filter: ((name)::text = 'thumbsup'::text)
        ->  Index Scan using issues_pkey on issues  (cost=0.57..3.58 rows=1 width=1316)
              Index Cond: (id = sub_q.issue_id)
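
For completeness, the background migration class that executes this statement per id range could be sketched as follows. Only the SQL is taken from this MR; the class/module layout and the execute call are assumptions, and the real job likely slices each 10000-id range into the 1000-row sub-batches seen in the BETWEEN clause above:

# frozen_string_literal: true

# Sketch of the job class; the UPDATE mirrors the analysed query, while the
# class structure is an assumption.
module Gitlab
  module BackgroundMigration
    class BackfillUpvotesCountOnIssues
      def perform(start_id, end_id)
        ActiveRecord::Base.connection.execute(<<~SQL)
          UPDATE issues
          SET upvotes_count = sub_q.count_all
          FROM (
            SELECT COUNT(*) AS count_all, e.awardable_id AS issue_id
            FROM award_emoji AS e
            WHERE e.name = 'thumbsup' AND
            e.awardable_type = 'Issue' AND
            e.awardable_id BETWEEN #{start_id} AND #{end_id}
            GROUP BY issue_id
          ) AS sub_q
          WHERE sub_q.issue_id = issues.id;
        SQL
      end
    end
  end
end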

Maximum number of upvotes

Even though the current maximum is under 1000 upvotes, I think using smallint would be too risky since it leaves much less headroom (smallint caps at 32767, while integer goes up to ~2.1 billion).

gitlabhq_production=> SELECT COUNT(*) AS count_all, "award_emoji"."awardable_type" AS award_emoji_awardable_type, "award_emoji"."awardable_id" AS award_emoji_awardable_id FROM "award_emoji" WHERE "award_emoji"."name" = 'thumbsup' GROUP BY "award_emoji"."awardable_type", "award_emoji"."awardable_id" ORDER BY count_all DESC LIMIT 10;
 count_all | award_emoji_awardable_type | award_emoji_awardable_id
-----------+----------------------------+--------------------------
       722 | Issue                      |                 24330117
       692 | Issue                      |                 23360255
       613 | Issue                      |                  2447341
       609 | Issue                      |                   835618
       565 | Issue                      |                  3338061
       565 | Issue                      |                 18164114
       536 | Issue                      |                 24660406
       531 | Issue                      |                  3579761
       493 | Note                       |                 11958714
       489 | Issue                      |                 24652824
(10 rows)

Screenshots (strongly suggested)

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

Does this MR contain changes to processing or storing of credentials or tokens, authorization and authentication methods or other items described in the security review guidelines? If not, then delete this Security section.

  • [-] Label as security and @ mention @gitlab-com/gl-security/appsec
  • [-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • [-] Security reports checked/validated by a reviewer from the AppSec team

Related to #263365
