
Add issues.upvotes_count column

Dmitry Gruzd requested to merge 263365-add-upvotes-column-to-issues into master

What does this MR do?

This MR introduces and backfills a new column issues.upvotes_count. This column is required to efficiently sort issues search results by popularity (upvotes).

#263365
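To illustrate why the counter helps: once the column is backfilled, a popularity sort can order on it directly instead of joining and aggregating award_emoji rows at query time. A hypothetical console snippet (not code from this MR):

# Hypothetical illustration: order issues by the denormalized counter
# rather than counting award_emoji per issue at query time.
Issue.reorder(upvotes_count: :desc).limit(20)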

Migration output

❯ bin/rake db:migrate
== 20210701111627 AddUpvotesCountToIssues: migrating ==========================
-- add_column(:issues, :upvotes_count, :integer, {:default=>0, :null=>false})
   -> 0.0010s
== 20210701111627 AddUpvotesCountToIssues: migrated (0.0042s) =================

== 20210701111909 BackfillIssuesUpvotesCount: migrating =======================
-- Scheduled 1 BackfillUpvotesCountOnIssues jobs with a maximum of 10000 records per batch and an interval of 120 seconds.

The migration is expected to take at least 120 seconds. Expect all jobs to have completed after 2021-07-06 09:22:39 UTC.
== 20210701111909 BackfillIssuesUpvotesCount: migrated (0.0142s) ==============

❯ bin/rake db:rollback
== 20210701111909 BackfillIssuesUpvotesCount: reverting =======================
== 20210701111909 BackfillIssuesUpvotesCount: reverted (0.0000s) ==============

❯ bin/rake db:rollback
== 20210701111627 AddUpvotesCountToIssues: reverting ==========================
-- remove_column(:issues, :upvotes_count)
   -> 0.0015s
== 20210701111627 AddUpvotesCountToIssues: reverted (0.0016s) =================
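
Based on the add_column call in the output above, the schema migration is presumably a small standard migration along these lines (a sketch; the actual file may differ, e.g. in the Rails version tag or lock-retry helpers):

# frozen_string_literal: true

# Sketch reconstructed from the add_column output above; the class layout
# and migration version are assumptions.
class AddUpvotesCountToIssues < ActiveRecord::Migration[6.1]
  def change
    add_column :issues, :upvotes_count, :integer, default: 0, null: false
  end
end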

Background Migration Details

702585 items to update
batch size = 10000
702585 / 10000 ≈ 71 batches

Estimated times per batch:

- 131ms per update statement covering 1000 items (10 statements per 10000-record batch)
Total: ~1310ms per batch

2 min delay between batches (a comfortable safety margin over the ~1.3s of work per batch)
71 batches * 2 min per batch ≈ 2.4 hours to run all the scheduled jobs
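
The "Scheduled 1 BackfillUpvotesCountOnIssues jobs with a maximum of 10000 records per batch and an interval of 120 seconds" line above matches GitLab's queue_background_migration_jobs_by_range_at_intervals helper, so the scheduling migration plausibly looks something like the following (a sketch; the inline model and the join string are assumptions based on the batch query below):

# frozen_string_literal: true

# Sketch of the scheduling post-deployment migration; the constants mirror
# the batch math above, the joined relation mirrors the batch query below.
class BackfillIssuesUpvotesCount < ActiveRecord::Migration[6.1]
  include Gitlab::Database::MigrationHelpers

  MIGRATION = 'BackfillUpvotesCountOnIssues'
  DELAY_INTERVAL = 2.minutes
  BATCH_SIZE = 10_000

  disable_ddl_transaction!

  class Issue < ActiveRecord::Base
    include EachBatch

    self.table_name = 'issues'
  end

  def up
    # Only issues that have at least one thumbsup award need updating.
    relation = Issue.joins(
      "INNER JOIN award_emoji e ON e.awardable_id = issues.id " \
      "AND e.awardable_type = 'Issue' AND e.name = 'thumbsup'"
    )

    queue_background_migration_jobs_by_range_at_intervals(
      relation,
      MIGRATION,
      DELAY_INTERVAL,
      batch_size: BATCH_SIZE
    )
  end

  def down
    # no-op: the backfill itself is not reverted
  end
end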

Execution plan (batch query)

https://gitlab.slack.com/archives/CLJMDRD8C/p1625648578224200

SELECT "issues"."id" FROM "issues" INNER JOIN award_emoji e ON e.awardable_id = issues.id AND e.awardable_type = 'Issue' AND e.name = 'thumbsup' WHERE "issues"."id" >= 506 ORDER BY "issues"."id" ASC LIMIT 1 OFFSET 5000;
Limit  (cost=5374.72..5375.59 rows=1 width=4)
  ->  Gather Merge  (cost=1001.15..553433.07 rows=631557 width=4)
        Workers Planned: 2
        ->  Nested Loop  (cost=1.12..479535.70 rows=263149 width=4)
              ->  Parallel Index Scan using index_award_emoji_on_awardable_type_and_awardable_id on award_emoji e  (cost=0.56..119530.07 rows=263150 width=4)
                    Index Cond: ((awardable_type)::text = 'Issue'::text)
                    Filter: ((name)::text = 'thumbsup'::text)
              ->  Index Only Scan using issues_pkey on issues  (cost=0.57..1.37 rows=1 width=4)
                    Index Cond: ((id = e.awardable_id) AND (id >= 506))

Execution plan (update query)

https://gitlab.slack.com/archives/CLJMDRD8C/p1625568347156000

EXPLAIN UPDATE issues
SET upvotes_count = sub_q.count_all
FROM (
  SELECT COUNT(*) AS count_all, e.awardable_id AS issue_id
  FROM award_emoji AS e
  WHERE e.name = 'thumbsup' AND
  e.awardable_type = 'Issue' AND
  e.awardable_id BETWEEN 7620223 AND 7621223
  GROUP BY issue_id
) AS sub_q
WHERE sub_q.issue_id = issues.id;
Update on issues  (cost=1.12..18.11 rows=3 width=1384)
  ->  Nested Loop  (cost=1.12..18.11 rows=3 width=1384)
        ->  Subquery Scan on sub_q  (cost=0.56..7.35 rows=3 width=48)
              ->  GroupAggregate  (cost=0.56..7.32 rows=3 width=12)
                    Group Key: e.awardable_id
                    ->  Index Scan using index_award_emoji_on_awardable_type_and_awardable_id on award_emoji e  (cost=0.56..7.28 rows=3 width=4)
                          Index Cond: (((awardable_type)::text = 'Issue'::text) AND (awardable_id >= 7620223) AND (awardable_id <= 7621223))
                          Filter: ((name)::text = 'thumbsup'::text)
        ->  Index Scan using issues_pkey on issues  (cost=0.57..3.58 rows=1 width=1316)
              Index Cond: (id = sub_q.issue_id)
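
For completeness, the background migration class that executes this statement per id range could be sketched as follows. Only the SQL is taken from this MR; the class/module layout and the execute call are assumptions, and the real job likely slices each 10000-id range into the 1000-row sub-batches seen in the BETWEEN clause above:

# frozen_string_literal: true

# Sketch of the job class; the UPDATE mirrors the analysed query, while the
# class structure is an assumption.
module Gitlab
  module BackgroundMigration
    class BackfillUpvotesCountOnIssues
      def perform(start_id, end_id)
        ActiveRecord::Base.connection.execute(<<~SQL)
          UPDATE issues
          SET upvotes_count = sub_q.count_all
          FROM (
            SELECT COUNT(*) AS count_all, e.awardable_id AS issue_id
            FROM award_emoji AS e
            WHERE e.name = 'thumbsup' AND
            e.awardable_type = 'Issue' AND
            e.awardable_id BETWEEN #{start_id} AND #{end_id}
            GROUP BY issue_id
          ) AS sub_q
          WHERE sub_q.issue_id = issues.id;
        SQL
      end
    end
  end
end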

Maximum number of upvotes

Even though the current maximum is under 1000 upvotes, I think using smallint would be too risky since it leaves much less headroom (smallint caps at 32767, while integer goes up to ~2.1 billion).

gitlabhq_production=> SELECT COUNT(*) AS count_all, "award_emoji"."awardable_type" AS award_emoji_awardable_type, "award_emoji"."awardable_id" AS award_emoji_awardable_id FROM "award_emoji" WHERE "award_emoji"."name" = 'thumbsup' GROUP BY "award_emoji"."awardable_type", "award_emoji"."awardable_id" ORDER BY count_all DESC LIMIT 10;
 count_all | award_emoji_awardable_type | award_emoji_awardable_id
-----------+----------------------------+--------------------------
       722 | Issue                      |                 24330117
       692 | Issue                      |                 23360255
       613 | Issue                      |                  2447341
       609 | Issue                      |                   835618
       565 | Issue                      |                  3338061
       565 | Issue                      |                 18164114
       536 | Issue                      |                 24660406
       531 | Issue                      |                  3579761
       493 | Note                       |                 11958714
       489 | Issue                      |                 24652824
(10 rows)

Screenshots (strongly suggested)

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

Does this MR contain changes to processing or storing of credentials or tokens, authorization and authentication methods or other items described in the security review guidelines? If not, then delete this Security section.

  • [-] Label as security and @ mention @gitlab-com/gl-security/appsec
  • [-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • [-] Security reports checked/validated by a reviewer from the AppSec team

Related to #263365
