# Use a separate table for storing push events
This MR adds a table called `push_event_payloads` and migrates push event data to this new table.
## Events Checklist

- Index `(project_id, created_at)`, then remove the standalone index on `push_events.project_id`
- Add a background migration to migrate a single row to the new format
- Add a post-deployment migration that schedules migrations for all existing push events
- Adjust the code so that creating push events uses `PushEventPayload`
- Adjust Atom feeds to show push events using the new format, and make `Event` compatible with this so we can show old data at the same time
- Adjust the UI to show push events using the new format
- Adjust the event queries to use a LATERAL JOIN on PostgreSQL to more efficiently get the data
- Remove `Event#commits`/`PushEvent#commits` once it's no longer in use (after taking care of the above)
  - `Event#commits` has to stay until all existing events have been processed
- Check if we can simply drop `events.data` in the next release since it seems to only be used by push events. This would remove the need for updating events, thus reducing table bloat (and subsequent DB load)
  - Confirmed as of July 25th, 2017: `events.data` is only used by push events
- Verify the API output for push events
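The "schedules migrations for all existing push events" item (the `ScheduleEventMigrations` migration in the output further down) typically works by walking the events ID range in batches and enqueuing a delayed background job per batch. A minimal sketch of that batching logic, with an illustrative batch size and interval rather than the values this MR actually uses:

```ruby
# Sketch only: batch size and interval are illustrative, not the values
# used by the actual ScheduleEventMigrations migration.
def migration_batches(min_id, max_id, batch_size: 10_000, interval: 300)
  batches = []
  start_id = min_id
  index = 0

  while start_id <= max_id
    stop_id = [start_id + batch_size - 1, max_id].min
    # Each tuple is (delay in seconds, ID range); a post-deployment migration
    # would hand these to a delayed background-migration worker.
    batches << [index * interval, start_id..stop_id]
    start_id = stop_id + 1
    index += 1
  end

  batches
end
```

Spreading the batches out over time keeps Sidekiq and the database from being saturated by one large migration all at once.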
## Database Checklist

When adding migrations:

- Updated `db/schema.rb`
- Added a `down` method so the migration can be reverted
- Added the output of the migration(s) to the MR body
- Added the execution time of the migration(s) to the MR body
- Added tests for the migration in `spec/migrations` if necessary (e.g. when migrating data)
- Made sure the migration won't interfere with a running GitLab cluster, for example by disabling transactions for long running migrations
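Two of the items above ask for the migrations' output and execution times in the MR body. As a hedged sketch (the helper name and regex are mine, not part of GitLab's tooling), the per-migration timings could be pulled straight out of a `rake db:migrate` log like the one in the "Migration Output" section:

```ruby
# Extract [migration name, seconds] pairs from `rake db:migrate` output.
# Helper name and regex are illustrative, not GitLab's actual tooling.
def migration_timings(output)
  output.scan(/^== \d+ (\w+): migrated \(([\d.]+)s\)/).map do |name, seconds|
    [name, seconds.to_f]
  end
end
```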
When adding or modifying queries:

- Included the raw SQL of the relevant queries
- Included the output of `EXPLAIN ANALYZE` and the execution timings of the relevant queries
- Added tests for the relevant changes
When adding tables:

- Ordered columns based on their type sizes in descending order
- Added foreign keys if necessary
- Added indexes if necessary
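The column-ordering item deserves a quick illustration: on PostgreSQL, putting fixed-size columns first (widest first) and variable-length columns last reduces alignment padding inside each row. A toy sketch, with assumed type sizes and no relation to the actual `push_event_payloads` definition:

```ruby
# Byte sizes of common fixed-width PostgreSQL types; variable-length types
# (text, bytea, ...) go last. Purely illustrative.
TYPE_SIZES = {
  'bigint' => 8, 'timestamp' => 8, 'integer' => 4,
  'smallint' => 2, 'boolean' => 1
}.freeze

def order_columns(columns)
  fixed, variable = columns.partition { |_, type| TYPE_SIZES.key?(type) }
  fixed.sort_by { |_, type| -TYPE_SIZES[type] } + variable
end
```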
## General Checklist

- Changelog entry added, if necessary
- Documentation created/updated
- Tests added for this feature/bug
- Conforms to the merge request performance guides
- Conforms to the style guides
- Squashed related commits together
## Migration Output
From my personal staging environment, which uses the same setup as production:
```
== 20170608152747 PrepareEventsTableForPushEventsMigration: migrating =========
-- create_table(:events_for_migration)
   -> 0.2035s
-- transaction_open?()
   -> 0.0000s
-- execute("SET statement_timeout TO 0")
   -> 0.0007s
-- execute("ALTER TABLE events_for_migration\nADD CONSTRAINT fk_edfd187b6f\nFOREIGN KEY (author_id)\nREFERENCES users (id)\nON DELETE cascade\nNOT VALID;\n")
   -> 0.0089s
-- execute("ALTER TABLE events_for_migration VALIDATE CONSTRAINT fk_edfd187b6f;")
   -> 0.0037s
== 20170608152747 PrepareEventsTableForPushEventsMigration: migrated (0.2174s) ==

== 20170608152748 CreatePushEventPayloadsTables: migrating ====================
-- create_table(:push_event_payloads, {:id=>false})
   -> 0.0227s
-- transaction_open?()
   -> 0.0000s
-- execute("SET statement_timeout TO 0")
   -> 0.0005s
-- execute("ALTER TABLE push_event_payloads\nADD CONSTRAINT fk_36c74129da\nFOREIGN KEY (event_id)\nREFERENCES events_for_migration (id)\nON DELETE cascade\nNOT VALID;\n")
   -> 0.0031s
-- execute("ALTER TABLE push_event_payloads VALIDATE CONSTRAINT fk_36c74129da;")
   -> 0.0032s
== 20170608152748 CreatePushEventPayloadsTables: migrated (0.0298s) ===========

== 20170627101016 ScheduleEventMigrations: migrating ==========================
== 20170627101016 ScheduleEventMigrations: migrated (278.5996s) ===============

== 20170727123534 AddIndexOnEventsProjectIdId: migrating ======================
-- index_exists?(:events, [:project_id, :id])
   -> 0.0068s
-- transaction_open?()
   -> 0.0000s
-- execute("SET statement_timeout TO 0")
   -> 0.0005s
-- add_index(:events, [:project_id, :id], {:algorithm=>:concurrently})
   -> 589.0104s
-- index_exists?(:events, :project_id)
   -> 0.0077s
-- transaction_open?()
   -> 0.0000s
-- select_one("SELECT current_setting('server_version_num') AS v")
   -> 0.0010s
-- execute("SET statement_timeout TO 0")
   -> 0.0005s
-- remove_index(:events, {:algorithm=>:concurrently, :column=>:project_id})
   -> 0.2620s
-- index_exists?(:events_for_migration, [:project_id, :id])
   -> 0.0047s
-- transaction_open?()
   -> 0.0000s
-- execute("SET statement_timeout TO 0")
   -> 0.0005s
-- add_index(:events_for_migration, [:project_id, :id], {:algorithm=>:concurrently})
   -> 0.0164s
-- index_exists?(:events_for_migration, :project_id)
   -> 0.0054s
-- transaction_open?()
   -> 0.0000s
-- select_one("SELECT current_setting('server_version_num') AS v")
   -> 0.0005s
-- execute("SET statement_timeout TO 0")
   -> 0.0005s
-- remove_index(:events_for_migration, {:algorithm=>:concurrently, :column=>:project_id})
   -> 0.1220s
== 20170727123534 AddIndexOnEventsProjectIdId: migrated (589.4401s) ===========
```
## Migration Timings

| Migration | Duration |
|---|---|
| PrepareEventsTableForPushEventsMigration | 0.2 sec |
| CreatePushEventPayloadsTables | 0.2 sec |
| ScheduleEventMigrations | 278.5 sec (4.6 minutes) |
| AddIndexOnEventsProjectIdId | 589.4 sec (9.8 minutes) |
## Query Plans

This query is used to get events using a `JOIN LATERAL` when supported (PostgreSQL 9.3 or newer):
```sql
SELECT "events".*
FROM (
    SELECT "projects"."id"
    FROM "projects"
    INNER JOIN "project_authorizations" ON "projects"."id" = "project_authorizations"."project_id"
    WHERE "project_authorizations"."user_id" = 1
) projects_for_lateral
JOIN LATERAL (
    SELECT "events".*
    FROM "events"
    WHERE (events.project_id = projects_for_lateral.id)
    ORDER BY "events"."id" DESC
    LIMIT 20
) AS events ON true
ORDER BY "events"."id" DESC
LIMIT 20
OFFSET 0;
```
The plan for this query is:
```
                                                                   QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=2825.33..2825.38 rows=20 width=1875) (actual time=12.069..12.079 rows=20 loops=1)
   ->  Sort  (cost=2825.33..2828.93 rows=1440 width=1875) (actual time=12.068..12.074 rows=20 loops=1)
         Sort Key: events.id DESC
         Sort Method: top-N heapsort  Memory: 36kB
         ->  Nested Loop  (cost=1.43..2787.02 rows=1440 width=1875) (actual time=0.041..10.669 rows=3631 loops=1)
               ->  Nested Loop  (cost=0.86..186.63 rows=72 width=4) (actual time=0.028..1.677 rows=299 loops=1)
                     ->  Index Only Scan using index_project_authorizations_on_user_id_project_id_access_level on project_authorizations  (cost=0.43..5.69 rows=72 width=4) (actual time=0.020..0.113 rows=299 loops=1)
                           Index Cond: (user_id = 1)
                           Heap Fetches: 15
                     ->  Index Only Scan using projects_pkey on projects  (cost=0.43..2.50 rows=1 width=4) (actual time=0.004..0.005 rows=1 loops=299)
                           Index Cond: (id = project_authorizations.project_id)
                           Heap Fetches: 3
               ->  Limit  (cost=0.57..35.72 rows=20 width=1875) (actual time=0.008..0.025 rows=12 loops=299)
                     ->  Index Scan Backward using index_events_on_project_id_and_id on events  (cost=0.57..2165.74 rows=1232 width=1875) (actual time=0.007..0.022 rows=12 loops=299)
                           Index Cond: (project_id = projects.id)
 Planning time: 0.436 ms
 Execution time: 12.122 ms
```
This is much better compared to the fallback query used for MySQL / PostgreSQL 9.2 (and is basically what we currently use):
```sql
SELECT "events".*
FROM "events"
WHERE (
    EXISTS (
        SELECT 1
        FROM "projects"
        INNER JOIN "project_authorizations" ON "projects"."id" = "project_authorizations"."project_id"
        WHERE "project_authorizations"."user_id" = 1
        AND (projects.id = events.project_id)
    )
)
ORDER BY "events"."id" DESC
LIMIT 20
OFFSET 0;
```
This produces the following plan:
```
                                                                   QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=5455.33..5455.38 rows=20 width=1875) (actual time=1587.420..1587.433 rows=20 loops=1)
   ->  Sort  (cost=5455.33..5677.15 rows=88726 width=1875) (actual time=1587.419..1587.426 rows=20 loops=1)
         Sort Key: events.id DESC
         Sort Method: top-N heapsort  Memory: 36kB
         ->  Nested Loop  (cost=187.38..3094.37 rows=88726 width=1875) (actual time=1.541..1403.954 rows=583589 loops=1)
               ->  HashAggregate  (cost=186.81..187.53 rows=72 width=8) (actual time=1.518..1.678 rows=299 loops=1)
                     Group Key: projects.id
                     ->  Nested Loop  (cost=0.86..186.63 rows=72 width=8) (actual time=0.024..1.433 rows=299 loops=1)
                           ->  Index Only Scan using index_project_authorizations_on_user_id_project_id_access_level on project_authorizations  (cost=0.43..5.69 rows=72 width=4) (actual time=0.017..0.101 rows=299 loops=1)
                                 Index Cond: (user_id = 1)
                                 Heap Fetches: 15
                           ->  Index Only Scan using projects_pkey on projects  (cost=0.43..2.50 rows=1 width=4) (actual time=0.004..0.004 rows=1 loops=299)
                                 Index Cond: (id = project_authorizations.project_id)
                                 Heap Fetches: 3
               ->  Index Scan using index_events_on_project_id_and_id on events  (cost=0.57..28.05 rows=1232 width=1875) (actual time=0.011..3.901 rows=1952 loops=299)
                     Index Cond: (project_id = projects.id)
 Planning time: 0.664 ms
 Execution time: 1587.500 ms
```
So by using JOIN LATERAL we can query the data 66x faster.
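Since the `JOIN LATERAL` path is only available on PostgreSQL 9.3 and newer, the query code presumably has to pick a strategy at runtime. A hypothetical version check (the helper name is illustrative, not GitLab's actual API); note the migration output above already reads `server_version_num`, which PostgreSQL reports as e.g. `90300` for version 9.3:

```ruby
# Decide between the JOIN LATERAL query and the EXISTS fallback.
# Helper name is illustrative; PostgreSQL encodes version 9.3.0 as 90300
# in server_version_num.
def supports_lateral_join?(adapter, server_version_num)
  adapter == :postgresql && server_version_num >= 90_300
end
```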
Edited by Yorick Peterse