Improve database response time for listing user activity
What does this MR do?
This MR improves a database query to display activity of a given user to the currently logged in user. It leverages the user_interacted_tables
table introduced in https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/17327.
Are there points in the code the reviewer needs to double check?
- Please double-check permissions are still respected properly.
- Also, I ditched the
LATERAL
join because https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/17454#note_61882724 -- anything I'm not seeing?
Why was this MR needed?
For some users, displaying the user activity page times out. That is, the related query runs into the statement timeout.
A URL to test this is https://gitlab.com/ather-embitel (note we fail to show something for 'most recent activity' after a while).
Grafana links: Controller performance
Does this MR meet the acceptance criteria?
-
Changelog entry added, if necessary - Review
-
Has been reviewed by Backend -
Has been reviewed by Database
-
-
Conform by the merge request performance guides -
Conform by the style guides -
Squashed related commits together -
End-to-end tests pass ( package-qa
manual pipeline job)
What are the relevant issue numbers?
Closes #40525 (closed)
Merge request reports
Activity
added 127 commits
-
86dc1660...e20f96c8 - 110 commits from branch
master
- 108216f7 - Create user_contributed_projects table.
- b8d47d7f - Add post-migration to populate user_contributed_projects table.
- e347975c - Simpler migration strategy for MySQL.
- e2091b81 - Track which projects a user contributed to.
- fc791a22 - Cache project/user combinations.
- da493674 - Treat special cases accordingly.
- 34f4d045 - Address rubocop offenses.
- 076a284c - Only track contributions if table is available.
- 2db19399 - Flush cached information about schema.
- e9f0e4d6 - Add changelog.
- 58574a22 - Improve migration robustness and speed.
- ea626d60 - Rename to UserInteractedProjects.
- 3bd9cc31 - Leverage user_contributed_projects to find recent events.
- d5b057f0 - TEMP: Use PG 9.6 for testing.
- 836db001 - SAVEPOINT
- aba64821 - fixup! Leverage user_contributed_projects to find recent events.
- 45eac5d8 - WIP: #to_a necessary?
Toggle commit list-
86dc1660...e20f96c8 - 110 commits from branch
marked as a Work In Progress from aba64821
added 132 commits
-
45eac5d8...f29dbaf5 - 116 commits from branch
master
- c2f0321e - Create user_contributed_projects table.
- 6f8d1815 - Add post-migration to populate user_contributed_projects table.
- 062d51dd - Simpler migration strategy for MySQL.
- d72d9790 - Track which projects a user contributed to.
- 4a9fbfb1 - Cache project/user combinations.
- 1bba96eb - Treat special cases accordingly.
- cf3cf5cb - Address rubocop offenses.
- b31fd8c9 - Only track contributions if table is available.
- abb5cec7 - Flush cached information about schema.
- 904a7460 - Add changelog.
- 11d3266f - Improve migration robustness and speed.
- 7a28f202 - Rename to UserInteractedProjects.
- 9a9cfdf6 - For MySQL, foreign keys are dependent on index.
- c85b07a3 - Leverage user_contributed_projects to find recent events.
- da12f676 - TEMP: Use PG 9.6 for testing.
- f2c938b4 - SAVEPOINT
Toggle commit list-
45eac5d8...f29dbaf5 - 116 commits from branch
Query comparison:
Numbers from staging. All queries yield 7 events.
New query with
LATERAL
join:- Execution time: 0.35ms
- https://explain.depesz.com/s/ZQpe
SELECT events.id FROM ( SELECT "projects"."id" FROM "projects" INNER JOIN "project_authorizations" ON "project_authorizations"."project_id" = "projects"."id" INNER JOIN "user_interacted_projects" ON "projects"."id" = "user_interacted_projects"."project_id" WHERE "user_interacted_projects"."user_id" = 876946 AND "project_authorizations"."user_id" = 1675239 UNION SELECT "projects"."id" FROM "projects" INNER JOIN "user_interacted_projects" ON "projects"."id" = "user_interacted_projects"."project_id" WHERE "user_interacted_projects"."user_id" = 876946 AND "projects"."visibility_level" IN ( 10 ,20 ) ) AS projects_for_lateral INNER JOIN LATERAL(SELECT "events".* FROM "events" WHERE "events"."author_id" = 876946 AND (events.project_id = projects_for_lateral.id) ORDER BY id DESC LIMIT 20) AS events ON true INNER JOIN "projects" ON "projects"."id" = "events"."project_id" LEFT JOIN "namespaces" ON "namespaces"."id" = "projects"."namespace_id" LEFT JOIN "users" ON "users"."id" = "events"."author_id" ORDER BY "events"."id" DESC LIMIT 20
New query with normal join:
- Execution time: 0.88ms
- https://explain.depesz.com/s/5UBH
SELECT events.id FROM ( SELECT "projects"."id" FROM "projects" INNER JOIN "project_authorizations" ON "project_authorizations"."project_id" = "projects"."id" INNER JOIN "user_interacted_projects" ON "projects"."id" = "user_interacted_projects"."project_id" WHERE "user_interacted_projects"."user_id" = 876946 AND "project_authorizations"."user_id" = 1675239 UNION SELECT "projects"."id" FROM "projects" INNER JOIN "user_interacted_projects" ON "projects"."id" = "user_interacted_projects"."project_id" WHERE "user_interacted_projects"."user_id" = 876946 AND "projects"."visibility_level" IN ( 10 ,20 ) ) AS projects_for_join INNER JOIN ( SELECT "events".* FROM "events" WHERE "events"."author_id" = 876946 ) AS events ON events.project_id = projects_for_join.id INNER JOIN "projects" ON "projects"."id" = "events"."project_id" LEFT JOIN "namespaces" ON "namespaces"."id" = "projects"."namespace_id" LEFT JOIN "users" ON "users"."id" = "events"."author_id" ORDER BY "events"."id" DESC LIMIT 20
Original query with LIMIT 20:
- Execution time: statement timeout (> 1min)
- Plan with LIMIT 20: https://explain.depesz.com/s/yvfj
Original query with LIMIT 2000
Choosing a higher
LIMIT
yields a completely different (and much better!) plan.- Execution time: 468ms
- Plan with LIMIT 2000: https://explain.depesz.com/s/rAS
SELECT events.id FROM "events" LEFT JOIN "users" ON "users"."id" = "events"."author_id" LEFT JOIN "projects" ON "projects"."id" = "events"."project_id" LEFT JOIN "namespaces" ON "namespaces"."id" = "projects"."namespace_id" WHERE "events"."author_id" = 876946 AND ( projects.id IN ( SELECT projects.id FROM "projects" WHERE ( EXISTS ( SELECT 1 FROM "project_authorizations" WHERE "project_authorizations"."user_id" = 1675239 AND (project_authorizations.project_id = projects.id) ) ) UNION SELECT projects.id FROM "projects" WHERE "projects"."visibility_level" IN ( 10 ,20 ) ) ) ORDER BY "events"."id" DESC LIMIT 20
Edited by Andreas Brandlmentioned in issue #40525 (closed)
- Resolved by Andreas Brandl
The current changeset comes with a
LATERAL
join if the postgres version supports it and otherwise falls back to a normal join. In both cases the execution time is <1ms and the differences between both approaches are negligible in the example in https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/17454#note_61502737.I'm thinking we might ditch the
LATERAL
approach and not branch out depending on the postgres version. However would like to see production timings first and maybe look at another example user.If we were to keep the
LATERAL
join, I'd strongly suggest to also enable PG 9.6 on CI as otherwise, we'd have untested branches of code that would run on e.g. gitlab.com or any other customer we a >= PG 9.3 setup.
added 7 commits
- e5cad049 - Operate on ids to avoid unnecessary loading.
- a6d9e22b - Remove unnecessary safe guard.
- 318805c5 - Nested transaction for find_or_create_by! queries.
- b844e57f - Improve robustness of migration.
- ca6bd5d8 - Add id primary key to table.
- d56e0098 - Leverage user_contributed_projects to find recent events.
- a1584475 - TEMP: Use PG 9.6 for testing.
Toggle commit listmentioned in merge request !17327 (merged)
added 145 commits
-
a1584475...49f72d06 - 123 commits from branch
master
- 8cca3282 - Create user_contributed_projects table.
- f579a087 - Add post-migration to populate user_contributed_projects table.
- f88d5132 - Simpler migration strategy for MySQL.
- ebcf3c73 - Track which projects a user contributed to.
- d4d0740e - Cache project/user combinations.
- 87010983 - Treat special cases accordingly.
- 8dde0301 - Address rubocop offenses.
- 375a5c9f - Only track contributions if table is available.
- 4418066e - Flush cached information about schema.
- 7a22a589 - Add changelog.
- 986f470d - Improve migration robustness and speed.
- 43b74afd - Rename to UserInteractedProjects.
- 8a7cd25d - For MySQL, foreign keys are dependent on index.
- 11a559f7 - Singularize model name.
- 9ebbcb8f - Operate on ids to avoid unnecessary loading.
- e83e85ce - Remove unnecessary safe guard.
- d270b857 - Nested transaction for find_or_create_by! queries.
- 8e668076 - Improve robustness of migration.
- 9f776707 - Shortcut and return on duplicate record error.
- 39d4dd58 - Add id primary key to table.
- 3f4d8942 - Leverage user_contributed_projects to find recent events.
- c12cc795 - TEMP: Use PG 9.6 for testing.
Toggle commit list-
a1584475...49f72d06 - 123 commits from branch
marked the checklist item Changelog entry added, if necessary as completed
marked the checklist item Conform by the style guides as completed
marked the checklist item Conform by the merge request performance guides as completed
- Resolved by Andreas Brandl
I think this is good for review along with https://gitlab.com/gitlab-org/gitlab-ce/issues/43460 (which should be reviewed/merged first).
Over to @yorickpeterse for review.
😃 assigned to @yorickpeterse
Kept https://gitlab.com/gitlab-org/gitlab-ce/commit/b119268948a8223a8d6ebe221a0ea9199a7854f0?merge_request_iid=17454 for reference. Should I squash this together? @yorickpeterse
Edited by Andreas Brandl@abrandl Prior to merging I would prefer for related commits to be squashed together, this keeps the git log a bit cleaner.
@abrandl Is it possible to run the JOIN vs JOIN LATERAL queries on production? I'm surprised JOIN LATERAL is performing slightly less well, especially since we've used it in the past to speed things up dramatically (in
EventCollection
that is). I'm wondering if perhaps the presence of buffers may influence the plans/timings here (that is, the first query caching the data so the 2nd one can reuse it).@yorickpeterse I ran them in production earlier today, see https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/17454#note_61882724 for the timings. Not sure why this goes in favor of the normal join.
Here are the plans for lateral join and normal join. Both queries yield 7 rows.
The order of those queries did not matter, so I don't think it's due to query caching.
Edited by Andreas Brandl@abrandl I was curious so I gave both queries a try (but using
SELECT events.*
instead ofSELECT events.id
). I also ran both queries on two different secondaries. In this case the following happens: the INNER JOIN query runs in about 5 milliseconds while the JOIN LATERAL one runs in about 55 milliseconds.Super weird, but the data confirms INNER JOIN is better suited here so let's go with that.
@abrandl With https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/17327 merged could you rebase this MR?
added 202 commits
-
b1192689...7e7f260d - 201 commits from branch
master
- 7d2859e9 - Leverage user_contributed_projects to find recent events.
-
b1192689...7e7f260d - 201 commits from branch
@yorickpeterse Done, also squashed the commits together
marked the checklist item Squashed related commits together as completed
@rspeicher @jameslopez This MR introduces some very nice performance improvements for user profiles (and solve timeouts). I would like to have this picked into 10.6.0 if possible.
Sign off:
- @rspeicher or @jameslopez
- @stanhu
- @DouweM (since Marin isn't around this week)
Edited by Douwe Maanmentioned in commit aec3e1dd
This MR introduces some very nice performance improvements
You know how to sell it!
I'm okay with this one. Stupid question, but has this been tested on mySQL? :)
Actually we just merged this: https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/17636
**Do not** set the relevant `Pick into X.Y` label (see above) before request an exception; this should be done after the exception is approved.
So I'm removing ~"Pick into 10.6" for now
Edited by James LopezAh! Also, we should use the exception request template on https://gitlab.com/gitlab-org/release/tasks/issues/new
I'm okay with this one. Stupid question, but has this been tested on mySQL? :)
Yes, all MySQL builds passed.
@jameslopez Wait, so should I create an issue on said issue tracker or copy over something into this MR?
Edit: Oh, new issue it is.
Edited by Yorick Petersementioned in issue gitlab-org/release/tasks#118 (closed)
Request here: gitlab-org/release/tasks#118 (closed)
mentioned in merge request gitlab-com/www-gitlab-com!10163 (merged)
Picked into https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/17723, will merge into
10-6-stable
ready for10.6 RC5
mentioned in issue #44446 (closed)