Follow-up from "Add workers to notify over limit owners" Add Index for `next_over_limit_check_at`
The following discussion from !98438 (merged) should be addressed:
@reprazent started a discussion: (+3 comments)

This is the most complicated part of the query for finding the next namespace to work on. The query from this, without the lock, is:

```sql
SELECT "namespace_details"."namespace_id"
FROM "namespace_details"
WHERE "namespace_details"."free_user_cap_over_limit_notified_at" IS NULL
  AND (
    "namespace_details"."next_over_limit_check_at" <= '2023-02-09 09:55:11.773472'
    OR "namespace_details"."next_over_limit_check_at" IS NULL
  )
  AND "namespace_details"."namespace_id" IN (
    SELECT "namespaces"."id"
    FROM "namespaces"
    LEFT OUTER JOIN "gitlab_subscriptions" ON "gitlab_subscriptions"."namespace_id" = "namespaces"."id"
    LEFT OUTER JOIN "plans" ON "plans"."id" = "gitlab_subscriptions"."hosted_plan_id"
    WHERE "namespaces"."type" = 'Group'
      AND ("plans"."name" IN ('default', 'free') OR "plans"."name" IS NULL)
      AND "namespaces"."parent_id" IS NULL
      AND "namespaces"."visibility_level" != 20
  )
ORDER BY "namespace_details"."next_over_limit_check_at" ASC NULLS FIRST
LIMIT 1;
```

Have we run this past a database expert? I see the `database::approved` label, but I don't remember whether that was before or after changing the approach.
The reason I'm asking is that we're adding the `next_over_limit_check_at` column in this MR, but we don't have an index for it, while we do order by it, which suggests we'd have to scan the entire table to sort it.

I tried this out in postgres.ai: https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/15338/commands/53360
```plaintext
Time: 12.744 min
  - planning: 4.883 ms
  - execution: 12.744 min
    - I/O read: 14.960 min
    - I/O write: 7.390 ms

Shared buffers:
  - hits: 21290121 (~162.40 GiB) from the buffer pool
  - reads: 922735 (~7.00 GiB) from the OS file cache, including disk I/O
  - dirtied: 73121 (~571.30 MiB)
  - writes: 12 (~96.00 KiB)
```

It mentions a Seq Scan, but that is for the tiny `plans` table, so I don't think that's going to be a big problem.

I wonder if we should add an index on `namespace_details.next_over_limit_check_at` that includes a `NULLS FIRST` ordering (https://www.postgresql.org/docs/current/indexes-ordering.html). I think if the ordering could use an index scan, we'd find the first matching record pretty quickly.
I tried this out by creating this index:
```sql
CREATE INDEX next_over_limit_check_index
  ON namespace_details (next_over_limit_check_at ASC NULLS FIRST);
```

The timings looked way better: https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/15338/commands/53366
```plaintext
Time: 11.088 ms
  - planning: 7.445 ms
  - execution: 3.643 ms
    - I/O read: 3.212 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 7 (~56.00 KiB) from the buffer pool
  - reads: 14 (~112.00 KiB) from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0
```

@a_akgun, as you originally approved this MR for the database, would you mind taking a look at this?
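Below is a minimal sketch of what the follow-up could create, assuming we keep the `ASC NULLS FIRST` ordering that was tested above; the index name is a placeholder, and an actual migration would presumably go through the concurrent-index migration helpers rather than raw SQL.

```sql
-- Sketch only: the index name is a placeholder and the final migration may differ.
-- CONCURRENTLY builds the index without blocking writes to namespace_details
-- (note it cannot run inside a transaction block).
CREATE INDEX CONCURRENTLY index_namespace_details_on_next_over_limit_check_at
  ON namespace_details (next_over_limit_check_at ASC NULLS FIRST);

-- Sanity check that the planner can satisfy the ORDER BY ... LIMIT 1 with an
-- ordered index scan instead of sorting the whole table:
EXPLAIN (ANALYZE, BUFFERS)
SELECT namespace_id
FROM namespace_details
ORDER BY next_over_limit_check_at ASC NULLS FIRST
LIMIT 1;
```

The explicit `NULLS FIRST` matters here: PostgreSQL's default b-tree ordering is `ASC NULLS LAST`, so a plain index on the column could not serve this `ORDER BY ... NULLS FIRST` with an index scan.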