Replace indexes for counting active users
What does this MR do?
Replace indexes for counting active users
This adjusts the partial condition for an index. The index is intended
to be used when counting active users with ghost IS NOT TRUE AND bot_type IS NULL
.
With the current index, this wasn't working as the partial condition
didn't match the query: ghost <> TRUE
is not semantically equivalent
to ghost IS NOT TRUE
(null semantics).
The reason we add an index particularly intended for EE is that the EE
query is going to have the additional part AND bot_type IS NULL
whereas the CE query doesn't. Logically, it'd be enough to have an index
for ghost IS NOT TRUE
. However, on GitLab.com, the query planner makes
poor choices when the additional AND bot_type IS NULL
part is present:
It goes for the index on bot_type
and doesn't use the partial index.
Note the existing index isn't being used at all according to GitLab.com index statistics. Hence we can first remove it and don't have to worry about the window of time without an index.
Relates to https://gitlab.com/gitlab-org/gitlab-ce/issues/66770
Query plans
Plans from staging:
gitlabhq_production=# explain analyze SELECT COUNT(*) FROM "users" WHERE ("users"."state" IN ('active')) AND (ghost IS NOT TRUE) AND "users"."bot_type" IS NULL;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Aggregate (cost=101816.81..101816.82 rows=1 width=8) (actual time=937.831..937.831 rows=1 loops=1)
-> Seq Scan on users (cost=0.00..97995.74 rows=1528428 width=0) (actual time=0.015..833.454 rows=1528489 loops=1)
Filter: ((ghost IS NOT TRUE) AND (bot_type IS NULL) AND ((state)::text = 'active'::text))
Rows Removed by Filter: 4210
Planning time: 0.320 ms
Execution time: 937.865 ms
(6 rows)
Time: 939.240 ms
gitlabhq_production=# CREATE INDEX CONCURRENTLY index_users_on_state_and_internal_ee ON users (state) WHERE ghost IS NOT TRUE AND bot_type IS NULL;
CREATE INDEX
Time: 5929.095 ms
gitlabhq_production=# explain analyze SELECT COUNT(*) FROM "users" WHERE ("users"."state" IN ('active')) AND (ghost IS NOT TRUE) AND "users"."bot_type" IS NULL;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=42872.82..42872.83 rows=1 width=8) (actual time=345.913..345.914 rows=1 loops=1)
-> Index Only Scan using index_users_on_state_and_internal_ee on users (cost=0.43..39051.75 rows=1528428 width=0) (actual time=0.061..245.163 rows=1528489 loops=1)
Index Cond: (state = 'active'::text)
Heap Fetches: 113626
Planning time: 1.654 ms
Execution time: 345.979 ms
(6 rows)
Time: 349.880 ms
Does this MR meet the acceptance criteria?
Conformity
-
Changelog entry for user-facing changes, or community contribution. Check the link for other scenarios.