Get rid of SeqScans on "users" in queries with "ghost IS NOT TRUE AND support_bot IS NOT TRUE"
Everyone can contribute. Help move this issue forward while earning points, leveling up and collecting rewards.
Mentioned by @splattael in https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/10353#note_157803114:
SELECT COUNT(*) FROM "users" WHERE ("users"."state" IN ('active')) AND (ghost IS NOT TRUE AND support_bot IS NOT TRUE);
explain (buffers, analyze) SELECT COUNT(*) FROM "users" WHERE ("users"."state" IN ('active')) AND (ghost IS NOT TRUE AND support_bot IS NOT TRUE);
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=859320.57..859320.58 rows=1 width=8) (actual time=2361.483..2361.483 rows=1 loops=1)
Buffers: shared hit=201336
-> Seq Scan on users (cost=0.00..850367.68 rows=3581159 width=0) (actual time=0.012..2078.690 rows=3607935 loops=1)
Filter: ((ghost IS NOT TRUE) AND (support_bot IS NOT TRUE) AND ((state)::text = 'active'::text))
Rows Removed by Filter: 19973
Buffers: shared hit=201336
Planning time: 0.640 ms
Execution time: 2361.511 ms
(8 rows)
There is an index on users:
"index_users_on_state_and_internal_attrs" btree (state) WHERE ghost <> true AND support_bot <> true
– it indexes only records where both ghost and support_bot are FALSEs.
It might be right to convert it to another partial index, with WHERE ghost is not true AND support_bot is not true, but we need to ensure that other queries do not degrade. If we have queries with WHERE ghost <> true AND support_bot <> true, they might be logically wrong, since these columns contain NULLs, which, I suppose, are treated in the same way as FALSEs.
Edited by 🤖 GitLab Bot 🤖