Get rid of SeqScans on "users" in queries with "ghost IS NOT TRUE AND support_bot IS NOT TRUE"

Everyone can contribute. Help move this issue forward while earning points, leveling up and collecting rewards.

Close this issue

Mentioned by @splattael in https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/10353#note_157803114:

SELECT COUNT(*) FROM "users" WHERE ("users"."state" IN ('active')) AND (ghost IS NOT TRUE AND support_bot IS NOT TRUE);

explain (buffers, analyze) SELECT COUNT(*) FROM "users" WHERE ("users"."state" IN ('active')) AND (ghost IS NOT TRUE AND support_bot IS NOT TRUE);

                                                       QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=859320.57..859320.58 rows=1 width=8) (actual time=2361.483..2361.483 rows=1 loops=1)
   Buffers: shared hit=201336
   ->  Seq Scan on users  (cost=0.00..850367.68 rows=3581159 width=0) (actual time=0.012..2078.690 rows=3607935 loops=1)
         Filter: ((ghost IS NOT TRUE) AND (support_bot IS NOT TRUE) AND ((state)::text = 'active'::text))
         Rows Removed by Filter: 19973
         Buffers: shared hit=201336
 Planning time: 0.640 ms
 Execution time: 2361.511 ms
(8 rows)

There is an index on users:

    "index_users_on_state_and_internal_attrs" btree (state) WHERE ghost <> true AND support_bot <> true

– it indexes only records where both ghost and support_bot are FALSEs.

It might be right to convert it to another partial index, with WHERE ghost is not true AND support_bot is not true, but we need to ensure that other queries do not degrade. If we have queries with WHERE ghost <> true AND support_bot <> true, they might be logically wrong, since these columns contain NULLs, which, I suppose, are treated in the same way as FALSEs.

Edited Sep 26, 2025 by 🤖 GitLab Bot 🤖