Improve query performance by adding new index in zoekt_enabled_namespaces


The following discussion from !167336 (merged) should be addressed:

  • @DylanGriffith started a discussion: (+1 comment)

    @rkumar555 I can't help but notice that this query's performance is not great. From your query plan it reads 70MiB of buffers every time it runs, and it isn't making use of any suitable index.

    I'm sure I've reviewed this before, but based on the query plan https://console.postgres.ai/shared/89708c93-da35-4da9-b258-0e79d59e7ca8 I notice that the innermost node of the plan is:

                                     ->  Index Scan using zoekt_enabled_namespaces_pkey on public.zoekt_enabled_namespaces zoekt_enabled_namespaces_1  (cost=0.29..691.14 rows=985 width=8) (actual time=17.093..133.176 rows=501 loops=1)
                                           Filter: ((NOT zoekt_enabled_namespaces_1.search) AND (zoekt_enabled_namespaces_1.created_at <= '2024-09-26 13:36:36.220036+00'::timestamp with time zone))
                                           Rows Removed by Filter: 4265
                                           Buffers: shared hit=4513 read=193 dirtied=3
                                           I/O Timings: read=126.815 write=0.000

    This is basically saying that it scans the entire `zoekt_enabled_namespaces` table just to find the first 500 rows where `search = false` and `created_at <= ?`.

    You can see from `Rows Removed by Filter: 4265` that we have to read and skip a lot of rows.

    I believe the performance of this is likely to get worse over time. It's made worse by our `ORDER BY id` clause: we're insisting that Postgres start from the beginning of the table, even though the rows we're looking for are almost certainly at the end of it (the old ones will already have been enabled).
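
    For reference, the query shape implied by the plan above is roughly the following — the `SELECT` list and the exact parameter values are my reconstruction from the filter conditions, so treat this as a sketch:

    ```sql
    -- Reconstructed from the plan node: Postgres walks the primary key
    -- from the lowest id, applying the filter to every row it visits,
    -- until it has accumulated 500 matches.
    SELECT id
    FROM zoekt_enabled_namespaces
    WHERE search = false
      AND created_at <= '2024-09-26 13:36:36.220036+00'
    ORDER BY id
    LIMIT 500;
    ```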

    The change in this MR doesn't make things worse compared to the original query plan, so I don't mind merging it, but we still want to address this.

    My thinking is that we can do a lot better with a partial index on `id` where `search = false`. That gives Postgres an index containing only the rows we care about (those with `search = false`). It still won't make our joins efficient, but if we assume the set of rows where `search = false` is small, it puts an upper bound on the cost of this scan.
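
    A sketch of that suggestion as Postgres DDL — the index name is my own choice, and in a real GitLab migration this would go through the usual migration helpers rather than raw SQL:

    ```sql
    -- Partial index containing only rows with search = false,
    -- keyed on id so an ORDER BY id ... LIMIT query can walk it
    -- directly instead of filtering the whole primary key.
    CREATE INDEX CONCURRENTLY idx_zoekt_enabled_namespaces_id_search_false
      ON zoekt_enabled_namespaces (id)
      WHERE search = false;
    ```

    With this in place, the plan's innermost node should become an index scan over only the `search = false` rows, so the buffers read scale with the size of that set rather than with the whole table.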

    I tried testing this suggestion in postgres.ai, but it seems we have 1476 rows in `zoekt_enabled_namespaces` where `search = false`, so in practice it doesn't help much. This surprised me, as I'd assumed we quickly mark them `search = true` as they catch up with indexing.

Edited by 🤖 GitLab Bot 🤖