Searching projects on the "Explore" page is extremely slow

Example URL:

https://gitlab.com/explore/projects?sort=latest_activity_desc%3Futf8%3D%E2%9C%93&sort=latest_activity_desc&name=c

This page will end up running the following query:

SELECT projects.*
FROM projects
WHERE (
    EXISTS (
        SELECT 1
        FROM project_authorizations
        WHERE project_authorizations.user_id = 209240
        AND (project_authorizations.project_id = projects.id)
    )
    OR projects.visibility_level IN (10,20)
)
AND (
    projects.id IN (
        SELECT projects.id
        FROM projects
        WHERE (
            (projects.path ILIKE '%c%' OR projects.name ILIKE '%c%')
            OR projects.description ILIKE '%c%'
        )

        UNION

        SELECT projects.id
        FROM projects
        INNER JOIN namespaces ON namespaces.id = projects.namespace_id
        WHERE (namespaces.name ILIKE '%c%')
    )
)
AND projects.archived = 'f'
ORDER BY projects.last_activity_at DESC
LIMIT 21
OFFSET 0

This query produces the following plan:

Limit  (cost=1067561.38..3608518.20 rows=21 width=627) (actual time=19371.638..47768.760 rows=21 loops=1)
  Buffers: shared hit=361344, temp read=93191 written=17997
  ->  Nested Loop Semi Join  (cost=1067561.38..119445881907.62 rows=987164 width=627) (actual time=19371.636..47768.722 rows=21 loops=1)
        Join Filter: (projects.id = projects_1.id)
        Rows Removed by Join Filter: 48323682
        Buffers: shared hit=361344, temp read=93191 written=17997
        ->  Index Scan Backward using index_projects_on_last_activity_at on projects  (cost=0.56..16908332.37 rows=1974329 width=627) (actual time=0.540..3.044 rows=27 loops=1)
              Filter: ((NOT archived) AND ((alternatives: SubPlan 1 or hashed SubPlan 2) OR (visibility_level = ANY ('{10,20}'::integer[]))))
              Rows Removed by Filter: 402
              Buffers: shared hit=698
              SubPlan 1
                ->  Index Only Scan using index_project_authorizations_on_user_id_project_id_access_level on project_authorizations  (cost=0.43..4.45 rows=1 width=0) (never executed)
                      Index Cond: ((user_id = 209240) AND (project_id = projects.id))
                      Heap Fetches: 0
              SubPlan 2
                ->  Index Only Scan using index_project_authorizations_on_user_id_project_id_access_level on project_authorizations project_authorizations_1  (cost=0.43..12.97 rows=82 width=4) (actual time=0.01..0.211 rows=353 loops=1)
                      Index Cond: (user_id = 209240)
                      Heap Fetches: 19
                      Buffers: shared hit=36
        ->  Materialize  (cost=1067560.82..1161577.64 rows=3932691 width=4) (actual time=608.198..1250.826 rows=1789767 loops=27)
              Buffers: shared hit=360646, temp read=93191 written=17997
              ->  Unique  (cost=1067560.82..1087224.28 rows=3932691 width=4) (actual time=16421.216..19219.614 rows=2169352 loops=1)
                    Buffers: shared hit=360646, temp read=14294 written=14288
                    ->  Sort  (cost=1067560.82..1077392.55 rows=3932691 width=4) (actual time=16421.215..17892.378 rows=2724283 loops=1)
                          Sort Key: projects_1.id
                          Sort Method: external merge  Disk: 37192kB
                          Buffers: shared hit=360646, temp read=14294 written=14288
                          ->  Append  (cost=0.00..598384.34 rows=3932691 width=4) (actual time=0.024..14542.097 rows=2724283 loops=1)
                                Buffers: shared hit=360646, temp read=9632 written=9626
                                ->  Seq Scan on projects projects_1  (cost=0.00..199493.37 rows=2662934 width=4) (actual time=0.022..6938.805 rows=1764660 loops=1)
                                      Filter: (((path)::text ~~* '%c%'::text) OR ((name)::text ~~* '%c%'::text) OR (description ~~* '%c%'::text))
                                      Rows Removed by Filter: 1603573
                                      Buffers: shared hit=136624
                                ->  Hash Join  (cost=129591.76..359564.05 rows=1269757 width=4) (actual time=2035.885..6270.024 rows=959623 loops=1)
                                      Hash Cond: (projects_2.namespace_id = namespaces.id)
                                      Buffers: shared hit=224022, temp read=9632 written=9626
                                      ->  Seq Scan on projects projects_2  (cost=0.00..172548.07 rows=3592707 width=8) (actual time=0.029..1869.447 rows=3368233 loops=1)
                                            Buffers: shared hit=136621
                                      ->  Hash  (cost=116221.75..116221.75 rows=814881 width=4) (actual time=2033.982..2033.982 rows=522052 loops=1)
                                            Buckets: 524288  Batches: 4  Memory Usage: 8692kB
                                            Buffers: shared hit=87401, temp written=1146
                                            ->  Seq Scan on namespaces  (cost=0.00..116221.75 rows=814881 width=4) (actual time=0.020..1809.092 rows=522052 loops=1)
                                                  Filter: ((name)::text ~~* '%c%'::text)
                                                  Rows Removed by Filter: 1481873
                                                  Buffers: shared hit=87401

 Planning time: 2.519 ms
 Execution time: 47786.125 ms

The plan is visualised here: https://explain.depesz.com/s/lnMF

This query sometimes times out after 30 seconds, sometimes it doesn't but takes 16 seconds to run. This query is really bad.

I experimented a bit with using a UNION instead of sub-queries, but this didn't make much of a difference.

We can massively speed up the query by just getting rid of the UNION for getting namespaces that match the query, meaning we run this instead (after also getting rid of the stupid subquery):

SELECT projects.*
FROM projects
WHERE (
    EXISTS (
        SELECT 1
        FROM project_authorizations
        WHERE project_authorizations.user_id = 209240
        AND (project_authorizations.project_id = projects.id)
    )
    OR projects.visibility_level IN (10,20)
)
AND (
    (projects.path ILIKE '%c%' OR projects.name ILIKE '%c%')
    OR projects.description ILIKE '%c%'
)
AND projects.archived = 'f'
ORDER BY projects.last_activity_at DESC
LIMIT 21
OFFSET 0;

With this change the query only takes 4-5 milliseconds to run.

@DouweM I assigned this to the backlog, but could you move this to the appropriate milestone?

Edited Nov 24, 2017 by Yorick Peterse