Overly complex SQL doing large hash join spills to disk can probably be optimized to use indexes

This SQL seems overly complex and in particular it's doing a large hash join that spills to disk. At a quick glance that large hash join seems to be a small part of the query and can probably be optimized using indexes to not be accessing such a large set of records.

SELECT "members".*
FROM "members"
LEFT JOIN "users" ON "members"."user_id" = "users"."id"
WHERE "members"."type" IN ('GroupMember')
  AND "members"."source_type" = 'Namespace'
  AND (
    members.id IN (
      SELECT "members"."id"
      FROM "members"
      WHERE "members"."source_type" = 'Namespace'
        AND "members"."type" IN ('GroupMember')
        AND "members"."source_id" = 2763806
        AND "members"."source_type" = 'Namespace'
        AND "members"."type" IN ('GroupMember')
        AND "members"."requested_at" IS NULL
      )
    OR members.id IN (
      SELECT "members"."id"
      FROM "members"
      WHERE "members"."type" IN ('GroupMember')
        AND "members"."source_type" = 'Namespace'
        AND "members"."requested_at" IS NULL
        AND "members"."source_id" IN (
          WITH RECURSIVE "base_and_ancestors" AS (
              SELECT "namespaces".*
              FROM "namespaces"
              WHERE "namespaces"."type" IN ('Group')
                AND "namespaces"."id" = 1779761
              
              UNION
              
              SELECT "namespaces".*
              FROM "namespaces"
                ,"base_and_ancestors"
              WHERE "namespaces"."type" IN ('Group')
                AND "namespaces"."id" = "base_and_ancestors"."parent_id"
              )
          SELECT "id"
          FROM "base_and_ancestors" AS "namespaces"
          )
        AND (
          "members"."user_id" NOT IN (
            SELECT "users"."id"
            FROM "users"
            INNER JOIN "members" ON "users"."id" = "members"."user_id"
            WHERE "members"."source_type" = 'Namespace'
              AND "members"."source_id" = 2763806
              AND "members"."source_type" = 'Namespace'
              AND "members"."type" IN ('GroupMember')
              AND "members"."requested_at" IS NULL
            )
          )
      )
    )
ORDER BY users.name ASC NULLS LAST LIMIT 50 OFFSET 0;
 Limit  (cost=440775.47..440775.60 rows=50 width=180) (actual time=4168.805..4168.812 rows=7 loops=1)
   Output: members.id, members.access_level, members.source_id, members.source_type, members.user_id, members.notification_level, members.type, members.created_at, members.updated_at, members.created_by_id, members.invite_email, members.invite_token, members.invite_accepted_at, members.requested_at, members.expires_at, members.ldap, members.override, users.name
   Buffers: shared hit=1166717 dirtied=6, temp read=2437 written=9087
   ->  Sort  (cost=440775.47..441228.71 rows=181296 width=180) (actual time=4168.802..4168.805 rows=7 loops=1)
         Output: members.id, members.access_level, members.source_id, members.source_type, members.user_id, members.notification_level, members.type, members.created_at, members.updated_at, members.created_by_id, members.invite_email, members.invite_token, members.invite_accepted_at, members.requested_at, members.expires_at, members.ldap, members.override, users.name
         Sort Key: users.name
         Sort Method: quicksort  Memory: 25kB
         Buffers: shared hit=1166717 dirtied=6, temp read=2437 written=9087
         ->  Hash Left Join  (cost=223682.27..434752.95 rows=181296 width=180) (actual time=3304.853..4168.672 rows=7 loops=1)
               Output: members.id, members.access_level, members.source_id, members.source_type, members.user_id, members.notification_level, members.type, members.created_at, members.updated_at, members.created_by_id, members.invite_email, members.invite_token, members.invite_accepted_at, members.requested_at, members.expires_at, members.ldap, members.override, users.name
               Hash Cond: (members.user_id = users.id)
               Buffers: shared hit=1166714 dirtied=6, temp read=2437 written=9087
               ->  Index Scan using index_members_on_source_id_and_source_type on public.members  (cost=618.61..189639.43 rows=181296 width=168) (actual time=1043.669..1836.099 rows=7 loops=1)
                     Output: members.id, members.access_level, members.source_id, members.source_type, members.user_id, members.notification_level, members.type, members.created_at, members.updated_at, members.created_by_id, members.invite_email, members.invite_token, members.invite_accepted_at, members.requested_at, members.expires_at, members.ldap, members.override
                     Index Cond: ((members.source_type)::text = 'Namespace'::text)
                     Filter: (((members.type)::text = 'GroupMember'::text) AND ((hashed SubPlan 1) OR (hashed SubPlan 4)))
                     Rows Removed by Filter: 1155974
                     Buffers: shared hit=1004883 dirtied=6
                     SubPlan 1
                       ->  Index Scan using index_members_on_source_id_and_source_type on public.members members_1  (cost=0.43..4.47 rows=1 width=4) (actual time=0.038..0.046 rows=2 loops=1)
                             Output: members_1.id
                             Index Cond: ((members_1.source_id = 2763806) AND ((members_1.source_type)::text = 'Namespace'::text))
                             Filter: ((members_1.requested_at IS NULL) AND ((members_1.type)::text = 'GroupMember'::text))
                             Buffers: shared hit=5
                     SubPlan 4
                       ->  Nested Loop  (cost=360.92..613.68 rows=5 width=4) (actual time=0.204..0.252 rows=5 loops=1)
                             Output: members_3.id
                             Buffers: shared hit=26
                             ->  HashAggregate  (cost=352.55..352.76 rows=21 width=4) (actual time=0.052..0.052 rows=1 loops=1)
                                   Output: namespaces_2.id
                                   Group Key: namespaces_2.id
                                   Buffers: shared hit=4
                                   ->  CTE Scan on base_and_ancestors namespaces_2  (cost=351.87..352.29 rows=21 width=4) (actual time=0.045..0.050 rows=1 loops=1)
                                         Output: namespaces_2.id
                                         Buffers: shared hit=4
                                         CTE base_and_ancestors
                                           ->  Recursive Union  (cost=0.43..351.87 rows=21 width=183) (actual time=0.043..0.047 rows=1 loops=1)
                                                 Buffers: shared hit=4
                                                 ->  Index Scan using namespaces_pkey on public.namespaces  (cost=0.43..3.45 rows=1 width=183) (actual time=0.037..0.038 rows=1 loops=1)
                                                       Output: namespaces.id, namespaces.name, namespaces.path, namespaces.owner_id, namespaces.created_at, namespaces.updated_at, namespaces.type, namespaces.description, namespaces.avatar, namespaces.membership_lock, namespaces.share_with_group_lock, namespaces.visibility_level, namespaces.request_access_enabled, namespaces.ldap_sync_status, namespaces.ldap_sync_error, namespaces.ldap_sync_last_update_at, namespaces.ldap_sync_last_successful_update_at, namespaces.ldap_sync_last_sync_at, namespaces.lfs_enabled, namespaces.description_html, namespaces.parent_id, namespaces.shared_runners_minutes_limit, namespaces.repository_size_limit, namespaces.require_two_factor_authentication, namespaces.two_factor_grace_period, namespaces.cached_markdown_version, namespaces.plan_id, namespaces.project_creation_level
                                                       Index Cond: (namespaces.id = 1779761)
                                                       Filter: ((namespaces.type)::text = 'Group'::text)
                                                       Buffers: shared hit=4
                                                 ->  Nested Loop  (cost=0.43..34.80 rows=2 width=183) (actual time=0.003..0.003 rows=0 loops=1)
                                                       Output: namespaces_1.id, namespaces_1.name, namespaces_1.path, namespaces_1.owner_id, namespaces_1.created_at, namespaces_1.updated_at, namespaces_1.type, namespaces_1.description, namespaces_1.avatar, namespaces_1.membership_lock, namespaces_1.share_with_group_lock, namespaces_1.visibility_level, namespaces_1.request_access_enabled, namespaces_1.ldap_sync_status, namespaces_1.ldap_sync_error, namespaces_1.ldap_sync_last_update_at, namespaces_1.ldap_sync_last_successful_update_at, namespaces_1.ldap_sync_last_sync_at, namespaces_1.lfs_enabled, namespaces_1.description_html, namespaces_1.parent_id, namespaces_1.shared_runners_minutes_limit, namespaces_1.repository_size_limit, namespaces_1.require_two_factor_authentication, namespaces_1.two_factor_grace_period, namespaces_1.cached_markdown_version, namespaces_1.plan_id, namespaces_1.project_creation_level
                                                       ->  WorkTable Scan on base_and_ancestors  (cost=0.00..0.20 rows=10 width=4) (actual time=0.001..0.001 rows=1 loops=1)
                                                             Output: base_and_ancestors.id, base_and_ancestors.name, base_and_ancestors.path, base_and_ancestors.owner_id, base_and_ancestors.created_at, base_and_ancestors.updated_at, base_and_ancestors.type, base_and_ancestors.description, base_and_ancestors.avatar, base_and_ancestors.membership_lock, base_and_ancestors.share_with_group_lock, base_and_ancestors.visibility_level, base_and_ancestors.request_access_enabled, base_and_ancestors.ldap_sync_status, base_and_ancestors.ldap_sync_error, base_and_ancestors.ldap_sync_last_update_at, base_and_ancestors.ldap_sync_last_successful_update_at, base_and_ancestors.ldap_sync_last_sync_at, base_and_ancestors.lfs_enabled, base_and_ancestors.description_html, base_and_ancestors.parent_id, base_and_ancestors.shared_runners_minutes_limit, base_and_ancestors.repository_size_limit, base_and_ancestors.require_two_factor_authentication, base_and_ancestors.two_factor_grace_period, base_and_ancestors.cached_markdown_version, base_and_ancestors.plan_id, base_and_ancestors.project_creation_level
                                                       ->  Index Scan using namespaces_pkey on public.namespaces namespaces_1  (cost=0.43..3.45 rows=1 width=183) (actual time=0.001..0.001 rows=0 loops=1)
                                                             Output: namespaces_1.id, namespaces_1.name, namespaces_1.path, namespaces_1.owner_id, namespaces_1.created_at, namespaces_1.updated_at, namespaces_1.type, namespaces_1.description, namespaces_1.avatar, namespaces_1.membership_lock, namespaces_1.share_with_group_lock, namespaces_1.visibility_level, namespaces_1.request_access_enabled, namespaces_1.ldap_sync_status, namespaces_1.ldap_sync_error, namespaces_1.ldap_sync_last_update_at, namespaces_1.ldap_sync_last_successful_update_at, namespaces_1.ldap_sync_last_sync_at, namespaces_1.lfs_enabled, namespaces_1.description_html, namespaces_1.parent_id, namespaces_1.shared_runners_minutes_limit, namespaces_1.repository_size_limit, namespaces_1.require_two_factor_authentication, namespaces_1.two_factor_grace_period, namespaces_1.cached_markdown_version, namespaces_1.plan_id, namespaces_1.project_creation_level
                                                             Index Cond: (namespaces_1.id = base_and_ancestors.parent_id)
                                                             Filter: ((namespaces_1.type)::text = 'Group'::text)
                             ->  Index Scan using index_members_on_source_id_and_source_type on public.members members_3  (cost=8.37..12.41 rows=1 width=8) (actual time=0.150..0.194 rows=5 loops=1)
                                   Output: members_3.id, members_3.access_level, members_3.source_id, members_3.source_type, members_3.user_id, members_3.notification_level, members_3.type, members_3.created_at, members_3.updated_at, members_3.created_by_id, members_3.invite_email, members_3.invite_token, members_3.invite_accepted_at, members_3.requested_at, members_3.expires_at, members_3.ldap, members_3.override
                                   Index Cond: ((members_3.source_id = namespaces_2.id) AND ((members_3.source_type)::text = 'Namespace'::text))
                                   Filter: ((members_3.requested_at IS NULL) AND (NOT (hashed SubPlan 2)) AND ((members_3.type)::text = 'GroupMember'::text))
                                   Rows Removed by Filter: 1
                                   Buffers: shared hit=22
                                   SubPlan 2
                                     ->  Nested Loop  (cost=0.86..7.93 rows=1 width=4) (actual time=0.050..0.082 rows=2 loops=1)
                                           Output: users_1.id
                                           Buffers: shared hit=13
                                           ->  Index Scan using index_members_on_source_id_and_source_type on public.members members_2  (cost=0.43..4.47 rows=1 width=4) (actual time=0.003..0.005 rows=2 loops=1)
                                                 Output: members_2.id, members_2.access_level, members_2.source_id, members_2.source_type, members_2.user_id, members_2.notification_level, members_2.type, members_2.created_at, members_2.updated_at, members_2.created_by_id, members_2.invite_email, members_2.invite_token, members_2.invite_accepted_at, members_2.requested_at, members_2.expires_at, members_2.ldap, members_2.override
                                                 Index Cond: ((members_2.source_id = 2763806) AND ((members_2.source_type)::text = 'Namespace'::text))
                                                 Filter: ((members_2.requested_at IS NULL) AND ((members_2.type)::text = 'GroupMember'::text))
                                                 Buffers: shared hit=5
                                           ->  Index Only Scan using users_pkey on public.users users_1  (cost=0.43..3.45 rows=1 width=4) (actual time=0.036..0.037 rows=1 loops=2)
                                                 Output: users_1.id
                                                 Index Cond: (users_1.id = members_2.user_id)
                                                 Heap Fetches: 0
                                                 Buffers: shared hit=8
               ->  Hash  (cost=184192.63..184192.63 rows=2236163 width=16) (actual time=2258.672..2258.672 rows=2075304 loops=1)
                     Output: users.name, users.id
                     Buckets: 524288  Batches: 16  Memory Usage: 10448kB
                     Buffers: shared hit=161831, temp written=9068
                     ->  Seq Scan on public.users  (cost=0.00..184192.63 rows=2236163 width=16) (actual time=0.023..1380.273 rows=2075304 loops=1)
                           Output: users.name, users.id
                           Buffers: shared hit=161831
 Planning time: 3.202 ms
 Execution time: 4169.192 ms
(80 rows)
Edited Jun 26, 2025 by 🤖 GitLab Bot 🤖
Assignee Loading
Time tracking Loading