Performance Insigths -Query Review- :: Week 35 // Query 3

SELECT  users.* FROM users WHERE users.unconfirmed_email = ? ORDER BY users.id ASC LIMIT 1

Query plan:

gitlabhq_production=# explain analyze SELECT  users.* FROM users WHERE users.unconfirmed_email = ? order by users.id ASC LIMIT 1;
                                                              QUERY PLAN                                                              
--------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..481827.85 rows=1 width=1086) (actual time=1366.567..1366.568 rows=1 loops=1)
   ->  Index Scan using users_pkey on users  (cost=0.43..481827.85 rows=1 width=1086) (actual time=1366.567..1366.567 rows=1 loops=1)
         Filter: ((unconfirmed_email)::text = 'xxxx'::text)
         Rows Removed by Filter: 955018
 Planning time: 0.352 ms
 Execution time: 1366.619 ms
(6 rows)

The unconfirmed_email column have NULL values 99% of the time:

-[ RECORD 1 ]----------------------------
schemaname             | public
tablename              | users
attname                | unconfirmed_email
inherited              | f
null_frac              | 0.99643
avg_width              | 22
n_distinct             | -0.00357002
most_common_vals       | 
most_common_freqs      | 

A partial index could take advantage of this:

CREATE INDEX users_partial_on_unconfirmed_email on users(id) WHERE unconfirmed_email is not null;

Another way to make it perform better is to remove the ORDER BY users.id ASC clause (probably not needed). That would make the planner to prefer using a seq scan. The main advantage is that the query will end when the first row that fullfills the filter condition:

gitlabhq_production=# explain analyze SELECT  users.* FROM users WHERE users.unconfirmed_email = 'single-ci@singlemusic.co' LIMIT 1;
                                                    QUERY PLAN                                                    
------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..1036467.60 rows=1 width=1088) (actual time=668.637..668.639 rows=1 loops=1)
   ->  Seq Scan on users  (cost=0.00..1036467.60 rows=1 width=1088) (actual time=668.636..668.636 rows=1 loops=1)
         Filter: ((unconfirmed_email)::text = 'xxxx'::text)
         Rows Removed by Filter: 2594870
 Planning time: 0.323 ms
 Execution time: 668.692 ms
(6 rows)