High database load due to queries due to LDAP identities
- ZenDesk: https://gitlab.zendesk.com/agent/tickets/117082
- Data: https://docs.google.com/spreadsheets/d/16pNToQ51-479lz-oNfItwZCMofbMqpV83-i5xQUdopI/edit#gid=1662014526
It looks like the following query dominates their usage:
SELECT "identities".* FROM "identities" WHERE (LOWER("identities"."extern_uid") = LOWER(?)) AND "identities"."provider" = ? LIMIT ?
As @NikolayS points out, we don't have an index on extern_uid
and on the provider
to fix this.
I've generated some test data via:
require 'securerandom'
output = File.open("/tmp/identities.csv", 'w')
(1..3000000).each do |index|
output.write("\"uid=#{SecureRandom.hex},ou=people,dc=example,dc=edu\",ldapmain,#{index},#{Time.now},#{Time.now}\n")
end
I imported this via COPY:
# RAILS_ENV=test bin/rails dbconsole
gitlabhq_test=# copy identities(extern_uid,provider,user_id,created_at,updated_at) FROM '/tmp/identities.csv' delimiter ',' CSV;
COPY 3000000
Adding an index on extern_uid
alone isn't sufficient. We should add a partial index:
gitlabhq_test=# explain analyze select identities from identities where lower(extern_uid) = lower('test') and provider = 'ldapmain' limit 1;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..6.78 rows=1 width=122) (actual time=1085.405..1085.405 rows=0 loops=1)
-> Seq Scan on identities (cost=0.00..101681.00 rows=15000 width=122) (actual time=1085.405..1085.405 rows=0 loops=1)
Filter: (((provider)::text = 'ldapmain'::text) AND (lower((extern_uid)::text) = 'test'::text))
Rows Removed by Filter: 3000000
Planning time: 1.473 ms
Execution time: 1085.429 ms
(6 rows)
This works:
gitlabhq_test=# CREATE INDEX CONCURRENTLY index_on_identities_lower_extern_uid_and_lower_provider ON identities (LOWER(extern_uid), LOWER(provider));
CREATE INDEX
gitlabhq_test=# explain analyze select identities from identities where lower(extern_uid) = lower('test') and provider = 'ldapmain' limit 1;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.56..4.10 rows=1 width=122) (actual time=0.034..0.034 rows=0 loops=1)
-> Index Scan using index_on_identities_lower_extern_uid_and_lower_provider on identities (cost=0.56..53136.56 rows=15000 width=122) (actual time=0.034..0.034 rows=0 loops=1)
Index Cond: (lower((extern_uid)::text) = 'test'::text)
Filter: ((provider)::text = 'ldapmain'::text)
Planning time: 0.291 ms
Execution time: 0.049 ms
(6 rows)
/cc: @dblessing
Edited by Stan Hu