Skip to content

Fix timeouts on when informing users of expiring SSH keys

Max Woolf requested to merge 333080-ssh-timeouts into master

What does this MR do?

In this MR we sped up the query to inform users that their SSH keys had expired. Whilst this improvement was enough to cause the query to run successfully on staging.gitlab.com, it still timed out in production.

Database Review

Why keyset pagination

In the worker, we want to iterate over a date range (expired_at) in batches. Unfortunately EachBatch cannot iterate over timestamp columns, it needs to use a distinct column (id).

When EachBatch is used, the first query will try to find the minimum id value (EachBatch processes the records in id ASC order). This makes the database use the primary key index:

SELECT min("keys"."id") FROM "keys" WHERE (date(expires_at AT TIME ZONE 'UTC') BETWEEN '2020-01-01' AND CURRENT_DATE AND expiry_notification_delivered_at IS NULL) ;
 Result  (cost=47.23..47.24 rows=1 width=4)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.43..47.23 rows=1 width=4)
           ->  Index Scan using keys_pkey on keys  (cost=0.43..1063333.59 rows=22722 width=4)
                 Index Cond: (id IS NOT NULL)
                 Filter: ((expiry_notification_delivered_at IS NULL) AND (date(timezone('UTC'::text, expires_at)) >= '2020-01-01'::date) AND (date(timezone('UTC'::text, expires_at)) <= CURRENT_DATE))
(6 rows)

For large database tables, this query will time out.

Keyset pagination has similar performance characteristics as EachBatch and it can process the rows in arbitrary order. In this MR we paginate over the records ordered by date(expires_at AT TIME ZONE 'UTC') ASC, "keys"."id" ASC.

The other option would be to build a loop using kaminari to use OFFSET pagination and order by date(expires_at AT TIME ZONE 'UTC') ASC however, in the long term we might see performance problems. See the docs for more info: https://docs.gitlab.com/ee/development/database/pagination_guidelines.html#offset-on-a-large-dataset

Migration

Up

== 20210609090856 AddExpiryIdSshKeyNotificationIndex: migrating ===============
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:keys, "date(timezone('UTC', expires_at)), id", {:where=>"expiry_notification_delivered_at IS NULL", :name=>"index_keys_on_expires_at_and_id", :algorithm=>:concurrently})
   -> 0.0062s
-- execute("SET statement_timeout TO 0")
   -> 0.0005s
-- add_index(:keys, "date(timezone('UTC', expires_at)), id", {:where=>"expiry_notification_delivered_at IS NULL", :name=>"index_keys_on_expires_at_and_id", :algorithm=>:concurrently})
   -> 0.0152s
-- execute("RESET ALL")
   -> 0.0006s
== 20210609090856 AddExpiryIdSshKeyNotificationIndex: migrated (0.0234s) ======

Down

== 20210609090856 AddExpiryIdSshKeyNotificationIndex: reverting ===============
-- transaction_open?()
   -> 0.0000s
-- indexes(:keys)
   -> 0.0045s
-- execute("SET statement_timeout TO 0")
   -> 0.0005s
-- remove_index(:keys, {:algorithm=>:concurrently, :name=>"index_keys_on_expires_at_and_id"})
   -> 0.0050s
-- execute("RESET ALL")
   -> 0.0005s
== 20210609090856 AddExpiryIdSshKeyNotificationIndex: reverted (0.0113s) ======

Query

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

Does this MR contain changes to processing or storing of credentials or tokens, authorization and authentication methods or other items described in the security review guidelines? If not, then delete this Security section.

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team

Related to #333080 (closed)

Edited by Adam Hegyi

Merge request reports