Skip to content

Draft: Improve performance of dormant users deactivation

What does this MR do?

This MR builds on top of !57778 (merged) to improve deactivation of dormant users.

In the above mentioned MR we started with the following code

User.dormant.find_each do |user|
  if user.can_be_deactivated?
    with_context(user: user) { user.deactivate }
  end
end

where User.dormant scope had a naive initial definition like

scope :dormant, -> { all }

What is a dormant user?

Dormant users are users that have no recent activity and are considered OK to be deactivated. Currently we define this in User#can_be_deactivated?:

def can_be_deactivated?
  active? && no_recent_activity? && !internal?
end 

If we break this down

Both these are already combined in the User.active scope (https://gitlab.com/gitlab-org/gitlab/-/blob/8a31f66b5633ec593941e39364425edf13e6592b/app/models/user.rb#L370):

scope :active, -> { with_state(:active).non_internal }

User#no_recent_activity? is a bit trickier. Currently it is defined as:

def no_recent_activity?
  last_active_at.to_i <= MINIMUM_INACTIVE_DAYS.days.ago.to_i
end

where last_active_at is not a database column, but another method on User (https://gitlab.com/gitlab-org/gitlab/-/blob/8a31f66b5633ec593941e39364425edf13e6592b/app/models/user.rb#L1840-1845):

def last_active_at
  last_activity = last_activity_on&.to_time&.in_time_zone
  last_sign_in = current_sign_in_at

  [last_activity, last_sign_in].compact.max
end

If we want to express this in SQL it gets a bit complicated to make it in performant way - first we need to take care for each of last_activity_on and current_sign_in_at having a value or being NULL, and then we have to compare against the one having bigger value (i.e. being more recent). Unfortunately we can't just create a index on GREATEST(last_activity_on, current_sign_in_at) because these two columns are of different type - date and timestamp with timezone.

However, it turns out that taking into account current_sign_in_at is not needed - whenever users signs in (which updates current_sign_in_at), last_activity_on is also updated (because the sign-in request is POST).

Example:

Let's look at an user without sign-in:

gitlabhq_development=# select id, username, state, user_type, last_activity_on, current_sign_in_at, updated_at from users where id = 19;
 id | username | state  | user_type | last_activity_on | current_sign_in_at |         updated_at         
----+----------+--------+-----------+------------------+--------------------+----------------------------
 19 | ressie   | active |           |                  |                    | 2021-04-06 04:26:50.980647
(1 row)

Time: 5.785 ms

This is what we see in the logs for the sign-in request:

Screen_Shot_2021-04-06_at_16.34.19

and if we check the database record for this user after sign-in we see

gitlabhq_development=# select id, username, state, user_type, last_activity_on, current_sign_in_at, updated_at from users where id = 19;
 id | username | state  | user_type | last_activity_on |     current_sign_in_at     |         updated_at         
----+----------+--------+-----------+------------------+----------------------------+----------------------------
 19 | ressie   | active |           | 2021-04-06       | 2021-04-06 04:31:35.345049 | 2021-04-06 04:31:36.647626
(1 row)

Time: 0.837 ms

All of the above means that we can ignore current_sign_in_at and express no recent activity as where('last_activity_on <= ?', MINIMUM_INACTIVE_DAYS.day.ago.to_date).

Once we combine all of the above we end up with scope like

scope :dormant, -> { active.where('last_activity_on <= ?', MINIMUM_INACTIVE_DAYS.day.ago.to_date) }

and this can be supported very well with partial index on users (id, last_activity_on) WHERE (((state)::text = 'active'::text) AND ((user_type IS NULL) OR (user_type = ANY (ARRAY[NULL::integer, 6, 4])))).

In order to match the current definition of no_recent_activity? and cover users with no activity at all we should do something like

last_activity_on <= ?
OR last_activity_on IS NULL

As we know OR queries do not perform well - https://docs.gitlab.com/ee/development/sql.html#use-unions. The recommended approach is to use UNION, but this makes iteration (find_each) harder, so I think we're better if we add another scope like

 scope :with_no_activity, -> { active.where(last_activity_on: nil) }

and iterate each type of users separately.

This way we end up with worker like this

    BATCH_SIZE = 100

    def perform
      return if Gitlab.com?
      return unless ::Gitlab::CurrentSettings.current_application_settings.deactivate_dormant_users

      # rubocop: disable CodeReuse/ActiveRecord
      User.dormant.find_each(batch_size: BATCH_SIZE) { |user| deactivate(user) }
      User.with_no_activity.find_each(batch_size: BATCH_SIZE) { |user| deactivate(user) }
      # rubocop: enable CodeReuse/ActiveRecord
    end  

    private

    def deactivate(user)
      if user.can_be_deactivated?
        with_context(user: user) { user.deactivate }
      end  
    end

Database migrations

Up

$ bundle exec rails db:migrate:up VERSION=20210406030300
== 20210406030300 AddIndexForDormantUsers: migrating ==========================
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:users, [:id, :last_activity_on], {:where=>"state = 'active' AND (users.user_type IS NULL OR users.user_type IN (NULL, 6, 4))", :name=>"index_users_on_id_and_last_activity_on_for_non_internal_active", :algorithm=>:concurrently})
   -> 0.0101s
-- execute("SET statement_timeout TO 0")
   -> 0.0006s
-- add_index(:users, [:id, :last_activity_on], {:where=>"state = 'active' AND (users.user_type IS NULL OR users.user_type IN (NULL, 6, 4))", :name=>"index_users_on_id_and_last_activity_on_for_non_internal_active", :algorithm=>:concurrently})
   -> 0.0055s
-- execute("RESET ALL")
   -> 0.0006s
== 20210406030300 AddIndexForDormantUsers: migrated (0.0177s) =================

Down

$ bundle exec rails db:migrate:down VERSION=20210406030300
== 20210406030300 AddIndexForDormantUsers: reverting ==========================
-- transaction_open?()
   -> 0.0000s
-- indexes(:users)
   -> 0.0106s
-- execute("SET statement_timeout TO 0")
   -> 0.0006s
-- remove_index(:users, {:algorithm=>:concurrently, :name=>"index_users_on_id_and_last_activity_on_for_non_internal_active"})
   -> 0.0028s
-- execute("RESET ALL")
   -> 0.0006s
== 20210406030300 AddIndexForDormantUsers: reverted (0.0156s) =================

Index creation takes ~249s on when tested on postgres.ai:

exec CREATE INDEX CONCURRENTLY index_users_on_id_and_last_activity_on_for_non_internal_active ON users USING btree (id, last_activity_on) WHERE (((state)::text = 'active'::text) AND ((user_type IS NULL) OR (user_type = ANY (ARRAY[NULL::integer, 6, 4]))))
Session: webui-i3232

% time      seconds wait_event  
------ ------------ -----------------------------  
90.77    226.406605 IO.DataFileRead  
6.19      15.434963 Running  
1.09       2.709451 IO.DataFileExtend  
0.90       2.248166 LWLock.WALWriteLock  
0.55       1.371173 IO.BufFileWrite  
0.19       0.466658 IO.DataFileWrite  
0.09       0.223423 IO.SLRURead  
0.08       0.211910 IO.WALWrite  
0.08       0.194466 IO.BufFileRead  
0.05       0.135070 IPC.ParallelCreateIndexScan  
0.01       0.024559 LWLock.buffer_mapping  
------ ------------ -----------------------------  
100.00   249.426444  
  

The query has been executed. Duration: 249.426 s (estimated for prod: 22.169...228.373 s)

SQL queries

Here are the raw SQL queries generated with their execution plans:

SELECT "users".* FROM "users" WHERE ("users"."state" IN ('active')) AND ("users"."user_type" IS NULL OR "users"."user_type" IN (NULL, 6, 4)) AND (last_activity_on <= '2020-12-30') AND "users"."id" > 3170776 ORDER BY "users"."id" ASC LIMIT 100
SELECT "users".* FROM "users" WHERE ("users"."state" IN ('active')) AND ("users"."user_type" IS NULL OR "users"."user_type" IN (NULL, 6, 4)) AND (last_activity_on IS NULL) AND "users"."id" > 3170776 ORDER BY "users"."id" ASC LIMIT 100

In all cases we have Index Scan using index_users_on_id_and_last_activity_on_for_non_internal_active on public.users and records are filtered as Index Cond and not Filter.

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team

Related to #211754 (closed)

Edited by Krasimir Angelov

Merge request reports