Skip to content

Automate deactivation of dormant users

What does this MR do?

This MR adds new cron worker to automate deactivation of dormant users.

This new worker runs daily, and if new application setting deactivate_dormant_users is enabled deactivates dormant users - those with no activity at all or with no activity in the last 90 days.

As per the existing code, deactivated users will be activated on their next sign in.

There is no UI for the new application setting yet.

The intention is that this will never be enabled for SaaS.

Related to #211754 (closed).

What is a dormant user?

Dormant users are users that have no recent activity and are considered OK to be deactivated. Currently we define this in User#can_be_deactivated?:

def can_be_deactivated?
  active? && no_recent_activity? && !internal?
end 

If we break this down

Both these are already combined in the User.active scope (https://gitlab.com/gitlab-org/gitlab/-/blob/8a31f66b5633ec593941e39364425edf13e6592b/app/models/user.rb#L370):

scope :active, -> { with_state(:active).non_internal }

User#no_recent_activity? is a bit trickier. Currently it is defined as:

def no_recent_activity?
  last_active_at.to_i <= MINIMUM_INACTIVE_DAYS.days.ago.to_i
end

where last_active_at is not a database column, but another method on User (https://gitlab.com/gitlab-org/gitlab/-/blob/8a31f66b5633ec593941e39364425edf13e6592b/app/models/user.rb#L1840-1845):

def last_active_at
  last_activity = last_activity_on&.to_time&.in_time_zone
  last_sign_in = current_sign_in_at

  [last_activity, last_sign_in].compact.max
end

If we want to express this in SQL it gets a bit complicated to make it in performant way - first we need to take care for each of last_activity_on and current_sign_in_at having a value or being NULL, and then we have to compare against the one having bigger value (i.e. being more recent). Unfortunately we can't just create a index on GREATEST(last_activity_on, current_sign_in_at) because these two columns are of different type - date and timestamp with timezone.

However, it turns out that taking into account current_sign_in_at is not needed - whenever users signs in (which updates current_sign_in_at), last_activity_on is also updated (because the sign-in request is POST).

Example:

Let's look at an user without sign-in:

gitlabhq_development=# select id, username, state, user_type, last_activity_on, current_sign_in_at, updated_at from users where id = 19;
 id | username | state  | user_type | last_activity_on | current_sign_in_at |         updated_at         
----+----------+--------+-----------+------------------+--------------------+----------------------------
 19 | ressie   | active |           |                  |                    | 2021-04-06 04:26:50.980647
(1 row)

Time: 5.785 ms

This is what we see in the logs for the sign-in request:

Screen_Shot_2021-04-06_at_16.34.19

and if we check the database record for this user after sign-in we see

gitlabhq_development=# select id, username, state, user_type, last_activity_on, current_sign_in_at, updated_at from users where id = 19;
 id | username | state  | user_type | last_activity_on |     current_sign_in_at     |         updated_at         
----+----------+--------+-----------+------------------+----------------------------+----------------------------
 19 | ressie   | active |           | 2021-04-06       | 2021-04-06 04:31:35.345049 | 2021-04-06 04:31:36.647626
(1 row)

Time: 0.837 ms

All of the above means that we can ignore current_sign_in_at and express no recent activity as where('last_activity_on <= ?', MINIMUM_INACTIVE_DAYS.day.ago.to_date).

Once we combine all of the above we end up with scope like

scope :dormant, -> { active.where('last_activity_on <= ?', MINIMUM_INACTIVE_DAYS.day.ago.to_date) }

and this can be supported very well with partial index on users (id, last_activity_on) WHERE (((state)::text = 'active'::text) AND ((user_type IS NULL) OR (user_type = ANY (ARRAY[NULL::integer, 6, 4])))).

In order to match the current definition of no_recent_activity? and cover users with no activity at all we should do something like

last_activity_on <= ?
OR last_activity_on IS NULL

As we know OR queries do not perform well - https://docs.gitlab.com/ee/development/sql.html#use-unions. The recommended approach is to use UNION.

This way we end up with query like

UPDATE "users"
SET "state" = 'deactivated'
WHERE "users"."id" IN (
  (SELECT "users"."id" FROM "users" WHERE ("users"."state" IN ('active')) AND ("users"."user_type" IS NULL OR "users"."user_type" IN (NULL, 6, 4)) AND (last_activity_on <= '2021-01-21') ORDER BY users.id ASC LIMIT 200)
  UNION
  (SELECT "users"."id" FROM "users" WHERE ("users"."state" IN ('active')) AND ("users"."user_type" IS NULL OR "users"."user_type" IN (NULL, 6, 4)) AND "users"."last_activity_on" IS NULL ORDER BY users.id ASC LIMIT 200)
  LIMIT 200
)

Worker is executed twice every hour from 0 through 4 (21,42 0-4 * * *) This locks the maximum number of users to be deactivated per day to 100_000 (50 * 200 * 2 * 5).

Database Migrations

Up

$ bundle exec rails db:migrate:up VERSION=20210421021510
== 20210421021510 AddDeactivateDormantUsersToApplicationSettings: migrating ===
-- add_column(:application_settings, :deactivate_dormant_users, :boolean, {:default=>false, :null=>false})
   -> 0.0054s
== 20210421021510 AddDeactivateDormantUsersToApplicationSettings: migrated (0.0054s) 

$ bundle exec rails db:migrate:up VERSION=20210421022010
== 20210421022010 AddIndexForDormantUsers: migrating ==========================
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:users, [:id, :last_activity_on], {:where=>"state = 'active' AND (users.user_type IS NULL OR users.user_type IN (NULL, 6, 4))", :name=>"index_users_on_id_and_last_activity_on_for_non_internal_active", :algorithm=>:concurrently})
   -> 0.0108s
-- execute("SET statement_timeout TO 0")
   -> 0.0006s
-- add_index(:users, [:id, :last_activity_on], {:where=>"state = 'active' AND (users.user_type IS NULL OR users.user_type IN (NULL, 6, 4))", :name=>"index_users_on_id_and_last_activity_on_for_non_internal_active", :algorithm=>:concurrently})
   -> 0.0225s
-- execute("RESET ALL")
   -> 0.0009s
== 20210421022010 AddIndexForDormantUsers: migrated (0.0360s) =================

Down

$ bundle exec rails db:migrate:down VERSION=20210421022010
== 20210421022010 AddIndexForDormantUsers: reverting ==========================
-- transaction_open?()
   -> 0.0000s
-- indexes(:users)
   -> 0.0121s
-- execute("SET statement_timeout TO 0")
   -> 0.0006s
-- remove_index(:users, {:algorithm=>:concurrently, :name=>"index_users_on_id_and_last_activity_on_for_non_internal_active"})
   -> 0.0062s
-- execute("RESET ALL")
   -> 0.0006s
== 20210421022010 AddIndexForDormantUsers: reverted (0.0208s) =================

$ bundle exec rails db:migrate:down VERSION=20210421021510
== 20210421021510 AddDeactivateDormantUsersToApplicationSettings: reverting ===
-- remove_column(:application_settings, :deactivate_dormant_users, :boolean, {:default=>false, :null=>false})
   -> 0.0045s
== 20210421021510 AddDeactivateDormantUsersToApplicationSettings: reverted (0.0066s) 

Index creation takes ~249s on when tested on postgres.ai:

exec CREATE INDEX CONCURRENTLY index_users_on_id_and_last_activity_on_for_non_internal_active ON users USING btree (id, last_activity_on) WHERE (((state)::text = 'active'::text) AND ((user_type IS NULL) OR (user_type = ANY (ARRAY[NULL::integer, 6, 4]))))
Session: webui-i3232

% time      seconds wait_event  
------ ------------ -----------------------------  
90.77    226.406605 IO.DataFileRead  
6.19      15.434963 Running  
1.09       2.709451 IO.DataFileExtend  
0.90       2.248166 LWLock.WALWriteLock  
0.55       1.371173 IO.BufFileWrite  
0.19       0.466658 IO.DataFileWrite  
0.09       0.223423 IO.SLRURead  
0.08       0.211910 IO.WALWrite  
0.08       0.194466 IO.BufFileRead  
0.05       0.135070 IPC.ParallelCreateIndexScan  
0.01       0.024559 LWLock.buffer_mapping  
------ ------------ -----------------------------  
100.00   249.426444  
  

The query has been executed. Duration: 249.426 s (estimated for prod: 22.169...228.373 s)

SQL queries

Here is the raw SQL query generated with its execution plans:

UPDATE "users"
SET "state" = 'deactivated'
WHERE "users"."id" IN (
  (SELECT "users"."id" FROM "users" WHERE ("users"."state" IN ('active')) AND ("users"."user_type" IS NULL OR "users"."user_type" IN (NULL, 6, 4)) AND (last_activity_on <= '2021-01-21') ORDER BY users.id ASC LIMIT 200)
  UNION
  (SELECT "users"."id" FROM "users" WHERE ("users"."state" IN ('active')) AND ("users"."user_type" IS NULL OR "users"."user_type" IN (NULL, 6, 4)) AND "users"."last_activity_on" IS NULL ORDER BY users.id ASC LIMIT 200)
  LIMIT 200
)

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • [-] Label as security and @ mention @gitlab-com/gl-security/appsec
  • [-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • [-] Security reports checked/validated by a reviewer from the AppSec team
Edited by Mayra Cabrera

Merge request reports