Schedule CronJob to refresh assigned users

Bishwa Hang Rai requested to merge 424758-add-cron-job into master

What does this MR do and why?

This MR schedules a CronJob that refreshes the assigned users of all add_on_purchases in bulk. Some domain context below 👇:

Seat Assignment Overview

Customers with a paid subscription can buy the Code Suggestions add-on with a certain seat quantity for the root (top-level) namespace. The admin/owner of the group can then assign the purchased seats to eligible users of the group. Eligible users are currently any users with at least the GUEST role who belong to the root namespace hierarchy through subgroup or project memberships, or through an invite to a group or project.
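
The eligibility rule above can be sketched in plain Ruby. This is a minimal illustration, not the GitLab implementation: `Membership`, `eligible_for_seat?` and the sample data are hypothetical, and the `GUEST` constant is only assumed to mirror `Gitlab::Access::GUEST`.

```ruby
# Hypothetical in-memory model; the names below are illustrative, not GitLab's API.
GUEST = 10 # assumed to mirror Gitlab::Access::GUEST

# A membership ties a user to a group or project living under some root namespace.
Membership = Struct.new(:user_id, :root_namespace_id, :access_level, keyword_init: true)

# A user is eligible when any membership (direct, via a subgroup or project,
# or via an invite) inside the root namespace hierarchy grants at least GUEST.
def eligible_for_seat?(user_id, root_namespace_id, memberships)
  memberships.any? do |m|
    m.user_id == user_id &&
      m.root_namespace_id == root_namespace_id &&
      m.access_level >= GUEST
  end
end

MEMBERSHIPS = [
  Membership.new(user_id: 70, root_namespace_id: 1, access_level: 10), # guest in a subgroup
  Membership.new(user_id: 71, root_namespace_id: 1, access_level: 5),  # below GUEST
  Membership.new(user_id: 72, root_namespace_id: 2, access_level: 30)  # different hierarchy
].freeze

puts eligible_for_seat?(70, 1, MEMBERSHIPS) # true
puts eligible_for_seat?(71, 1, MEMBERSHIPS) # false
```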

Handling of removal of memberships

During the implementation, we found that we might want to automatically free assigned seats when a user becomes ineligible for seat assignment. This can happen when the user no longer has at least the GUEST role in the root namespace because:

  1. The user was removed as a member from a group or project.
  2. The project/group invite was removed.

While handling the namespace membership issue, we enqueued an async job that does the post-membership-destroy cleanup, which removes the user's seat assignment if the user is ineligible.

  1. !130751 (merged)
  2. !130947 (merged)
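
The cleanup step described above can be sketched as follows. This is an illustration under assumptions, not the code from the referenced MRs: `SeatAssignments` and `release_if_ineligible` are hypothetical names, and the eligibility check is passed in as any callable. The sketch is idempotent, so a retried or duplicated job does no harm.

```ruby
require 'set'

# Hypothetical stand-in for an add-on purchase holding the ids of assigned users.
class SeatAssignments
  attr_reader :user_ids

  def initialize(user_ids)
    @user_ids = Set.new(user_ids)
  end

  # Frees the seat of one user when the eligibility check (any callable) says no.
  # Returns true only when a seat was actually freed.
  def release_if_ineligible(user_id, eligibility_check)
    return false unless @user_ids.include?(user_id) # nothing assigned, nothing to do
    return false if eligibility_check.call(user_id) # still eligible, keep the seat

    @user_ids.delete(user_id)
    true
  end
end

# Usage: user 69 lost its membership, user 70 is still a guest.
STILL_ELIGIBLE = ->(user_id) { [70].include?(user_id) }
assignments = SeatAssignments.new([69, 70])
assignments.release_if_ineligible(69, STILL_ELIGIBLE)
assignments.release_if_ineligible(70, STILL_ELIGIBLE)
puts assignments.user_ids.to_a.inspect # [70]
```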

As a follow-up, we wanted to schedule a CronJob that removes any ineligible seat assignments that may have been missed by the above handling because of Sidekiq errors, retry exhaustion, or lost jobs. (Reference MR comment: !130751 (comment 1546821006))

Note: The scheduled cron job will also help handle other scenarios, such as when a user gets blocked, banned, deactivated, or banned on a namespace, though with a delay of some hours. We have an issue for it: https://gitlab.com/gitlab-org/gitlab/-/issues/423977#note_1567426602.

This MR implements the Cron Job

We refresh the assigned_users of stale add_on_purchases every 12 hours via a scheduled cron job.

We are also using LimitedCapacityWorker. Based on last_assigned_users_refreshed_at, we fetch the first AddOnPurchase for the code_suggestions add-on that hasn't been refreshed in the last 8 hours.

        GitlabSubscriptions::AddOnPurchase.transaction do
          add_on_purchase = add_on_purchases_requiring_refresh
            .order('last_assigned_users_refreshed_at ASC NULLS FIRST')
            .lock('FOR UPDATE SKIP LOCKED')
            .first
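
The selection rule in the excerpt above (stale rows only, ordered `ASC NULLS FIRST`, first row wins) can be mimicked in memory. This is a sketch for illustration only: `Purchase`, `STALE_AFTER_SECONDS` and `next_purchase_to_refresh` are hypothetical names, and row locking (`FOR UPDATE SKIP LOCKED`, which lets concurrent workers skip rows another worker already holds) is not modelled.

```ruby
# Illustrative only: mimics the staleness filter and ordering in memory.
Purchase = Struct.new(:id, :last_assigned_users_refreshed_at, keyword_init: true)

STALE_AFTER_SECONDS = 8 * 60 * 60 # the 8-hour window described above

def next_purchase_to_refresh(purchases, now: Time.now)
  cutoff = now - STALE_AFTER_SECONDS

  # Stale means never refreshed (nil) or refreshed before the cutoff.
  stale = purchases.select do |p|
    p.last_assigned_users_refreshed_at.nil? || p.last_assigned_users_refreshed_at < cutoff
  end

  # ASC NULLS FIRST: never-refreshed rows first, then the oldest timestamps.
  stale.min_by do |p|
    [p.last_assigned_users_refreshed_at ? 1 : 0, p.last_assigned_users_refreshed_at.to_f]
  end
end

now = Time.now
sample = [
  Purchase.new(id: 1, last_assigned_users_refreshed_at: now - 9 * 3600), # stale
  Purchase.new(id: 2, last_assigned_users_refreshed_at: nil),            # never refreshed
  Purchase.new(id: 3, last_assigned_users_refreshed_at: now - 3600)      # fresh
]
puts next_purchase_to_refresh(sample, now: now).id # 2
```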

We have MAX_RUNNING_JOBS = 10. This means at most 10 workers poll for stale add_on_purchases and refresh them. The worker automatically reschedules itself on completion, as long as remaining_work_count is greater than 0.
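
The self-rescheduling behaviour described above can be illustrated with a small sequential simulation. This is a sketch under assumptions, not GitLab's limited-capacity implementation: `simulate_limited_capacity` is a hypothetical helper, jobs run one at a time instead of concurrently, and each job is assumed to refresh exactly one add_on_purchase.

```ruby
MAX_RUNNING_JOBS = 10

# Simulates the scheduling loop: each job handles one stale item and then
# re-enqueues itself while work remains. Returns how many items were processed
# and the highest number of jobs that were in flight at once.
def simulate_limited_capacity(stale_count, max_running_jobs: MAX_RUNNING_JOBS)
  remaining = stale_count
  # Initial fan-out: never start more jobs than the cap or than there is work.
  running = [remaining, max_running_jobs].min
  peak = running
  processed = 0

  while running > 0
    running -= 1 # one job finishes

    if remaining > 0
      remaining -= 1
      processed += 1
    end

    # The finished job schedules a replacement only while work remains.
    running += 1 if remaining > 0 && running < max_running_jobs
    peak = [peak, running].max
  end

  { processed: processed, peak_running: peak }
end

puts simulate_limited_capacity(25).inspect
```

With 25 stale purchases the simulation processes all 25 while never exceeding 10 jobs in flight; with fewer stale purchases than the cap, only that many jobs start.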

For reference, an existing worker, GitlabSubscriptions::RefreshSeatWorker, does a similar job for paid subscriptions.

Note: A basic analysis of the worker execution can be found in a Google Doc.

How to set up and validate locally

  1. Check out this branch
  2. Create a new root group namespace
  3. Set up some seed records in the Rails console:
namespace = Namespace.last
add_on = GitlabSubscriptions::AddOn.find_or_create_by!(name: "code_suggestions") {|e| e.description = "Test"}

# create new add_on_purchase 
add_on_purchase = GitlabSubscriptions::AddOnPurchase.create!(
  add_on: add_on, namespace: namespace, expires_on: 1.month.from_now, quantity: 5, purchase_xid: 'A-S0001'
)

user_1 = User.find 69 # John Doe6
user_2 = User.find 70 # John Doe7

# add user as guest
namespace.add_guest(user_2)

# assign seat to the user
add_on_purchase.assigned_users.create(user: user_1)
add_on_purchase.assigned_users.create(user: user_2)

# enable the feature flag
Feature.enable(:hamilton_seat_management)

# check that the instance behaves as SaaS; if not, start the console with:
#   GITLAB_SIMULATE_SAAS=1 gdk rails console
::Gitlab::CurrentSettings.should_check_namespace_plan? # true

# restart the GDK background-job server, just in case (run in a shell, not in the console):
#   GITLAB_SIMULATE_SAAS=1 gdk restart rails-background-jobs

# check the current assigned_users count
add_on_purchase.assigned_users.count # 2

# Run the limited capacity worker; it should schedule multiple jobs
# You can run `gdk tail -F rails-background-jobs` in a separate console to follow the logs
GitlabSubscriptions::AddOnPurchases::BulkRefreshUserAssignmentsWorker.perform_with_capacity

# check that the job executed correctly: user_1's seat is removed, while user_2 should still have its seat as it is eligible
add_on_purchase.assigned_users.count # 1
add_on_purchase.last_assigned_users_refreshed_at # not null

Migration

bin/rails db:migrate:main
main: == [advisory_lock_connection] object_id: 226920, pg_backend_pid: 49311
main: == 20230926105908 AddIndexToAddOnPurchasesOnLastAssignedUsersRefreshedAtAndAddOnId: migrating
main: -- transaction_open?(nil)
main:    -> 0.0000s
main: -- view_exists?(:postgres_partitions)
main:    -> 0.0920s
main: -- index_exists?(:subscription_add_on_purchases, [:last_assigned_users_refreshed_at], {:order=>{:last_assigned_users_refreshed_at=>"DESC NULLS LAST"}, :name=>"idx_addon_purchases_on_last_refreshed_at_desc_nulls_last", :algorithm=>:concurrently})
main:    -> 0.0029s
main: -- execute("SET statement_timeout TO 0")
main:    -> 0.0001s
main: -- add_index(:subscription_add_on_purchases, [:last_assigned_users_refreshed_at], {:order=>{:last_assigned_users_refreshed_at=>"DESC NULLS LAST"}, :name=>"idx_addon_purchases_on_last_refreshed_at_desc_nulls_last", :algorithm=>:concurrently})
main:    -> 0.4162s
main: -- execute("RESET statement_timeout")
main:    -> 0.0003s
main: == 20230926105908 AddIndexToAddOnPurchasesOnLastAssignedUsersRefreshedAtAndAddOnId: migrated (0.5272s)

main: == [advisory_lock_connection] object_id: 226920, pg_backend_pid: 49311


bin/rails db:rollback:main
main: == [advisory_lock_connection] object_id: 226760, pg_backend_pid: 36223
main: == 20230926105908 AddIndexToAddOnPurchasesOnLastAssignedUsersRefreshedAtAndAddOnId: reverting
main: -- transaction_open?(nil)
main:    -> 0.0000s
main: -- view_exists?(:postgres_partitions)
main:    -> 0.1125s
main: -- indexes(:subscription_add_on_purchases)
main:    -> 0.0039s
main: -- execute("SET statement_timeout TO 0")
main:    -> 0.0001s
main: -- remove_index(:subscription_add_on_purchases, {:algorithm=>:concurrently, :name=>"idx_addon_purchases_on_last_refreshed_at_desc_nulls_last"})
main:    -> 0.9456s
main: -- execute("RESET statement_timeout")
main:    -> 0.0016s
main: == 20230926105908 AddIndexToAddOnPurchasesOnLastAssignedUsersRefreshedAtAndAddOnId: reverted (1.0938s)

SQL


-- fetch next subscription_add_on_purchase to refresh
SELECT "subscription_add_on_purchases".* FROM "subscription_add_on_purchases" WHERE "subscription_add_on_purchases"."subscription_add_on_id" = 1 AND (last_assigned_users_refreshed_at < '2023-09-25 13:36:24.868401' OR last_assigned_users_refreshed_at is NULL) ORDER BY last_assigned_users_refreshed_at ASC NULLS FIRST LIMIT 1 FOR UPDATE SKIP LOCKED

-- update the last_assigned_users_refreshed_at timestamp
UPDATE "subscription_add_on_purchases" SET "last_assigned_users_refreshed_at" = '2023-09-26 11:20:31.730530' WHERE "subscription_add_on_purchases"."id" = 11

-- fetch remaining work count
SELECT COUNT(*) FROM (SELECT 1 AS one FROM "subscription_add_on_purchases" WHERE "subscription_add_on_purchases"."subscription_add_on_id" = 1 AND (last_assigned_users_refreshed_at < '2023-09-25 13:36:24.938278' OR last_assigned_users_refreshed_at is NULL) LIMIT 11)

Explain:

  1. fetch next add_on_purchase: a. Index Scan Backward, b. BitmapOr, c. more context in the comments
  2. update last_assigned_users_refreshed_at: https://explain-depesz.postgres.ai/s/e6
  3. get remaining_work_count: https://explain-depesz.postgres.ai/s/ce
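
Item 1a mentions a backward index scan. One plausible reading, stated here as an assumption rather than something the linked plans are quoted as confirming: the index is declared `DESC NULLS LAST`, and reading it in reverse yields `ASC NULLS FIRST`, which is the order the fetch query asks for. The plain-Ruby sketch below checks that equivalence on sample values; the helper names are hypothetical and `nil` stands in for SQL NULL.

```ruby
# Order values the way a DESC NULLS LAST index stores them.
def desc_nulls_last(values)
  non_null, nulls = values.partition { |v| !v.nil? }
  non_null.sort.reverse + nulls
end

# Order values the way ORDER BY ... ASC NULLS FIRST returns them.
def asc_nulls_first(values)
  non_null, nulls = values.partition { |v| !v.nil? }
  nulls + non_null.sort
end

SAMPLE = [3, nil, 1, 2, nil].freeze

# Walking the DESC NULLS LAST order backwards gives ASC NULLS FIRST.
p desc_nulls_last(SAMPLE).reverse # [nil, nil, 1, 2, 3]
p asc_nulls_first(SAMPLE)         # [nil, nil, 1, 2, 3]
```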

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #424758

Edited by Bishwa Hang Rai