Skip to content

Add OAuth access grant archival worker for data retention

What does this MR do and why?

Add OAuth access grant archival worker for data retention

Introduce a new cron worker to handle periodic archival and cleanup of revoked OAuth access grants. This worker will run on a scheduled basis to move old revoked grants to the archive table and delete them from the original table.

  • Add OauthAccessGrantArchiveWorker for scheduled cleanup
  • Configure worker in all_queues.yml for Sidekiq processing
  • Add ops feature flag to control worker execution
  • Update cron settings in initializers for scheduling
  • Include comprehensive worker specs for testing

This is part 2 of the OAuth grant cleanup implementation, focusing on the automated background processing infrastructure.

Related: #377995 (closed)
Changelog: performance

References

Most relevant

Main issue: #377995 (closed)
Feature flag rollout issue: https://gitlab.com/gitlab-org/gitlab/-/issues/574827

Additional resources

Plan: #555382
Part 1: !204337 (merged)
Previous results from BBM: #377995 (comment 2790731710)
Query plan: !207837 (comment 2807127361)

Queries

Batch Finding Queries

-- 1. Initial batch query
SELECT "oauth_access_grants"."id" 
FROM "oauth_access_grants" 
WHERE (id > 0) 
ORDER BY "oauth_access_grants"."id" ASC 
LIMIT 1

-- 2. Finding next batch boundary 
SELECT "oauth_access_grants"."id" 
FROM "oauth_access_grants" 
WHERE (id > 0) 
  AND "oauth_access_grants"."id" >= 80003 
ORDER BY "oauth_access_grants"."id" ASC 
LIMIT 1 
OFFSET 10000

-- 3. Sub-batch Start Check
SELECT "oauth_access_grants"."id" 
FROM "oauth_access_grants" 
WHERE (id > 0) 
  AND "oauth_access_grants"."id" >= 80003 
  AND "oauth_access_grants"."id" < 90003 
ORDER BY "oauth_access_grants"."id" ASC 
LIMIT 1

-- 4. Sub-batch queries (OFFSET 1000 for SUB_BATCH_SIZE)
SELECT "oauth_access_grants"."id" 
FROM "oauth_access_grants" 
WHERE (id > 0) 
  AND "oauth_access_grants"."id" >= 80003 
  AND "oauth_access_grants"."id" < 90003 
  AND "oauth_access_grants"."id" >= 80003 
ORDER BY "oauth_access_grants"."id" ASC 
LIMIT 1 
OFFSET 1000

Archive Operation CTE

WITH deleted AS (
  DELETE FROM oauth_access_grants
  WHERE id IN (
    SELECT "oauth_access_grants"."id" 
    FROM "oauth_access_grants" 
    WHERE (id > 0) 
      AND "oauth_access_grants"."id" >= 80003 
      AND "oauth_access_grants"."id" < 90003 
      AND "oauth_access_grants"."id" >= 80003 
      AND "oauth_access_grants"."id" < 81003 
      AND "oauth_access_grants"."revoked_at" <= '2025-09-07 00:00:00'
  )
  RETURNING *
)
INSERT INTO oauth_access_grant_archived_records
  (id, resource_owner_id, application_id, token, expires_in, redirect_uri,
   revoked_at, created_at, scopes, organization_id, code_challenge,
   code_challenge_method, archived_at)
SELECT
  id, resource_owner_id, application_id, token, expires_in, redirect_uri,
  revoked_at, created_at, scopes, organization_id, code_challenge,
  code_challenge_method, CURRENT_TIMESTAMP as archived_at
FROM deleted

Reversibility and Recovery

  • The operations performed by the worker is not automatically reversible since it performs DELETE operations.
    • Migration can be stopped at any point with only processed batches affected
    • The CTE approach ensures atomicity per batch: tokens are only deleted if successfully archived.
    • Unprocessed tokens remain intact

Data recovery options are the following.

Full Recovery (from archive table)

INSERT INTO oauth_access_grants 
(id, resource_owner_id, application_id, token, expires_in, 
          redirect_uri, created_at, revoked_at, scopes, code_challenge,
          code_challenge_method, organization_id)
   SELECT id, resource_owner_id, application_id, token, expires_in, 
          redirect_uri, created_at, revoked_at, scopes, code_challenge,
          code_challenge_method, organization_id
   FROM oauth_access_grant_archived_records
   WHERE archived_at >= [migration_start_timestamp];

Selective Recovery (specific tokens)

INSERT INTO oauth_access_grants 
(id, resource_owner_id, application_id, token, expires_in, 
          redirect_uri, created_at, revoked_at, scopes, code_challenge,
          code_challenge_method, organization_id)
   SELECT id, resource_owner_id, application_id, token, expires_in, 
          redirect_uri, created_at, revoked_at, scopes, code_challenge,
          code_challenge_method, organization_id
   FROM oauth_access_grant_archived_records
   WHERE id BETWEEN 73800000 and 73801000;

Important notes

The worker will perform actions that we have already successfully tested with a BBM.
Results are available here.

How to set up and validate locally

  • Create some oauth access grants with revoked_at from 2 months ago
  • Enqueue and run the Authn::DataRetention::OauthAccessGrantArchiveWorker and see the tokens are correctly removed

Is possible to seed some tokens with the current rake task:

bundle exec rake 'db:seed:oauth_grants[1]'

# lib/tasks/seed_oauth_grants.rake
namespace :db do
  namespace :seed do
    desc "Seed millions of OAuth access grants for testing archival"
    task :oauth_grants, [:count_millions] => :environment do |_task, args|
      count_millions = (args[:count_millions] || 200).to_i
      # total_records = count_millions * 1_000_000
      total_records = count_millions * 20_000

      puts "Seeding #{count_millions} million OAuth grants (#{total_records} records)"
      puts "This will take a while..."

      # Get sample data
      organization_ids = Organizations::Organization.limit(10).pluck(:id)
      application_ids = Authn::OauthApplication.limit(10).pluck(:id)
      user_ids = User.limit(100).pluck(:id)

      batch_size = 10_000
      iterations = total_records / batch_size

      start_time = Time.current

      iterations.times do |i|
        values = []

        batch_size.times do
          created_at = rand(2.years.ago..Time.current)
          is_revoked = rand < 0.87  # 87% revoked to match your production

          values << {
            organization_id: organization_ids.sample,
            application_id: application_ids.sample,
            resource_owner_id: user_ids.sample,
            token: SecureRandom.hex(16),  # Grants typically have shorter tokens
            expires_in: [300, 600, 900].sample,  # Grants expire faster (5-15 minutes)
            redirect_uri: "https://example.com/callback/#{SecureRandom.hex(4)}",
            scopes: ['read', 'write', 'read write', 'api'].sample,
            code_challenge: rand < 0.7 ? SecureRandom.hex(32) : nil,  # 70% use PKCE
            code_challenge_method: rand < 0.7 ? 'S256' : nil,
            created_at: created_at,
            revoked_at: is_revoked ? created_at + rand(1.minute..1.hour) : nil  # Grants revoked faster
          }
        end

        # Use raw SQL for maximum speed
        sql = <<~SQL
          INSERT INTO oauth_access_grants
            (organization_id, application_id, resource_owner_id, token,
             expires_in, redirect_uri, scopes, code_challenge, 
             code_challenge_method, created_at, revoked_at)
          VALUES #{values.map { |v|
            "(#{v[:organization_id]}, #{v[:application_id]}, #{v[:resource_owner_id] || 'NULL'},
             '#{v[:token]}', #{v[:expires_in]}, '#{v[:redirect_uri]}',
             '#{v[:scopes]}', #{v[:code_challenge] ? "'#{v[:code_challenge]}'" : 'NULL'},
             #{v[:code_challenge_method] ? "'#{v[:code_challenge_method]}'" : 'NULL'},
             '#{v[:created_at]}', #{v[:revoked_at] ? "'#{v[:revoked_at]}'" : 'NULL'})"
          }.join(',')}
        SQL

        ActiveRecord::Base.connection.execute(sql)

        if (i + 1) % 100 == 0
          progress = ((i + 1) * batch_size / total_records.to_f * 100).round(2)
          elapsed = Time.current - start_time
          rate = (i + 1) * batch_size / elapsed
          eta = (total_records - (i + 1) * batch_size) / rate

          puts "Progress: #{(i + 1) * batch_size}/#{total_records} (#{progress}%) - " \
               "Rate: #{rate.round}/sec - ETA: #{(eta / 60).round} minutes"
        end
      end

      puts "\nSeeding complete!"
      puts "Total time: #{((Time.current - start_time) / 60).round(2)} minutes"

      # Show statistics
      stats = ActiveRecord::Base.connection.select_one(<<~SQL)
        SELECT
          COUNT(*) as total,
          COUNT(*) FILTER (WHERE revoked_at IS NOT NULL) as revoked,
          COUNT(*) FILTER (WHERE revoked_at IS NULL) as active,
          COUNT(*) FILTER (WHERE revoked_at < '#{1.month.ago}') as archivable
        FROM oauth_access_grants
      SQL

      puts "\nDatabase statistics:"
      puts "Total grants: #{stats['total']}"
      puts "Revoked: #{stats['revoked']} (#{(stats['revoked'].to_f / stats['total'] * 100).round(1)}%)"
      puts "Active: #{stats['active']} (#{(stats['active'].to_f / stats['total'] * 100).round(1)}%)"
      puts "Ready for archival: #{stats['archivable']}"
    end
  end
end

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Daniele Bracciani

Merge request reports

Loading