BG migration to fix incorrect job artifacts expire_at on self-managed

Merged Albert requested to merge 355833-migration-to-fix-incorrect-expire-at into master
# frozen_string_literal: true

module Gitlab
  module BackgroundMigration
    # This detects and fixes job artifacts that have `expire_at` wrongly backfilled by the migration
    # https://gitlab.com/gitlab-org/gitlab/-/merge_requests/47723.
    # These job artifacts will not be deleted and will have their `expire_at` removed.
    class RemoveBackfilledJobArtifactsExpireAt < BatchedMigrationJob
      # The migration would have backfilled `expire_at`
      # to midnight on the 22nd of the month of the local timezone,
      # storing it as UTC time in the database.
      #
      # If the timezone setting has changed since the migration,
      # the `expire_at` stored in the database could have changed to a different local time other than midnight.
      # For example:
      # - changing timezone from UTC+02:00 to UTC+02:30 would change the `expire_at` in local time 00:00:00 to 00:30:00.
      # - changing timezone from UTC+00:00 to UTC-01:00 would change the `expire_at` in local time 00:00:00 to 23:00:00
      #   on the previous day (21st).
      #
      # Therefore job artifacts that have `expire_at` exactly on the 00, 30 or 45 minute mark
      # on the dates 21, 22, 23 of the month will not be deleted.
      # https://en.wikipedia.org/wiki/List_of_UTC_time_offsets
      EXPIRES_ON_21_22_23_AT_MIDNIGHT_IN_TIMEZONE = <<~SQL
        EXTRACT(day FROM timezone('UTC', expire_at)) IN (21, 22, 23)
        AND EXTRACT(minute FROM timezone('UTC', expire_at)) IN (0, 30, 45)
        AND EXTRACT(second FROM timezone('UTC', expire_at)) = 0
      SQL

      def perform
        each_sub_batch(
          operation_name: :update_all
        ) do |sub_batch|
          sub_batch.where(EXPIRES_ON_21_22_23_AT_MIDNIGHT_IN_TIMEZONE)
            .or(sub_batch.where(file_type: 3))
            .update_all(expire_at: nil)
        end
    • @alberts-gitlab @ahegyi When we use apply_additional_filters, the same filters should be applied here as batching_scope; otherwise the batch generated by the job class will differ from the batch generated by the batching strategy (it may include more rows).

      Yes, it's confusing and error-prone. This is one of the reasons for switching to scope_to: filters defined this way are used by both the batching strategy and the job class, and we do not have to define a custom batching strategy class.

      We're still fine here, as we apply the filters on the yielded relation (sub_batch). I'll update this as part of !96478 (merged) anyway.

      end
    end
  end
end
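
The timezone reasoning in the class comment can be checked with a minimal sketch in plain Ruby. This is not code from the migration: the helper names (`backfilled_expire_at`, `local_view`, `matches_condition?`) and the June 2020 date are assumptions made for illustration, and `matches_condition?` only mirrors the day/minute/second checks of the SQL condition on a Ruby `Time` value.

```ruby
# Illustration only: plain Ruby Time arithmetic, not code from the migration.
# Offsets are strings such as "+02:00"; the year and month are arbitrary.

# The instant the backfill would have stored: midnight on the 22nd in the
# timezone that was configured at that time.
def backfilled_expire_at(year, month, original_offset)
  Time.new(year, month, 22, 0, 0, 0, original_offset)
end

# The same instant as displayed under a different timezone setting.
def local_view(instant, current_offset)
  instant.getlocal(current_offset)
end

# Ruby mirror of the SQL condition, which inspects the value in UTC.
def matches_condition?(instant)
  utc = instant.getutc
  [21, 22, 23].include?(utc.day) && [0, 30, 45].include?(utc.min) && utc.sec.zero?
end

stored = backfilled_expire_at(2020, 6, "+02:00")
puts stored.getutc.strftime("%F %T")                  # 2020-06-21 22:00:00
puts local_view(stored, "+02:30").strftime("%F %T")   # 2020-06-22 00:30:00

stored_utc = backfilled_expire_at(2020, 6, "+00:00")
puts local_view(stored_utc, "-01:00").strftime("%F %T") # 2020-06-21 23:00:00

puts matches_condition?(stored)                        # true
```

The first two lines of output reproduce the UTC+02:00 to UTC+02:30 example from the comment, and the third reproduces the UTC+00:00 to UTC-01:00 example that lands on the 21st.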
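
The concern raised in the review comment, that a job can see more rows than the batching strategy intended when the two do not share filters, can be illustrated with a toy model in plain Ruby. Nothing here is GitLab's batching API: `Row`, `batch_bounds` and `rows_in_range` are hypothetical names, and the in-memory array stands in for the table.

```ruby
# Toy model in plain Ruby; Row, batch_bounds and rows_in_range are
# hypothetical names and none of this is GitLab's batching API.
Row = Struct.new(:id, :file_type)

ROWS = [
  Row.new(1, 3), Row.new(2, 1), Row.new(3, 3), Row.new(4, 2), Row.new(5, 3)
].freeze

FILTER = ->(row) { row.file_type == 3 }

# Batching side: pick the id bounds of the first `size` rows that pass the filter.
def batch_bounds(rows, size, filter)
  ids = rows.select(&filter).map(&:id).first(size)
  [ids.first, ids.last]
end

# Job side: load the rows inside the bounds, optionally re-applying the filter.
def rows_in_range(rows, bounds, filter = nil)
  in_range = rows.select { |row| row.id.between?(*bounds) }
  filter ? in_range.select(&filter) : in_range
end

bounds = batch_bounds(ROWS, 2, FILTER)
puts bounds.inspect                                          # [1, 3]
puts rows_in_range(ROWS, bounds).map(&:id).inspect           # [1, 2, 3]
puts rows_in_range(ROWS, bounds, FILTER).map(&:id).inspect   # [1, 3]
```

Without the filter on the job side, row 2 falls inside the id range even though the batching side never counted it. Re-applying the filter on the yielded rows, as the `perform` method does with `sub_batch.where(...)`, brings the two back in line.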