
Change catalog resource last_30_day_usage_count_updated_at default

What does this MR do and why?

Background context

In !155001 (merged), we introduced a new aggregator worker that runs daily and collects the last 30-day usage count for each row in catalog_resources. It works as expected; however, we observed that it reprocesses the data more often than necessary because new catalog resources are added throughout the day (ref: #452545 (comment 1951203360)).

Why it happens

The aggregator is executed on each job run unless it meets the stop condition, done_processing?, which checks if all catalog resources have been "processed" for today:

          min_updated_at = TARGET_MODEL.minimum(:last_30_day_usage_count_updated_at)
          return true unless min_updated_at

          min_updated_at >= today.to_time

When the stop condition is satisfied for today, we want to skip executing the aggregator until tomorrow. However, when a new catalog resource is added, its last_30_day_usage_count_updated_at currently defaults to 1970-01-01. That value becomes the new minimum, so done_processing? returns false and the aggregator is executed again. This extra run is redundant: we only aggregate usage data from yesterday or older, so the usage count of a new catalog resource is always 0 and there is nothing to reprocess.

This MR

In this MR, we change the default value of last_30_day_usage_count_updated_at from 1970-01-01 to NOW(). With this change, newly added catalog resources no longer affect done_processing?, which avoids the redundant processing described above.
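To illustrate the effect of the two defaults, here is a minimal sketch in plain Ruby. It is not the worker's actual code: the in-memory `done_processing?` below is a hypothetical stand-in that takes an array of timestamps in place of the database column, and the dates are arbitrary.

```ruby
EPOCH = Time.utc(1970, 1, 1)

# Hypothetical in-memory stand-in for the stop condition quoted earlier.
# `timestamps` plays the role of the last_30_day_usage_count_updated_at
# values across all catalog resources.
def done_processing?(timestamps, today_start)
  min_updated_at = timestamps.min
  return true unless min_updated_at

  min_updated_at >= today_start
end

now = Time.utc(2024, 6, 17, 15, 0, 0) # arbitrary "current time"
today_start = Time.utc(now.year, now.month, now.day)

# All existing rows were already processed earlier today.
processed = [today_start + 3600, today_start + 7200]

# A new catalog resource inserted with the old default (1970-01-01)
# drags the minimum back before today.
with_old_default = processed + [EPOCH]

# The same insert with the new default (NOW()) leaves the minimum
# at or after the start of today.
with_new_default = processed + [now]

puts "old default: done_processing? => #{done_processing?(with_old_default, today_start)}"
puts "new default: done_processing? => #{done_processing?(with_new_default, today_start)}"
```

With the old default the check returns false and the aggregator would run again; with the new default it stays true until the next day.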

Database Notes

  • When the column (last_30_day_usage_count_updated_at) was first introduced, its default was set to 1970-01-01 as a way to identify rows that weren't yet updated for the first time. We no longer require this, so it's okay to change.
  • A multi-release approach is not necessary since we don't explicitly write the old default value to last_30_day_usage_count_updated_at anywhere.
  • Neither default has a negative impact, whether a row was created with the old or the new one, so this change is considered safe.

Resolves #467555 (closed)

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Migrations

Up

main: == [advisory_lock_connection] object_id: 127940, pg_backend_pid: 73940
main: == 20240617210449 ChangeCatalogResourcesLast30DayUsageCountUpdatedAtDefault: migrating 
main: -- change_column_default(:catalog_resources, :last_30_day_usage_count_updated_at, #<Proc:0x000000012f8b1908 /Users/leaminn/gitlab/gitlab-development-kit/gitlab/db/migrate/20240617210449_change_catalog_resources_last30_day_usage_count_updated_at_default.rb:7 (lambda)>)
main:    -> 0.0156s
main: == 20240617210449 ChangeCatalogResourcesLast30DayUsageCountUpdatedAtDefault: migrated (0.0185s) 

main: == [advisory_lock_connection] object_id: 127940, pg_backend_pid: 73940
ci: == [advisory_lock_connection] object_id: 128300, pg_backend_pid: 73944
ci: == 20240617210449 ChangeCatalogResourcesLast30DayUsageCountUpdatedAtDefault: migrating 
ci: -- change_column_default(:catalog_resources, :last_30_day_usage_count_updated_at, #<Proc:0x0000000173f71df8 /Users/leaminn/gitlab/gitlab-development-kit/gitlab/db/migrate/20240617210449_change_catalog_resources_last30_day_usage_count_updated_at_default.rb:7 (lambda)>)
ci:    -> 0.0026s
ci: == 20240617210449 ChangeCatalogResourcesLast30DayUsageCountUpdatedAtDefault: migrated (0.0097s) 

ci: == [advisory_lock_connection] object_id: 128300, pg_backend_pid: 73944

Down

main: == [advisory_lock_connection] object_id: 127560, pg_backend_pid: 74684
main: == 20240617210449 ChangeCatalogResourcesLast30DayUsageCountUpdatedAtDefault: reverting 
main: -- change_column_default(:catalog_resources, :last_30_day_usage_count_updated_at, "1970-01-01")
main:    -> 0.0161s
main: == 20240617210449 ChangeCatalogResourcesLast30DayUsageCountUpdatedAtDefault: reverted (0.0190s) 

main: == [advisory_lock_connection] object_id: 127560, pg_backend_pid: 74684
ci: == [advisory_lock_connection] object_id: 127560, pg_backend_pid: 75091
ci: == 20240617210449 ChangeCatalogResourcesLast30DayUsageCountUpdatedAtDefault: reverting 
ci: -- change_column_default(:catalog_resources, :last_30_day_usage_count_updated_at, "1970-01-01")
ci:    -> 0.0156s
ci: == 20240617210449 ChangeCatalogResourcesLast30DayUsageCountUpdatedAtDefault: reverted (0.0225s) 

ci: == [advisory_lock_connection] object_id: 127560, pg_backend_pid: 75091

Related to #467555 (closed)

Edited by Leaminn Ma
