Backend: Implement daily worker to aggregate last 30-day component usage data for each catalog resource

Summary

Component usage instrumentation has two objectives:

  1. Monitor usage trends for components on GitLab.com. (Completed in #440382 (closed)).
  2. Allow both GitLab.com and self-managed users to filter/sort by component usage popularity in the UI.
    • "Popularity" is defined as: the number of unique projects that used the component in the last 30 days (rolling window).
    • We need to aggregate by catalog_resource_id and by component_id.

In #440382 (closed), we completed tasks #443380 (closed) and #443382 (closed), which enable us to track component usage both in our custom Postgres usage table (p_catalog_resource_component_usages) and via Internal Events. The latter implementation satisfies Objective 1.

Now in this issue we will complete Objective 2. With the logic to insert records into p_catalog_resource_component_usages complete (#443380 (closed)), the next step is to implement a daily worker to aggregate and process that data.

Proposal

Implement a daily worker to aggregate the data from p_catalog_resource_component_usages and save the last 30-day usage count and the time it was last updated into:

  • catalog_resources.last_30_day_usage_count
  • catalog_resources.last_30_day_usage_count_updated_at

To keep data processing and queries efficient, the worker should process records in batches and iterate over them incrementally. We should also add database indexes as necessary. Follow the approach outlined in this discussion thread: #440382 (comment 1821995966).
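
As a rough sketch of the batching idea, the example below walks resources in ascending id order, a fixed number at a time, and writes the two columns named in the proposal. Everything here is an assumption made for illustration: the `Resource` struct, the `each_batch` and `refresh_usage_counts` helpers, and the precomputed `counts_by_resource_id` hash are hypothetical and run in memory, whereas the real worker would query the database as described in the linked discussion.

```ruby
# Hypothetical stand-in for a catalog_resources row with the two target columns.
Resource = Struct.new(:id, :last_30_day_usage_count, :last_30_day_usage_count_updated_at)

# Yields resources in ascending id order, `batch_size` at a time.
# Each batch starts after the last id already seen (keyset-style iteration),
# so no batch has to skip over rows handled earlier.
def each_batch(resources, batch_size:)
  last_id = 0
  loop do
    batch = resources.select { |r| r.id > last_id }.sort_by(&:id).first(batch_size)
    break if batch.empty?

    yield batch
    last_id = batch.last.id
  end
end

# Writes the usage count and its timestamp for every resource, batch by batch.
# Resources without any usage in the window get a count of 0.
# Returns the number of batches processed.
def refresh_usage_counts(resources, counts_by_resource_id, now:, batch_size: 2)
  batches = 0
  each_batch(resources, batch_size: batch_size) do |batch|
    batch.each do |resource|
      resource.last_30_day_usage_count = counts_by_resource_id.fetch(resource.id, 0)
      resource.last_30_day_usage_count_updated_at = now
    end
    batches += 1
  end
  batches
end

resources = [1, 2, 3, 5, 8].map { |id| Resource.new(id, nil, nil) }
now = Time.utc(2024, 6, 30, 0, 0, 0)
batch_count = refresh_usage_counts(resources, { 1 => 4, 3 => 2, 8 => 7 }, now: now)
```

With five resources and a batch size of 2, the loop processes three batches (ids 1 and 2, then 3 and 5, then 8). In a database-backed version, an index on the column used for ordering is what keeps each "next batch" lookup cheap, which is the reason the proposal mentions adding indexes.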

MR Implementation

| Step | Description | Link |
| 1 | Implement Usages::Aggregator and Usages::Aggregators::Cursor classes in preparation for creating the service and worker. | Implement component usage batch aggregator (!151623 - merged) |
| 2 | Implement aggregator service and worker, set to run every 4 minutes. Add database indexes to optimize batching queries. | Add worker to aggregate last 30-day catalog res... (!155001 - merged) |
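
One way to picture why an aggregator is paired with a cursor and a frequently scheduled worker: each run does a bounded amount of work, records where it stopped, and the next run resumes from there. The sketch below is a guess at that general pattern only. `SketchCursor`, `run_once`, the Hash used as a store, and the item-count `budget` (standing in for a runtime limit) are all hypothetical; the actual Usages::Aggregator and Usages::Aggregators::Cursor classes from !151623 are not reproduced here and may work differently.

```ruby
# Hypothetical cursor that remembers the last id handled by a previous run.
class SketchCursor
  attr_reader :last_processed_id

  # `store` is any Hash-like object; a real worker would need a persistent store.
  def initialize(store)
    @store = store
    @last_processed_id = store.fetch(:last_processed_id, 0)
  end

  def advance(id)
    @last_processed_id = id
  end

  def save!
    @store[:last_processed_id] = @last_processed_id
  end

  # Start over from the beginning on the next run.
  def reset!
    @last_processed_id = 0
    save!
  end
end

# Processes pending ids in ascending order until none are left or `budget`
# items have been handled. Returns :finished or :interrupted.
def run_once(ids, store, budget:)
  cursor = SketchCursor.new(store)
  pending = ids.select { |id| id > cursor.last_processed_id }.sort
  processed = 0

  pending.each do |id|
    break if processed >= budget

    yield id
    processed += 1
    cursor.advance(id)
  end

  if processed == pending.size
    cursor.reset!
    :finished
  else
    cursor.save!
    :interrupted
  end
end

store = {}
handled = []
statuses = 3.times.map { run_once([1, 2, 3, 4, 5], store, budget: 2) { |id| handled << id } }
```

Here three consecutive runs with a budget of two items cover all five ids exactly once: the first two runs return `:interrupted` and the third returns `:finished`, after which the cursor is reset for the next full pass.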

Implementation Table

| Group | Issue Link |
| backend | Spike: Evaluate how to calculate number of time... (#438409 - closed) |
| backend | Backend: Component usage instrumentation (#440382 - closed) |
| backend | 👈 You are here |
| backend | Backend: Add option to sort CI catalog resource... (#452620 - closed) |
| frontend | Frontend: Show usage statistics and sort option... (#434333 - closed) |
| frontend | Frontend: Present the number of times a single ... (#443632 - closed) |
| backend | Backend: Reduce data retention period and store... (#443681 - closed) |
| backend | Backend: Implement individual component sorting... (#453975 - closed) |
Edited by Leaminn Ma