Backend: Implement daily worker to aggregate last 30-day component usage data for each catalog resource

Summary

Our purpose for component usage instrumentation is two-fold:

1. Monitor usage trends for components on GitLab.com. (Completed in #440382 (closed)).
2. Allow both GitLab.com and self-managed users to filter/sort by component usage popularity in the UI.
   - "Popularity" is defined as: the number of unique projects that used the component in the last 30 days (rolling window).
   - We need to aggregate by catalog_resource_id and by component_id.
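
To make the "popularity" definition concrete, here is a minimal in-memory sketch of the count. The `Usage` struct, its `used_by_project_id` and `used_date` fields, and the `usage_counts` helper are hypothetical names chosen for illustration; they are not taken from the linked MRs. The sketch also assumes the window covers the 30 calendar days ending today, inclusive.

```ruby
require "date"
require "set"

# Hypothetical stand-in for one row of p_catalog_resource_component_usages.
Usage = Struct.new(:catalog_resource_id, :component_id, :used_by_project_id, :used_date)

# Counts unique projects per grouping key inside the rolling window ending on `today`.
# `group_by` can be :catalog_resource_id or :component_id.
def usage_counts(usages, today:, window_days: 30, group_by: :catalog_resource_id)
  window_start = today - window_days # exclusive lower bound (assumption)
  projects_by_key = Hash.new { |hash, key| hash[key] = Set.new }

  usages.each do |usage|
    next unless usage.used_date > window_start && usage.used_date <= today

    # A Set keeps each project once, however many times it used the component.
    projects_by_key[usage[group_by]] << usage.used_by_project_id
  end

  projects_by_key.transform_values(&:size)
end

today = Date.new(2024, 6, 18)
usages = [
  Usage.new(1, 10, 100, today - 1),
  Usage.new(1, 10, 100, today - 2),  # same project again, counted once
  Usage.new(1, 11, 101, today - 5),
  Usage.new(1, 10, 102, today - 45), # outside the 30-day window, ignored
  Usage.new(2, 20, 100, today)
]

per_resource  = usage_counts(usages, today: today)
per_component = usage_counts(usages, today: today, group_by: :component_id)
```

In this sample, catalog resource 1 ends up with a count of 2 (projects 100 and 101) and resource 2 with a count of 1. In the database the same idea would presumably be a distinct count over project IDs with a date filter.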

In #440382 (closed), we completed tasks #443380 (closed) and #443382 (closed), which enable us to track component usage both in our custom Postgres usage table (p_catalog_resource_component_usages) and via Internal Events. The Internal Events implementation satisfies Objective 1.

This issue completes Objective 2. Now that the logic to insert records into p_catalog_resource_component_usages is in place (#443380 (closed)), the next step is to implement a daily worker to aggregate and process that data.

Proposal

Implement a daily worker to aggregate the data from p_catalog_resource_component_usages and save the last 30-day usage count and updated_at into:

catalog_resources.last_30_day_usage_count
catalog_resources.last_30_day_usage_count_updated_at

To ensure efficient data processing and queries, we should utilize batching and iterating techniques. We should also add database indexes as necessary. Follow the approach outlined in this discussion thread: #440382 (comment 1821995966).
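
As a rough illustration of the batching idea, the sketch below walks catalog resources in ID order, a fixed number at a time, and writes the two target fields for each batch. It runs on in-memory structs rather than the database. `CatalogResource`, `UsageRow`, `each_batch`, and `aggregate!` are hypothetical names for this example only and do not describe the approach in the linked discussion thread.

```ruby
require "date"

# Hypothetical in-memory stand-ins; only the two last_30_day_* names come from this issue.
CatalogResource = Struct.new(:id, :last_30_day_usage_count, :last_30_day_usage_count_updated_at)
UsageRow = Struct.new(:catalog_resource_id, :used_by_project_id, :used_date)

# Yields resources in id order, batch_size at a time. The last seen id acts as the
# position marker (keyset style) instead of an ever-growing offset.
# Assumes ids are positive integers.
def each_batch(resources, batch_size:)
  sorted = resources.sort_by(&:id)
  last_id = 0

  loop do
    batch = sorted.select { |resource| resource.id > last_id }.first(batch_size)
    break if batch.empty?

    yield batch
    last_id = batch.last.id
  end
end

# Recomputes the unique-project count for every resource, one batch at a time.
def aggregate!(resources, usages, now:, batch_size: 2, window_days: 30)
  today = now.to_date
  window_start = today - window_days # exclusive lower bound (assumption)

  each_batch(resources, batch_size: batch_size) do |batch|
    ids = batch.map(&:id)
    recent = usages.select do |usage|
      ids.include?(usage.catalog_resource_id) &&
        usage.used_date > window_start && usage.used_date <= today
    end
    by_resource = recent.group_by(&:catalog_resource_id)

    batch.each do |resource|
      rows = by_resource.fetch(resource.id, [])
      resource.last_30_day_usage_count = rows.map(&:used_by_project_id).uniq.size
      resource.last_30_day_usage_count_updated_at = now
    end
  end
end
```

Resources with no usage in the window are still written with a count of 0, so a stale count never outlives the window. Against a real table, the per-batch filter is the query that an index would need to support.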

MR Implementation

| Step | Description | Link |
| --- | --- | --- |
| 1 | Implement `Usages::Aggregator` and `Usages::Aggregators::Cursor` classes in preparation for creating the service and worker. | Implement component usage batch aggregator (!151623 - merged) |
| 2 | Implement aggregator service and worker, set to run every 4 minutes. Add database indexes to optimize batching queries. | Add worker to aggregate last 30-day catalog res... (!155001 - merged) |
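
One way a frequently scheduled worker and a cursor can fit together is for each run to process a bounded amount of work, save its position, and let the next run resume from there. The sketch below shows that general pattern only. `ResumableCursor`, its storage key, and `run_once` are hypothetical, and nothing here is meant to describe the actual `Usages::Aggregators::Cursor` class or the merged worker.

```ruby
# Hypothetical cursor: remembers the last processed id in a Hash-like store
# (a stand-in for whatever persistent store a real worker would use).
class ResumableCursor
  KEY = "usage_aggregator:last_id" # hypothetical key name

  def initialize(store)
    @store = store
  end

  # Last processed id, or 0 when no pass is in progress.
  def position
    @store.fetch(KEY, 0)
  end

  def advance(to_id)
    @store[KEY] = to_id
  end

  def reset
    @store.delete(KEY)
  end
end

# Processes at most max_batches batches, then stops. Returns the ids handled in this run.
def run_once(ids, cursor, batch_size:, max_batches:)
  processed = []

  max_batches.times do
    batch = ids.sort.select { |id| id > cursor.position }.first(batch_size)

    if batch.empty?
      cursor.reset # the full pass is done; the next run starts from the beginning
      break
    end

    processed.concat(batch) # stand-in for the real aggregation work
    cursor.advance(batch.last)
  end

  processed
end

store = {}
cursor = ResumableCursor.new(store)
ids = [1, 2, 3, 4, 5]

first_run  = run_once(ids, cursor, batch_size: 2, max_batches: 1) # handles ids 1 and 2
second_run = run_once(ids, cursor, batch_size: 2, max_batches: 1) # resumes at id 3
```

A real worker would more likely bound each run by elapsed time than by a batch count; a count is used here so the behavior is deterministic.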

Implementation Table

| Group | Issue Link |
| --- | --- |
| backend | Spike: Evaluate how to calculate number of time... (#438409 - closed) |
| backend | Backend: Component usage instrumentation (#440382 - closed) |
| backend | 👈 You are here |
| backend | Backend: Add option to sort CI catalog resource... (#452620 - closed) |
| frontend | Frontend: Show usage statistics and sort option... (#434333) |
| frontend | Frontend: Present the number of times a single ... (#443632) |
| backend | Backend: Reduce data retention period and store... (#443681) |
| backend | Backend: Implement individual component sorting... (#453975) |

Edited Jun 18, 2024 by Leaminn Ma