Backend: Implement daily worker to aggregate last 30-day component usage data for each catalog resource
Summary
Our purpose for component usage instrumentation is two-fold:
- Monitor usage trends for components on GitLab.com. (Completed in #440382 (closed)).
- Allow both GitLab.com and self-managed users to filter/sort by component usage popularity in the UI.
  - "Popularity" is defined as: the number of unique projects that used the component in the last 30 days (rolling window).
  - We need to aggregate by catalog_resource_id and by component_id.
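To make the popularity definition concrete, here is a minimal plain-Ruby sketch that counts distinct projects per grouping key inside a rolling 30-day window. It is an illustration only, not the production query: the `Usage` struct, its field names (`used_by_project_id`, `used_date`), the `popularity` helper, and the choice to end the window on `today` inclusive are all assumptions made for the example.

```ruby
require 'date'
require 'set'

# Hypothetical in-memory stand-in for one row of the usage table.
# Field names are assumptions for this sketch.
Usage = Struct.new(:catalog_resource_id, :component_id, :used_by_project_id, :used_date)

WINDOW_DAYS = 30

# Returns a hash of grouping key => number of distinct projects that used it
# within the 30 days ending on (and including) `today`.
def popularity(usages, today:, group_by: :catalog_resource_id)
  window_start = today - (WINDOW_DAYS - 1)
  projects = Hash.new { |hash, key| hash[key] = Set.new }

  usages.each do |usage|
    # Skip usages that fall outside the rolling window.
    next unless (window_start..today).cover?(usage.used_date)

    # A Set ensures each project is counted once per key.
    projects[usage[group_by]] << usage.used_by_project_id
  end

  projects.transform_values(&:size)
end
```

Calling `popularity(usages, today: Date.today)` groups by catalog resource; passing `group_by: :component_id` gives the per-component count from the same data.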
In #440382 (closed), we completed tasks #443380 (closed) and #443382 (closed), which enable us to track component usage both in our custom Postgres usage table (p_catalog_resource_component_usages) and via Internal Events. The latter implementation satisfies Objective 1.
Now in this issue we will complete Objective 2. With the logic to insert records into p_catalog_resource_component_usages complete (#443380 (closed)), the next step is to implement a daily worker to aggregate and process that data.
Proposal
Implement a daily worker to aggregate the data from p_catalog_resource_component_usages and save the last 30-day usage count and updated_at into:
- catalog_resources.last_30_day_usage_count
- catalog_resources.last_30_day_usage_count_updated_at
To ensure efficient data processing and queries, we should utilize batching and iterating techniques. We should also add database indexes as necessary. Follow the approach outlined in this discussion thread: #440382 (comment 1821995966).
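As a rough illustration of the batching-and-iterating idea, the sketch below processes IDs in ascending order in fixed-size batches and records a cursor, so that a run that stops early can resume where it left off. It is not the implementation from the linked discussion thread: the `Cursor` struct, the `process_in_batches` helper, and the `batch_size` and `max_batches` parameters are hypothetical names chosen for the example, and an in-memory array stands in for the database.

```ruby
# Hypothetical cursor: remembers the last processed ID between runs.
Cursor = Struct.new(:last_id)

# Walks `sorted_ids` (ascending) in batches, starting after cursor.last_id.
# Stops after `max_batches` batches so a single run does a bounded amount of work.
# Yields each batch to the caller and returns true once every ID has been processed.
def process_in_batches(sorted_ids, cursor, batch_size:, max_batches:)
  max_batches.times do
    # In a real system this would be an indexed "id > ? ORDER BY id LIMIT ?" query.
    batch = sorted_ids.select { |id| id > cursor.last_id }.first(batch_size)
    break if batch.empty?

    yield batch

    # Advance the cursor only after the batch has been handled.
    cursor.last_id = batch.last
  end

  sorted_ids.none? { |id| id > cursor.last_id }
end
```

A scheduled job could call this on every run with a persisted cursor: each invocation handles at most `max_batches * batch_size` records, and the return value tells the caller whether the full pass is finished.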
MR Implementation
Step | Description | Link |
---|---|---|
1 | Implement Usages::Aggregator and Usages::Aggregators::Cursor classes in preparation for creating the service and worker. | Implement component usage batch aggregator (!151623 - merged) |
2 | Implement aggregator service and worker, set to run every 4 minutes. Add database indexes to optimize batching queries. | Add worker to aggregate last 30-day catalog res... (!155001 - merged) |