Skip to content

Create partitioned CI component usage table

What does this MR do and why?

This MR adds a migration to create p_catalog_resource_component_usages. This table is used to record unique projects that include a component in their pipeline(s) for each day.

This data will later enable users to sort components (or component projects) by popularity. We require a daily metric because we will need to query the number of unique projects for the last 30-days (rolling window).

Note: Our existing analytics instrumentation services don't readily support our requirements for both GitLab.com and self-managed instances. Thus we determined that a custom data table would be the simplest to implement. (Ref: !144932 (comment 1783721915))

Table/model notes:

  • Unique on component_id-used_by_project_id-used_date.
    • used_date is the format YYYY-mm-dd. We do not need the time parameters as we're only interested in the number of unique projects per day.
    • The uniqueness validation ensures that redundant data is not inserted into the table.
  • The table is partitioned for efficiency.
    • For now we will not truncate the partitions until after 1 year as we may need to keep historical data. We are planning to shorten this duration to 2-3 months as part of a follow up issue: #443681.
    • We'll likely be aggregating historical data by month, so it was determined that monthly would be the most appropriate partitioning strategy.
  • The sharding key is set to project_id instead of used_by_project_id because the "owner" of the data should be the project that owns the component rather than the one using it. (See !145881 (comment 1798736129).)
  • used_by_project_id does not have an FK constraint because we'll want to keep historical usage data on a component even if the associated used_by_project is deleted.
  • We will add further indexes later as needed to better support the queries for batch processing the data in #443381 (closed).

Resolves Part 1 of 2 of Task Create component usage data table and tracking ... (#443380 - closed). Parent issue: #440382 (closed).

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Query plan

Up

up.txt

Down

down.txt

Related to #440382 (closed)

Edited by Leaminn Ma

Merge request reports