Optimize catalog component usage tracking
## What does this MR do and why?
This MR optimizes catalog component usage tracking to eliminate N+1 database queries that were causing performance issues when processing pipelines with many components.
## Problem
The component usage tracking worker was executing N database queries for N components, performing individual lookups and inserts for each component in a pipeline. This created significant performance bottlenecks for pipelines using multiple catalog components.
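The N+1 shape described above can be sketched in plain Ruby (no Rails; `FakeDb` is a stand-in that counts the "queries" it receives, and all names here are illustrative, not the actual worker code):

```ruby
# Stand-in for a database that counts how many queries it receives.
class FakeDb
  attr_reader :query_count

  def initialize(rows)
    @rows = rows
    @query_count = 0
  end

  def find(id)       # one query per call: the N+1 shape
    @query_count += 1
    @rows[id]
  end

  def find_all(ids)  # one query for the whole batch
    @query_count += 1
    @rows.values_at(*ids)
  end
end

rows = { 1 => "component-a", 2 => "component-b", 3 => "component-c" }
ids  = [1, 2, 3]

per_component = FakeDb.new(rows)
ids.each { |id| per_component.find(id) }  # N separate lookups

batched = FakeDb.new(rows)
batched.find_all(ids)                     # a single batched lookup

puts per_component.query_count  # => 3
puts batched.query_count        # => 1
```

The query count on the left grows linearly with the number of components; on the right it stays constant, which is the core of the optimization below.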
## Solution: Batch Processing Flow
This MR introduces a new optimized flow that processes components in batches:
1. Preload Component Projects (Single Query)
   - Load all component projects at once using `Project.id_in(component_project_ids)`
   - Preload `catalog_resource` associations to avoid additional queries
   - Index projects by ID for O(1) lookup
2. Separate Catalog vs Non-Catalog Components
   - Filter components upfront based on whether they have a `catalog_resource`
   - Track all components (catalog + non-catalog) for analytics
   - Process only catalog components for usage recording
3. Group Components by Project and SHA
   - Group catalog components by `[project_id, sha]` pairs
   - Enables batch lookups for components from the same project/version
4. Batch Component Lookups
   - New method: `ComponentsProject#find_catalog_components(component_names)`
   - Fetches multiple components in a single query per project/SHA combination
   - Replaces N individual `find_catalog_component` calls
5. Batch Usage Record Operations
   - Single fetch: Load all existing usage records in one query
   - Partition: Separate into records to update vs. records to insert
   - Bulk update: Use `update_all` for existing records
   - Bulk insert: Use `insert_all` for new records
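Steps 3 and 5 of the flow above can be sketched in plain Ruby (illustrative data; `Struct` stands in for the ActiveRecord models, and `existing_names` stands in for the single SELECT of existing usage records):

```ruby
# Illustrative component records (the real flow uses ActiveRecord models).
Component = Struct.new(:name, :project_id, :sha)

components = [
  Component.new("deploy", 1, "abc123"),
  Component.new("lint",   1, "abc123"),
  Component.new("scan",   2, "def456")
]

# Step 3: group by [project_id, sha] so each group needs only one lookup query.
groups = components.group_by { |c| [c.project_id, c.sha] }

# Step 5: partition into records to update vs. records to insert.
# In the real flow, the existing usage records come from one SELECT.
existing_names = ["deploy"]
to_update, to_insert = components.partition { |c| existing_names.include?(c.name) }

puts groups.size                    # => 2 (one batch lookup per group)
puts to_update.map(&:name).inspect  # => ["deploy"]
puts to_insert.map(&:name).inspect  # => ["lint", "scan"]
```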
## Performance Impact

**Before:** N queries for N components (lookup + insert/update per component)

**After:** A fixed number of queries regardless of component count:
- 1 query to load component projects
- 1 query per unique project/SHA combination to load components
- 1 query to fetch existing usage records
- 1 query to bulk update existing records
- 1 query to bulk insert new records
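Per the breakdown above, the total query count depends only on the number of unique project/SHA pairs, not on the component count. A quick arithmetic check (the "before" figure assumes roughly two queries per component, per the lookup + insert/update description):

```ruby
# Query count for the batched flow, per the list above:
# 1 (load projects) + k (one lookup per unique project/SHA pair)
# + 1 (fetch existing) + 1 (bulk update) + 1 (bulk insert)
def batched_query_count(unique_project_sha_pairs)
  1 + unique_project_sha_pairs + 3
end

# A pipeline with 50 components spread across 2 project/SHA pairs:
puts batched_query_count(2)  # => 6

# The old flow scaled with component count instead (~2 queries each):
puts 50 * 2                  # => 100
```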
## Changes Summary
- New batch lookup method: `Ci::Catalog::ComponentsProject#find_catalog_components`
- Refactored service: `Ci::Components::Usages::CreateService` now accepts an array of components
- Optimized worker: `Ci::Catalog::Resources::TrackComponentUsageWorker` implements the batch processing flow
- N+1 prevention: Added query recorder tests to prevent regressions
## Database Review
### Query 1: Bulk Update Existing Usage Records

Location: `app/services/ci/components/usages/create_service.rb:68`

Query:

```sql
UPDATE catalog_resource_component_last_usages
SET last_used_date = '2025-12-09'
WHERE catalog_resource_component_last_usages.id IN (1, 2, 3)
```

Execution Plan:

```
Update on catalog_resource_component_last_usages  (cost=0.15..5.51 rows=0 width=0)
  ->  Index Scan using catalog_resource_component_last_usages_pkey on catalog_resource_component_last_usages  (cost=0.15..5.51 rows=3 width=10)
        Index Cond: (id = ANY ('{1,2,3}'::bigint[]))
```
### Query 2: Bulk Insert New Usage Records

Location: `app/services/ci/components/usages/create_service.rb:71`

Query:

```sql
INSERT INTO catalog_resource_component_last_usages (component_id, used_by_project_id, catalog_resource_id, component_project_id, last_used_date)
VALUES
  (100, 200, 300, 400, '2025-12-09'),
  (101, 201, 301, 401, '2025-12-09'),
  (102, 202, 302, 402, '2025-12-09')
```

Execution Plan:

```
Insert on catalog_resource_component_last_usages  (cost=0.00..0.05 rows=0 width=0)
  ->  Values Scan on "*VALUES*"  (cost=0.00..0.05 rows=3 width=44)
```
### Query 3: Fetch Existing Usage Records

Location: `app/services/ci/components/usages/create_service.rb:33`

Query:

```sql
SELECT "catalog_resource_component_last_usages".*
FROM "catalog_resource_component_last_usages"
WHERE "catalog_resource_component_last_usages"."component_id" IN (100, 101, 102)
  AND "catalog_resource_component_last_usages"."used_by_project_id" = 200
```
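For context, a hedged, non-runnable sketch of the ActiveRecord calls that would emit the three queries above; the model name `LastUsage` and all variable names are assumptions inferred from the SQL, not the actual service code:

```ruby
# Query 3: one SELECT for all existing usage records (model name assumed).
existing = LastUsage.where(
  component_id: component_ids,
  used_by_project_id: used_by_project_id
)

# After partitioning against the fetched set:
# Query 1: one UPDATE covering all stale records.
LastUsage.where(id: ids_to_update).update_all(last_used_date: Date.current)

# Query 2: one INSERT covering all new records.
LastUsage.insert_all(new_usage_attributes)
```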
Issue: Optimize catalog component lookups in CI pipeli... (#581360)