Optimize catalog component usage tracking
## What does this MR do and why?
This MR optimizes catalog component usage tracking to eliminate N+1 database queries that were causing performance issues when processing pipelines with many components.
## Problem
The component usage tracking worker was executing N database queries for N components, performing individual lookups and inserts for each component in a pipeline. This created significant performance bottlenecks for pipelines using multiple catalog components.
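The N+1 shape described above can be sketched in plain Ruby (no Rails; `FakeDb` is a stand-in that counts the "queries" it receives, and all names here are illustrative, not the actual worker code):

```ruby
# Stand-in for a database that counts how many queries it receives.
class FakeDb
  attr_reader :query_count

  def initialize(rows)
    @rows = rows
    @query_count = 0
  end

  def find(id)       # one query per call: the N+1 shape
    @query_count += 1
    @rows[id]
  end

  def find_all(ids)  # one query for the whole batch
    @query_count += 1
    @rows.values_at(*ids)
  end
end

rows = { 1 => "component-a", 2 => "component-b", 3 => "component-c" }
ids  = [1, 2, 3]

per_component = FakeDb.new(rows)
ids.each { |id| per_component.find(id) }  # N separate lookups

batched = FakeDb.new(rows)
batched.find_all(ids)                     # a single batched lookup

puts per_component.query_count  # => 3
puts batched.query_count        # => 1
```

The query count on the left grows linearly with the number of components; on the right it stays constant, which is the core of the optimization below.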
## Solution: Batch Processing Flow
This MR introduces a new optimized flow that processes components in batches:
1. Preload Component Projects (Single Query)
   - Load all component projects at once using `Project.id_in(component_project_ids)`
   - Preload `catalog_resource` associations to avoid additional queries
   - Index projects by ID for O(1) lookup
2. Separate Catalog vs Non-Catalog Components
   - Filter components upfront based on whether they have a `catalog_resource`
   - Track all components (catalog + non-catalog) for analytics
   - Process only catalog components for usage recording
3. Group Components by Project and SHA
   - Group catalog components by `[project_id, sha]` pairs
   - Enables batch lookups for components from the same project/version
4. Batch Component Lookups
   - New method: `ComponentsProject#find_catalog_components(component_names)`
   - Fetches multiple components in a single query per project/SHA combination
   - Replaces N individual `find_catalog_component` calls
5. Batch Usage Record Operations
   - Single fetch: Load all existing usage records in one query
   - Partition: Separate into records to update vs. records to insert
   - Bulk update: Use `update_all` for existing records
   - Bulk insert: Use `insert_all` for new records
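Steps 3 and 5 of the flow above can be sketched in plain Ruby (illustrative data; `Struct` stands in for the ActiveRecord models, and `existing_names` stands in for the single SELECT of existing usage records):

```ruby
# Illustrative component records (the real flow uses ActiveRecord models).
Component = Struct.new(:name, :project_id, :sha)

components = [
  Component.new("deploy", 1, "abc123"),
  Component.new("lint",   1, "abc123"),
  Component.new("scan",   2, "def456")
]

# Step 3: group by [project_id, sha] so each group needs only one lookup query.
groups = components.group_by { |c| [c.project_id, c.sha] }

# Step 5: partition into records to update vs. records to insert.
# In the real flow, the existing usage records come from one SELECT.
existing_names = ["deploy"]
to_update, to_insert = components.partition { |c| existing_names.include?(c.name) }

puts groups.size                    # => 2 (one batch lookup per group)
puts to_update.map(&:name).inspect  # => ["deploy"]
puts to_insert.map(&:name).inspect  # => ["lint", "scan"]
```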
## Performance Impact

**Before:** N queries for N components (lookup + insert/update per component)

**After:** A fixed number of queries regardless of component count:
- 1 query to load component projects
- 1 query per unique project/SHA combination to load components
- 1 query to fetch existing usage records
- 1 query to bulk update existing records
- 1 query to bulk insert new records
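Per the breakdown above, the total query count depends only on the number of unique project/SHA pairs, not on the component count. A quick arithmetic check (the "before" figure assumes roughly two queries per component, per the lookup + insert/update description):

```ruby
# Query count for the batched flow, per the list above:
# 1 (load projects) + k (one lookup per unique project/SHA pair)
# + 1 (fetch existing) + 1 (bulk update) + 1 (bulk insert)
def batched_query_count(unique_project_sha_pairs)
  1 + unique_project_sha_pairs + 3
end

# A pipeline with 50 components spread across 2 project/SHA pairs:
puts batched_query_count(2)  # => 6

# The old flow scaled with component count instead (~2 queries each):
puts 50 * 2                  # => 100
```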
## Changes Summary
- New batch lookup method: `Ci::Catalog::ComponentsProject#find_catalog_components`
- Refactored service: `Ci::Components::Usages::CreateService` now accepts an array of components
- Optimized worker: `Ci::Catalog::Resources::TrackComponentUsageWorker` implements the batch processing flow
- N+1 prevention: Added query recorder tests to prevent regressions
## Database Review
### Query 1: Bulk Update Existing Usage Records

Location: `app/services/ci/components/usages/create_service.rb:68`

Query:

```sql
UPDATE catalog_resource_component_last_usages
SET last_used_date = '2025-12-09'
WHERE catalog_resource_component_last_usages.id IN (1, 2, 3)
```

Execution Plan:

```
Update on catalog_resource_component_last_usages  (cost=0.15..5.51 rows=0 width=0)
  ->  Index Scan using catalog_resource_component_last_usages_pkey on catalog_resource_component_last_usages  (cost=0.15..5.51 rows=3 width=10)
        Index Cond: (id = ANY ('{1,2,3}'::bigint[]))
```
### Query 2: Bulk Insert New Usage Records

Location: `app/services/ci/components/usages/create_service.rb:71`

Query:

```sql
INSERT INTO catalog_resource_component_last_usages (component_id, used_by_project_id, catalog_resource_id, component_project_id, last_used_date)
VALUES
  (100, 200, 300, 400, '2025-12-09'),
  (101, 201, 301, 401, '2025-12-09'),
  (102, 202, 302, 402, '2025-12-09')
```

Execution Plan:

```
Insert on catalog_resource_component_last_usages  (cost=0.00..0.05 rows=0 width=0)
  ->  Values Scan on "*VALUES*"  (cost=0.00..0.05 rows=3 width=44)
```
### Query 3: Fetch Existing Usage Records

Location: `app/services/ci/components/usages/create_service.rb:33`

Query:

```sql
SELECT "catalog_resource_component_last_usages".*
FROM "catalog_resource_component_last_usages"
WHERE "catalog_resource_component_last_usages"."component_id" IN (100, 101, 102)
  AND "catalog_resource_component_last_usages"."used_by_project_id" = 200
```
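For context, a hedged, non-runnable sketch of the ActiveRecord calls that would emit the three queries above; the model name `LastUsage` and all variable names are assumptions inferred from the SQL, not the actual service code:

```ruby
# Query 3: one SELECT for all existing usage records (model name assumed).
existing = LastUsage.where(
  component_id: component_ids,
  used_by_project_id: used_by_project_id
)

# After partitioning against the fetched set:
# Query 1: one UPDATE covering all stale records.
LastUsage.where(id: ids_to_update).update_all(last_used_date: Date.current)

# Query 2: one INSERT covering all new records.
LastUsage.insert_all(new_usage_attributes)
```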
Issue: Optimize catalog component lookups in CI pipeli... (#581360)