Backend: Denormalize catalog_resources.latest_version_id to improve query performance
Everyone can contribute. Help move this issue forward while earning points, leveling up and collecting rewards.
Summary
In order to improve the Ci::Catalog::Resources::Version.latest_for_catalog_resources query, we should add latest_version_id to catalog_resources in order to keep track of the latest version.
Context:
(Ref: !135669 (comment 1638483059))
One other idea I've seen for situations like this is keeping track of the "latest" somewhere (denormalized) so that you don't need to sort at all. I think we do this trick in at least 2 places today:
CREATE TABLE merge_requests ( ... latest_merge_request_diff_id integer, ... ); CREATE TABLE security_scans ( ... latest boolean DEFAULT true NOT NULL, ... );As you can see there are 2 ways you could do this. You could either add a column
catalog_resources.latest_version_idor you could addcatalog_resource_versions.latest boolean. In the case of thebooleanyou can have a partial index onlyWHERE latest = TRUE. There are probably tradeoffs with both approaches as they need to be kept up to date in different ways and off the top of my head it's not obvious which (if either) of these approaches is the best for your use case. But I just wanted to point it out as another option that saves from having to do theORDER BY ... LIMIT 1altogether.
(Ref: #429279 (comment 1635119218))
The N+1 problem you are describing is still valid but it's something we need to optimize for the index page. Maybe we can cache
catalog_resources.last_version_idany time a new version is created and we can preload version and release data efficiently? But that looks like a separate issue.
Proposal
- Add
latest_version_idtocatalog_resources. - Update the Version services to ensure
latest_version_idis updated accordingly when a Version is created or removed. - Update the query plans for
latestandlatest_for_catalog_resourcesinCi::Catalog::Resources::Versionto use the new column.