Update AI Catalog database schema to use project_id as sharding key in addition to organization_id
Everyone can contribute. Help move this issue forward while earning points, leveling up and collecting rewards.
Summary
Currently, the AI Catalog uses organization_id as the sharding key for ai_catalog_items and related tables. However, this may not be optimal for performance and scalability reasons, especially as the number of catalog items grows.
There are cases where we may not have a valid project_id (e.g. if we delete the project but need to keep the items around for existing consumers). In these cases it's fine to continue using the organization_id as the sharding key.
Problem
According to the using organization_id as sharding key docs, organization_id doesn't seem like a great candidate for sharding when there are going to be many rows per organization.
This becomes a bigger problem when considering current and future tables related to the ai_catalog_items table:
ai_catalog_item_versionsai_catalog_item_versions_dependencies
These will quickly add up to a lot of records with organization_id as the sharding key.
Proposal
Since AI catalog items are now always associated with projects (based on the repository-managed AI items approach), we should:
- Add a
project_idto all AI catalog related tables - Use
project_idas the sharding key instead oforganization_idwhen a project exists - Update scoping logic to query items by organization through their associated projects:
Ai::Catalog::Item.includes(:project).where(projects: { organization_id: current_organization.id }) - Add a multi-not-null database constraint to prevent both
organization_idandproject_idfrom being null (or both set) for all affected tables.
Tables to Update
ai_catalog_itemsai_catalog_item_versionsai_catalog_item_versions_dependencies- Any future AI catalog related tables
Implementation Notes
- This change should ensure we don't use
organization_idas the sharding key for any new AI catalog related tables - The scoping in
Ai::Catalog::ItemsFinderwill need to be updated to work with the new schema - Consider migration strategy for existing data
Related
- Epic: &18657
- Discussion: &18657 (comment 2745095670)
- Repository-managed AI items spike: !203607 (closed)