Update AI Catalog database schema to use project_id as sharding key in addition to organization_id

Everyone can contribute. Help move this issue forward while earning points, leveling up and collecting rewards.

Summary

Currently, the AI Catalog uses organization_id as the sharding key for ai_catalog_items and related tables. However, this may not be optimal for performance and scalability reasons, especially as the number of catalog items grows.

There are cases where we may not have a valid project_id (e.g. if we delete the project but need to keep the items around for existing consumers). In these cases it's fine to continue using the organization_id as the sharding key.

Problem

According to the using organization_id as sharding key docs, organization_id doesn't seem like a great candidate for sharding when there are going to be many rows per organization.

This becomes a bigger problem when considering current and future tables related to the ai_catalog_items table:

  • ai_catalog_item_versions
  • ai_catalog_item_versions_dependencies

These will quickly add up to a lot of records with organization_id as the sharding key.

Proposal

Since AI catalog items are now always associated with projects (based on the repository-managed AI items approach), we should:

  1. Add a project_id to all AI catalog related tables
  2. Use project_id as the sharding key instead of organization_id when a project exists
  3. Update scoping logic to query items by organization through their associated projects:
    Ai::Catalog::Item.includes(:project).where(projects: { organization_id: current_organization.id })
  4. Add a multi-not-null database constraint to prevent both organization_id and project_id from being null (or both set) for all affected tables.

Tables to Update

  • ai_catalog_items
  • ai_catalog_item_versions
  • ai_catalog_item_versions_dependencies
  • Any future AI catalog related tables

Implementation Notes

  • This change should ensure we don't use organization_id as the sharding key for any new AI catalog related tables
  • The scoping in Ai::Catalog::ItemsFinder will need to be updated to work with the new schema
  • Consider migration strategy for existing data

Related

Edited by 🤖 GitLab Bot 🤖