Cells: Classify: Make uploads table attributable to an organization

Problem

The uploads table holds a record of every file uploaded to GitLab. It is attached to many models (users, projects, groups, etc.).

Because of this, the table cannot be clearly classified as either cluster-wide or cell-local.

The problem was investigated earlier in [Feature] Cells 1.0 impact for file uploads (#443573 - closed).

Geo

The same applies to upload_states, which Geo uses to track uploaded records that need verification.

Dependencies

The tables backing the models that use uploads must have their own sharding keys defined before uploads can rely on them:

  • abuse_reports
  • achievements
  • ai_vectorizable_files
  • alert_management_alert_metric_images
  • appearances
  • bulk_import_export_uploads
  • dependency_list_export_parts
  • dependency_list_exports
  • design_management_designs_versions
  • import_export_uploads
  • issuable_metric_images
  • namespaces
  • organization_details
  • project_relation_export_uploads
  • topics
  • projects
  • snippets
  • user_permission_export_uploads
  • users
  • vulnerability_archive_exports
  • vulnerability_export_parts
  • vulnerability_exports
  • vulnerability_remediations

https://docs.google.com/spreadsheets/d/19CcPaUGxOaT1rwjSdRvLkhu_-91RUBOdjDFGVxOonVs/edit?usp=sharing
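
The dependency rule above amounts to a simple gate: uploads can only rely on sharding keys once every table in the list has one. A minimal sketch of that check follows. The function name and the status snapshot are hypothetical and purely illustrative; they do not reflect the real state of any table.

```python
from typing import Dict, List, Optional


def tables_missing_sharding_key(sharding_keys: Dict[str, Optional[str]]) -> List[str]:
    """Return, in sorted order, the tables that do not yet have a sharding key."""
    return sorted(table for table, key in sharding_keys.items() if not key)


# Hypothetical snapshot: table name -> sharding key column (None = not defined yet).
# These values are made up for illustration only.
EXAMPLE_STATUS = {
    "projects": "organization_id",
    "topics": None,
    "appearances": None,
}

if __name__ == "__main__":
    blocking = tables_missing_sharding_key(EXAMPLE_STATUS)
    if blocking:
        print("Still blocked by:", ", ".join(blocking))
    else:
        print("All dependencies have a sharding key")
```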

Solution

We should introduce a new table and split the current table in two, so that each resulting table is clearly either cluster-wide or cell-local and has a clear purpose.

Proposal

Based on the discussion in #398199 (comment 2101029924).

  • Milestone 17.7:

  • Milestone 17.11:

    • Create a new uploads_9ba88c4165 table (like uploads) partitioned by model_type, and mark it as exempt_from_sharding: true (!175203 (merged))
    • Create a partition for each model_type in the public schema (!175203 (merged))
    • For each partition, create a foreign key referencing the sharding key table (!175203 (merged))
    • Start syncing uploads -> uploads_9ba88c4165 (!175203 (merged))
  • Milestone 18.2 (required stop):

    • Backfill uploads_9ba88c4165 when every related model has its sharding key ready (!181349 (merged))
  • Milestone 18.3:

  • Milestone 18.5:

  • Milestone N (once work on all dependencies is completed):

    • Add database triggers to all partitions to set the sharding key if it is missing
    • Truncate the partitions (to remove orphaned uploads)
    • Re-run the backfill (updated to set the new sharding keys)
    Tables       | Prepare for sharding
    achievements | !207893
  • Milestone M (after a required stop):

    • Finalize the backfill
    • For each partition, create a NOT NULL constraint (!199513 (closed))
    • Define sharding key for each partition
    • Switch the app to use the new partitioned table by swapping the table names
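
The per-partition steps in the proposal (one partition per model_type, plus a foreign key to the sharding key table) can be sketched as a small DDL generator. This is only an illustration, not the migration from !175203. It assumes PostgreSQL list partitioning on model_type and that the parent table already carries the sharding key columns. The partition names, sharding key columns, and constraint names below are hypothetical.

```python
from dataclasses import dataclass
from typing import List

PARENT_TABLE = "uploads_9ba88c4165"  # table name taken from the proposal


@dataclass(frozen=True)
class PartitionSpec:
    model_type: str   # value of model_type routed to this partition
    partition: str    # hypothetical partition name
    key_column: str   # hypothetical sharding key column on the partition
    key_table: str    # table that the sharding key references


def sql_literal(value: str) -> str:
    """Quote a string as a SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def partition_ddl(spec: PartitionSpec) -> List[str]:
    """Return the statements that create one list partition and its foreign key."""
    create = (
        f"CREATE TABLE {spec.partition} PARTITION OF {PARENT_TABLE} "
        f"FOR VALUES IN ({sql_literal(spec.model_type)});"
    )
    add_fk = (
        f"ALTER TABLE {spec.partition} "
        f"ADD CONSTRAINT fk_{spec.partition}_{spec.key_column} "
        f"FOREIGN KEY ({spec.key_column}) REFERENCES {spec.key_table} (id);"
    )
    return [create, add_fk]


# Illustrative specs only; the real partitions and sharding keys may differ.
SPECS = [
    PartitionSpec("Project", "project_uploads", "project_id", "projects"),
    PartitionSpec("Achievements::Achievement", "achievement_uploads",
                  "namespace_id", "namespaces"),
]

if __name__ == "__main__":
    for spec in SPECS:
        print("\n".join(partition_ddl(spec)))
```

Generating the statements from a list of specs keeps the partition definition and its foreign key together, so adding a model_type is a one-line change in this sketch.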
Edited by Tomasz Skorupa