Clean up schema for table projects

Per https://gitlab.com/gitlab-com/infrastructure/issues/1709 (private to GitLab employees):

  • Wrong data type: mirror_last_update_at should be timestamptz
  • Wrong data type: mirror_last_successful_update_at should be timestamptz
  • Wrong data type: last_repository_check_at should be timestamptz
  • Missing FK: creator_id -> users (2209 entries violate this rule)
  • Missing FK: namespace_id -> namespaces (26 entries violate this rule)
  • Missing FK: mirror_user_id -> users (798 entries violate this rule)
  • Missing FK: ci_id -> ... something?
  • Frequently updated data co-mingled with relatively static data: last_activity_at implies often updated data
  • Duplicate data: path seems very similar to name+namespace info, plus has three indexes (btree, lower(path) btree and gin)
    • path is just the project path (e.g. gitlab-ce)
    • The regular btree index is used by MySQL, the lower(path) one by PostgreSQL, and the GIN one is used for searching on PostgreSQL. Hopefully we can remove the GIN one once ElasticSearch works
  • Data retention: "archived" implies data no longer needed, 25201 projects are "archived"
    • Remove column/use of it, projects should either contain still in use projects (that may be removed) or removed projects. No point in filtering out projects that are about to be removed
  • Duplicate data: import_status would be more efficient as an enum or integer (or smaller), save ~17MB
    • Use a Rails enum if possible, but state_machine probably won't like this
  • "import" implies one-time operation- should be an independent table
    • Also used for mirroring I believe
  • Extra default: Specifying "default NULL" isn't necessary- NULL is the default when there is no default
  • Index on boolean field: pending_delete
    • Nuke column and stop usage of it (https://gitlab.com/gitlab-org/gitlab-ce/issues/37447)
  • Index on boolean field: last_repository_check_failed
Edited Nov 01, 2017 by Yorick Peterse
Assignee Loading
Time tracking Loading