Redesign LFS table structures
For LFS we currently have two tables:
lfs_objectslfs_objects_projects
These tables have the following structures:
root@gitlab (localhost)=# \d lfs_objects_projects
Table "public.lfs_objects_projects"
Column | Type | Modifiers
---------------+-----------------------------+-------------------------------------------------------------------
id | integer | not null default nextval('lfs_objects_projects_id_seq'::regclass)
lfs_object_id | integer | not null
project_id | integer | not null
created_at | timestamp without time zone |
updated_at | timestamp without time zone |
Indexes:
"lfs_objects_projects_pkey" PRIMARY KEY, btree (id)
"index_lfs_objects_projects_on_project_id" btree (project_id)
Foreign-key constraints:
"fk_rails_2eb33f7a78" FOREIGN KEY (project_id) REFERENCES projects(id) ON DELETE CASCADE
root@gitlab (localhost)=# \d lfs_objects
Table "public.lfs_objects"
Column | Type | Modifiers
------------+-----------------------------+----------------------------------------------------------
id | integer | not null default nextval('lfs_objects_id_seq'::regclass)
oid | character varying(255) | not null
size | bigint | not null
created_at | timestamp without time zone |
updated_at | timestamp without time zone |
file | character varying(255) |
Indexes:
"lfs_objects_pkey" PRIMARY KEY, btree (id)
"index_lfs_objects_on_oid" UNIQUE, btree (oid)
What's problematic here is that lfs_objects has no knowledge of the projects that the data may belong to. That is, these rows don't refer to a project, instead projects refer to them. This prevents us from setting up foreign keys that remove LFS objects (at least the database rows) whenever a project is removed. Further, in this setup it's possible for a LFS object to be associated with multiple projects, which seems a bit silly.
Instead we should use a single table called lfs_objects with the following structure:
- id
- project_id
- oid
- size
- file
- created_at
- updated_at
Here project_id should have a foreign key referring to projects.id with a cascading delete. This allows us to easily remove the data, and also makes it easier to query the data since we now only have 1 table (removing the need for any joins).
Index wise the pair (project_id, oid) should have a UNIQUE index. The created_at and updated_at columns should be timestamp with time zone, and oid should be of type bytea.