Redesign LFS table structures

For LFS we currently have two tables:

  • lfs_objects
  • lfs_objects_projects

These tables have the following structures:

root@gitlab (localhost)=# \d lfs_objects_projects
                                       Table "public.lfs_objects_projects"
    Column     |            Type             |                             Modifiers                             
---------------+-----------------------------+-------------------------------------------------------------------
 id            | integer                     | not null default nextval('lfs_objects_projects_id_seq'::regclass)
 lfs_object_id | integer                     | not null
 project_id    | integer                     | not null
 created_at    | timestamp without time zone | 
 updated_at    | timestamp without time zone | 
Indexes:
    "lfs_objects_projects_pkey" PRIMARY KEY, btree (id)
    "index_lfs_objects_projects_on_project_id" btree (project_id)
Foreign-key constraints:
    "fk_rails_2eb33f7a78" FOREIGN KEY (project_id) REFERENCES projects(id) ON DELETE CASCADE

root@gitlab (localhost)=# \d lfs_objects
                                     Table "public.lfs_objects"
   Column   |            Type             |                        Modifiers                         
------------+-----------------------------+----------------------------------------------------------
 id         | integer                     | not null default nextval('lfs_objects_id_seq'::regclass)
 oid        | character varying(255)      | not null
 size       | bigint                      | not null
 created_at | timestamp without time zone | 
 updated_at | timestamp without time zone | 
 file       | character varying(255)      | 
Indexes:
    "lfs_objects_pkey" PRIMARY KEY, btree (id)
    "index_lfs_objects_on_oid" UNIQUE, btree (oid)

What's problematic here is that lfs_objects has no knowledge of the projects that the data may belong to. That is, these rows don't refer to a project, instead projects refer to them. This prevents us from setting up foreign keys that remove LFS objects (at least the database rows) whenever a project is removed. Further, in this setup it's possible for a LFS object to be associated with multiple projects, which seems a bit silly.

Instead we should use a single table called lfs_objects with the following structure:

  • id
  • project_id
  • oid
  • size
  • file
  • created_at
  • updated_at

Here project_id should have a foreign key referring to projects.id with a cascading delete. This allows us to easily remove the data, and also makes it easier to query the data since we now only have 1 table (removing the need for any joins).

Index wise the pair (project_id, oid) should have a UNIQUE index. The created_at and updated_at columns should be timestamp with time zone, and oid should be of type bytea.

Assignee Loading
Time tracking Loading