Save repository_type to LfsObjectsProject
What does this MR do?
Overview
This MR adds a new property repository_type to the table lfs_objects_projects to record the repository type (one of 'project', 'wiki' or 'design') that the corresponding blob of the LFS object lives in.
As we are enabling the ability for design and wiki repositories to store files in LFS (https://gitlab.com/gitlab-org/gitlab-ee/issues/9490 and https://gitlab.com/gitlab-org/gitlab-ce/issues/43721), recording the repository_type will make it easier to prune unreferenced LFS objects in the future.
Note that this property is currently allows null values. This discussion thread decided to introduce this new column, ensure that all LFS files for the new 'design' repository are correctly saving with the right repository_type, and allow there to be null values (past and future), knowing that these records will all be for the regular 'project' repository. This technical debt will be addressed starting in https://gitlab.com/gitlab-org/gitlab-ce/issues/43721 where all parts of the app that create LfsObjects will be consolidated to save a repository_type value. There will also need to be an additional data migration and not null constraint added to the column after that piece of work.
More detail
https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/13389 adds the ability for images in a project's design repository (project.design_repository) to be stored in LFS.
This discussion thread raised the issue of how in the future we want to prune unreferenced LFS objects, and as LFS pointer files can now be in multiple repositories, to do so efficiently we should record which repository the blob lives in.
As an LFS object's oid is determined by the data in the blob file, it's
possible now for a project to have the same LfsObject for blobs in three
separate repositories ('project', 'wiki' or 'design'). This means up to three LfsObjectsProject records with the same project and lfs_object but different repository_types. As this model is the many-to-many join table the has_many relationships between lfs_objects and projects now need a distinct scope to limit the records returned to
non-duplicates.
Lfs::FileTransformer can now create LfsObjectsProject with
repository_type saved. This transformer is used by DesignManagement::SaveDesignsService, so LFS for blobs in the design repo will always have a repository_type saved.
Migration output
== 20190602014139 AddRepositoryTypeToLfsObjectsProject: migrating =============
-- add_column(:lfs_objects_projects, :repository_type, :integer, {:limit=>1, :null=>true})
-> 0.0026s
== 20190602014139 AddRepositoryTypeToLfsObjectsProject: migrated (0.0027s) ====
EXPLAIN output
See this Slack thread where I've asked for input into the performance of distinct in these queries.
These two EXPLAIN queries compare master branch with 9490-record-repository_type-on-lfs_objects_projects branch.
EXPLAIN for project.lfs_objects query in master:
=> EXPLAIN for: SELECT "lfs_objects".* FROM "lfs_objects" INNER JOIN "lfs_objects_projects" ON "lfs_objects"."id" = "lfs_objects_projects"."lfs_object_id" WHERE "lfs_objects_projects"."project_id" = $1 [["project_id", 33]]
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.30..16.52 rows=7 width=96)
-> Index Scan using index_lfs_objects_projects_on_project_id on lfs_objects_projects (cost=0.15..7.28 rows=7 width=4)
Index Cond: (project_id = 33)
-> Index Scan using lfs_objects_pkey on lfs_objects (cost=0.15..1.31 rows=1 width=96)
Index Cond: (id = lfs_objects_projects.lfs_object_id)
(5 rows)
EXPLAIN for project.lfs_objects query in this MR source branch 9490-record-repository_type-on-lfs_objects_projects:
=> EXPLAIN for: SELECT DISTINCT "lfs_objects".* FROM "lfs_objects" INNER JOIN "lfs_objects_projects" ON "lfs_objects"."id" = "lfs_objects_projects"."lfs_object_id" WHERE "lfs_objects_projects"."project_id" = $1 [["project_id", 28]]
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=2.24..2.28 rows=2 width=158)
-> Sort (cost=2.24..2.24 rows=2 width=158)
Sort Key: lfs_objects.id, lfs_objects.oid, lfs_objects.size, lfs_objects.created_at, lfs_objects.updated_at, lfs_objects.file, lfs_objects.file_store
-> Hash Join (cost=1.11..2.23 rows=2 width=158)
Hash Cond: (lfs_objects.id = lfs_objects_projects.lfs_object_id)
-> Seq Scan on lfs_objects (cost=0.00..1.07 rows=7 width=158)
-> Hash (cost=1.09..1.09 rows=2 width=4)
-> Seq Scan on lfs_objects_projects (cost=0.00..1.09 rows=2 width=4)
Filter: (project_id = 28)
(9 rows)
Database checklist
-
Conforms to the database guides
When adding migrations:
-
Updated db/schema.rb -
Added a downmethod so the migration can be reverted(uses a change method that is rollback-able) -
Added the output of the migration(s) to the MR body - [-] Added tests for the migration in
spec/migrationsif necessary (e.g. when migrating data)
When adding or modifying queries to improve performance:
- [-] Included data that shows the performance improvement, preferably in the form of a benchmark
- [-] Included the output of
EXPLAIN (ANALYZE, BUFFERS)of the relevant queries
When adding foreign keys to existing tables:
- [-] Included a migration to remove orphaned rows in the source table before adding the foreign key
- [-] Removed any instances of
dependent: ...that may no longer be necessary
When adding tables:
- [-] Ordered columns based on the Ordering Table Columns guidelines
- [-] Added foreign keys to any columns pointing to data in other tables
- [-] Added indexes for fields that are used in statements such as WHERE, ORDER BY, GROUP BY, and JOINs
When removing columns, tables, indexes or other structures:
- [-] Removed these in a post-deployment migration
- [-] Made sure the application no longer uses (or ignores) these structures
General checklist
- [-] Changelog entry added, if necessary
- [-] Documentation created/updated
-
Tests added for this feature/bug -
Conforms to the code review guidelines -
Conforms to the merge request performance guidelines -
Conforms to the style guides