Closed
Opened Oct 11, 2018 by Andrew Newdigate (@andrewn, Developer)

Store merge request diffs in object storage as an alternative to PG

Currently merge_request_diff_commits and merge_request_diff_files are two of the largest tables on GitLab.com, weighing in at 288 GB and 740 GB respectively.

This leads to numerous problems; see:

  • gitlab-com/gl-infra/infrastructure#4939 (closed)
  • gitlab-com/gl-infra/infrastructure#4853 (closed)
  • gitlab-com/gl-infra/infrastructure#4916
  • gitlab-com/gl-infra/infrastructure#4917
  • gitlab-com/gl-infra/infrastructure#4920

In a call between @Finotto, @smcgivern, and myself, we discussed some of the existing proposals, such as https://gitlab.com/gitlab-org/gitlab-ce/issues/37632. While these are good solutions, some of them are difficult to migrate towards, particularly on GitLab.com.

During the call, we discussed an alternative proposal: keep the existing structure for now (i.e. full diffs, not deduplicated through blobs), but conditionally migrate the diffs to object storage, possibly using a git-lfs-like scheme in which the diff in the table is replaced with a pointer to an object storage location.
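
To make the pointer idea concrete, here is a minimal Python sketch of a row that holds either the inline diff or a pointer to object storage. Everything in it is an assumption for illustration only: `DiffFileRow`, `InMemoryObjectStore`, the `external_key` field, and the key layout are hypothetical and do not reflect the real schema or any GitLab API.

```python
from dataclasses import dataclass
from typing import Dict, Optional


class InMemoryObjectStore:
    """Hypothetical stand-in for an object storage bucket."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


@dataclass
class DiffFileRow:
    """Hypothetical shape of one diff row; not the real table schema."""

    merge_request_diff_id: int
    relative_order: int
    diff: Optional[bytes]                # inline diff, or None once externalized
    external_key: Optional[str] = None   # pointer into object storage


def externalize(row: DiffFileRow, store: InMemoryObjectStore) -> None:
    """Move the inline diff to object storage and leave a pointer behind."""
    if row.diff is None:
        return  # already externalized, nothing to do
    # Assumed key layout, chosen only for this example.
    key = f"merge_request_diffs/{row.merge_request_diff_id}/{row.relative_order}.diff"
    store.put(key, row.diff)   # upload first, so the data is never lost
    row.external_key = key
    row.diff = None


def read_diff(row: DiffFileRow, store: InMemoryObjectStore) -> bytes:
    """Return the diff regardless of where it currently lives."""
    if row.diff is not None:
        return row.diff
    if row.external_key is None:
        raise ValueError("row has neither an inline diff nor a pointer")
    return store.get(row.external_key)
```

Readers go through `read_diff` in both cases, so callers would not need to know whether a given diff has been moved yet.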

This approach would allow older diffs to be moved progressively to object storage, while newer merge requests continue to be stored in the database for performance reasons.

It would also make migrating GitLab.com's data much easier, possibly via a long-running background migration.
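
As a rough illustration of the progressive move, the Python sketch below walks rows older than a cutoff in small batches and swaps each inline diff for a pointer. The names (`StoredDiff`, `migrate_old_diffs`, the `diffs/<id>` key format) and the use of a plain dict as the bucket are assumptions made for this example; a real background migration would run against the database and an actual object store.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterator, List, Optional


@dataclass
class StoredDiff:
    """Hypothetical row: the diff is inline until it has been migrated."""

    id: int
    created_at: datetime
    diff: Optional[bytes]
    external_key: Optional[str] = None


def batches(rows: List[StoredDiff], size: int) -> Iterator[List[StoredDiff]]:
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def migrate_old_diffs(
    rows: List[StoredDiff],
    bucket: Dict[str, bytes],
    cutoff: datetime,
    batch_size: int = 2,
) -> int:
    """Move diffs created before `cutoff` into `bucket`; return how many moved."""
    candidates = [
        r for r in rows if r.diff is not None and r.created_at < cutoff
    ]
    moved = 0
    for batch in batches(candidates, batch_size):
        for row in batch:
            key = f"diffs/{row.id}"   # assumed key format
            bucket[key] = row.diff    # upload before clearing the column
            row.external_key = key
            row.diff = None
            moved += 1
        # A real job would commit here and pause between batches
        # to limit load on the database.
    return moved
```

Because rows that are already externalized are skipped, the job can be re-run or resumed safely, which is the property a long-running migration would need.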

This approach does not preclude future enhancements (such as deduplication), but it is a smaller first step that would relieve some of the pain these tables are currently causing.

cc @DouweM @nick.thomas @abrandl @oswaldo

Milestone: 11.8 (Past due)
Reference: gitlab-org/gitlab-foss#52568