Draft: JiHu guidelines for database migrations (separate schema approach)

Closed Andreas Brandl requested to merge ab/jihu-db-separate into master
6 unresolved threads

Direct reading link: https://ab-jihu-db-separate.about.gitlab-review.app/handbook/ceo/chief-of-staff-team/jihu-support/jihu-database-change-process.html

Why is this change being made?

In order to facilitate upgrades between GitLab CE/EE editions and JiHu editions, we need to find a good strategy for database schema changes. There are two different proposals:

  1. Unified schema approach (!90048 (merged)): Upstream GitLab gets all JiHu database migrations, all editions have the same schema
  2. Separate schema approach (this proposal): Keep JiHu database migrations specific to JiHu, provide adjustments for edition changes from GitLab to JiHu

This describes the "keep JiHu database migrations separate from GitLab CE/EE" approach.

Related issues:

Summary

This proposal implements a clear separation of concerns between GitLab and JiHu database migrations. In order to support edition changes from GitLab to JiHu editions, we may need to resolve conflicts in database migrations for each release of JiHu.

This reduces operational overhead on GitLab.com, too, as we would not see any JiHu-specific database objects (e.g. indexes) on GitLab.com (where they are absolutely not required).

Author Checklist

  • Provided a concise title for the MR
  • Added a description to this MR explaining the reasons for the proposed change, per say-why-not-just-what
    • Copy/paste the Slack conversation to document it for later, or upload screenshots. Verify that no confidential data is added.
  • Assign reviewers for this change to the correct DRI(s)
    • If the DRI for the page/s being updated isn’t immediately clear, then assign it to one of the people listed in the "Maintained by" section on the page being edited.
    • If your manager does not have merge rights, please ask someone to merge it AFTER it has been approved by your manager in #mr-buddies.
  • If the changes affect team members, or warrant an announcement in another way, please consider posting an update in #whats-happening-at-gitlab linking to this MR.
    • If this is a change that directly impacts the majority of global team members, it should be a candidate for #company-fyi. Please work with internal communications and check the handbook for examples.

Edited by Andreas Brandl

Activity

9 - TOC
10 {:toc .hidden-md .hidden-lg}
11
12 ## JiHu guidelines for database changes
13
14 We distinguish the following types of JiHu contributions containing database migrations (for PostgreSQL):
15
16 1. Regular contribution: **Upstream** the code change, including database migrations, into GitLab
17 1. Proprietary change: For a proprietary, JiHu-specific code change, do **not upstream** anything into GitLab
18
19 The following details guidelines and background for **proprietary contributions**. We choose to upstream neither code changes nor database migrations into GitLab.
20 Instead, we intend to keep JiHu-specific database changes separate and focus on providing database migrations for specific upgrade paths between GitLab and JiHu editions.
21
22 ### Upgrade paths
23
24 Assumption: We need to support seamless upgrades between any GitLab CE/EE release and the corresponding JiHu release. For illustration, let's look at the following artificial releases:
  • Author Developer

    For emphasis, this is an assumption I'm making as we discuss relevant upgrade paths in gitlab-jh/gitlab#213 (comment 671944573).

    From the discussion, I understand that GL -> JH is a necessity.

    I'm starting out with the assumption here so we can also understand the additional cost for supporting the opposite direction, too (if needed).

    If the opposite direction is also necessary to support, it becomes a lot more complex:

    1. Would we "undo" JH specific migrations?
    2. We would likely have to accept losing JH-specific data in this case

    This is also relevant for the alternative approach, see !90048 (comment 678509410) for an example.

    Edited by Andreas Brandl
  • My thinking is that if we're OK with losing JH-specific data, it'll be easier to move from JH -> GL, because at worst we can just leave some garbage data behind, provided that there's no conflicting interpretation of the same data.

    However, I can see that removing that data might be difficult. Consider a case where there's a new jh_phones table and a jh_phone_id column in the users table. In order to drop the jh_phones table, we have to drop the column as well.

    Not sure if it's possible to take a schema and then drop all the extra columns/tables/indices/triggers based on it?
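
    A minimal sketch of that last idea, assuming each edition's schema can be captured as a plain table-to-columns mapping. The snapshot format and the `drop_statements` helper are hypothetical, and indexes, triggers and foreign keys are ignored here:

```ruby
# Hypothetical schema snapshots: table name => list of column names.
gl_schema = {
  "users" => %w[id name]
}

jh_schema = {
  "users"     => %w[id name jh_phone_id],
  "jh_phones" => %w[id number]
}

# Returns SQL statements that remove everything present in `current`
# but absent from `target`. Extra columns are emitted before extra tables,
# so a column such as users.jh_phone_id goes away before jh_phones does.
def drop_statements(current, target)
  extra_columns = current.flat_map do |table, columns|
    # Tables missing from the target are dropped as a whole further down.
    next [] unless target.key?(table)

    (columns - target[table]).map { |column| "ALTER TABLE #{table} DROP COLUMN #{column};" }
  end

  extra_tables = (current.keys - target.keys).map { |table| "DROP TABLE #{table};" }

  extra_columns + extra_tables
end

statements = drop_statements(jh_schema, gl_schema)
puts statements
```

    For the jh_phones example this yields the column drop first and the table drop second. A real implementation would have to read the schema from the database and handle dependent objects, which this sketch does not attempt.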

  • Andreas Brandl added 1 commit

    • acee2e4a - JiHu guidelines for db migrations

    • Author Developer

      @godfat-gitlab @pbair @craig-gomes Bear with me please - captured the discussion in gitlab-jh/gitlab#161 (closed) into this writeup for comparison with !90048 (merged).

      Is something missing?

      I find it hard to check we understand all types of conflicts that may arise, but given we can always move to the approach in !90048 (merged) (much easier than the opposite direction), I'd be comfortable trying this until we see more conflicts.

    • Generally I don't think anything is missing, but I see that we should be able to add a lot of safeguards to the process to ease potential conflicts.

      I agree it's actually quite difficult to foresee what conflicts we would be facing. It depends heavily on how the features interact with the data. That said, I think the ability to have a split schema adds flexibility on top of a unified schema.
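
      One such safeguard could be an automated name-clash check run before each JH release. The sketch below assumes we can list the database object names created on each side; both inventories and the `name_clashes` helper are hypothetical:

```ruby
require "set"

# Hypothetical inventories of database object names (tables, indexes, ...).
upstream_objects = Set["users", "accounts", "index_users_on_email"]
jh_objects       = Set["jh_phones", "accounts", "index_jh_phones_on_number"]

# Returns the sorted names that both upstream GitLab and JH migrations create.
# Each of them would need an adjustment on the JH side before release.
def name_clashes(upstream, jh)
  (upstream & jh).to_a.sort
end

clashes = name_clashes(upstream_objects, jh_objects)
puts clashes.empty? ? "no name clashes" : "name clashes: #{clashes.join(', ')}"
```

      This only catches identical names. Conflicts such as a renamed or removed table would need a different check.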

  • Andreas Brandl mentioned in merge request !90048 (merged)

  • Andreas Brandl changed the description

  • 2 layout: handbook-page-toc
    3 title: JiHu guidelines for database changes
    4 ---
    5
    6 ## On this page
    7 {:.no_toc .hidden-md .hidden-lg}
    8
    9 - TOC
    10 {:toc .hidden-md .hidden-lg}
    11
    12 ## JiHu guidelines for database changes
    13
    14 We distinguish the following types of JiHu contributions containing database migrations (for PostgreSQL):
    15
    16 1. Regular contribution: **Upstream** the code change, including database migrations, into GitLab
    17 1. Proprietary change: For a proprietary, JiHu-specific code change, do **not upstream** anything into GitLab
    • I think we can consider relaxing this so that it doesn't strictly stop everything from moving upstream. I see that it might be difficult to formalize what can be upstreamed and what cannot, though. Maybe this can be decided on technical grounds case by case: if the impact/damage is very low and it's difficult to split the changes, then we can consider upstreaming.

  • 17 1. Proprietary change: For a proprietary, JiHu-specific code change, do **not upstream** anything into GitLab
    18
    19 The following details guidelines and background for **proprietary contributions**. We choose to upstream neither code changes nor database migrations into GitLab.
    20 Instead, we intend to keep JiHu-specific database changes separate and focus on providing database migrations for specific upgrade paths between GitLab and JiHu editions.
    21
    22 ### Upgrade paths
    23
    24 Assumption: We need to support seamless upgrades between any GitLab CE/EE release and the corresponding JiHu release. For illustration, let's look at the following artificial releases:
    25
    26 | GitLab CE/EE | JiHu |
    27 |---|---|
    28 | GL 14.0 | JH 1.0 |
    29 | GL 14.1 | JH 1.1 |
    30 | GL 14.2 | JH 1.2 |
    31 | GL 14.3 | JH 1.3 |
    32 | GL 15.0 | JH 2.0 |
    • I see this is artificial, but I believe JiHu is releasing together with GitLab, so the version numbers will be the same. This has both practical and business value: it will be pretty clear what is included in which version and edition.

      Is this fix included in GitLab? If I know that, I'll also know whether it's included in the corresponding JH edition without digging too much into it. The same applies in the other direction, as long as the change is not JH-specific.

  • 74 In order to perform the edition change to JH 1.3, we need to reconcile conflicting schema changes: For JH 1.3, we want to add a `phone` column to the `users` table (which does not exist anymore).
    75
    76 In order to resolve this, we implement the methodology below to reconcile the database schema to what is expected in the corresponding JH version.
    77
    78 ### Reconciling database schema for edition changes
    79
  • 79
    80 For an edition change to JH, we need to execute pending JH-specific database migrations. This always goes back to the beginning of the JH timeline, since no
    81 GL CE/EE installation will know of any JH specific migration until we perform an edition change. In our example, this is `03.rb` and `06.rb`.
    82
    83 `03.rb` had been introduced in `JH 1.2` and now needs to be adapted to support the edition change from `GL CE/EE 14.3` to `JH 1.3`. In this particular case,
    84 we would change the existing JH migration `03.rb` to "Add `phone` to `accounts`".
    85
    86 For every release of JH, we are going to review the necessary adjustments needed to perform an edition change based on this release. Those adjustments are going to be applied to
    87 already existing JH-specific migrations as necessary and before the release of JH. Many types of conflicts can be detected through automation.
    88
    89 Another source of conflicts is name clashes, e.g. when GitLab adds a table, column or index (or other database object) with the same name as a JH migration. In that case, the adjustments necessary
    90 on the JH side may also spill into code and models (for example when having to adjust column names).
    91
    92 ### Reducing conflict potential
    93
    94 Adjustments to existing JH migrations to support edition changes are only needed for migrations conflicting with upstream changes. We can use strategies to avoid those conflicts in the first place to
    • I think we can also mention that if we have to introduce an incompatibility (in the classic example, we know that renaming users will cause an issue in JH), we'll notify JH and help them figure out a solution.

    • Contributor
      Resolved by Mek Stittri

      @abrandl @godfat-gitlab are you comfortable with going with the separate schema approach for the foreseeable future? Reasons:

      • Ensure reduced complexity, especially in this critical time with increased focus on reliability and supporting sharing work
      • Mitigations
        • If we are to make JH compatible with EE, we just need to drop columns/tables that are not being used by EE code
        • A split schema can be unified easily later

      I want to summarize the bottom-line communication and trade-offs. If yes, I will work to make sure our stakeholders are aware and we can move forward. Thanks!

      cc @jeromezng @craig-gomes

  • Author Developer

    Closing as per the decision made (!90336 (comment 683663451), !90048 (comment 682481792)).
