Skip to content
Snippets Groups Projects

WIP: Migrate Namespace#root_id column

Closed Mayra Cabrera requested to merge 62214-namespace-database-migration into master
5 unresolved threads

What does this MR do?

Prework of https://gitlab.com/gitlab-org/gitlab-ce/issues/62214

Does this MR meet the acceptance criteria?

Conformity

Performance and testing

Backend to do

  • 1. Add a new column on Namespaces: root_id
    • This new column should be NOT NULL, but we can't enforce that limit until we have migrated the data.
    • This new column will store the root namespace equal to the namespace ID, so the child namespaces will use the ID of the root namespace.
  • 2. Create a background migration to fill in Namespaces root_id
    • This migration would iterate over root namespaces only (parent_id = NULL), then run the usual CTE code to find all child nodes, then update all those.
  • 3. Include logic to fill root_id when a namespace is created/updated

Next iteration

  • Once the background migration has been completed, add another migration to include an index on root_id and to make it not null
Edited by Mayra Cabrera

Merge request reports

Loading
Loading

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
9 DELAY_INTERVAL = 10.minutes.to_i
10
11 disable_ddl_transaction!
12
13 class Namespace < ActiveRecord::Base
14 self.table_name = 'namespaces'
15
16 include EachBatch
17 end
18
19 def up
20 say 'Scheduling `PopulateNamespaceRootIdColumn` jobs'
21
22 # We currently have ~4_600_000 namespace records on GitLab.com
23 # This means it'll schedule ~460 jobs (10k each) within a 10 minutes gap.
24 # so this should take ~153 hours to complete (assuming 30k namespaces per hour)
  • Mayra Cabrera changed the description

    changed the description

  • Mayra Cabrera marked the checklist item 1. Add a new column on Namespaces: root_id as completed

    marked the checklist item 1. Add a new column on Namespaces: root_id as completed

  • @nolith could you please take a look at the backend to do of this MR to see if it makes sense?

  • 2 Warnings
    :warning: CHANGELOG missing.

    You can create one with:

    bin/changelog -m 29772 "Migrate Namespace#root_id column"

    If your merge request doesn’t warrant a CHANGELOG entry,
    consider adding any of the ~backstage, ci-build, ~Documentation, meta, QA, test labels.
    See the documentation.

    :warning: This merge request is missing the database label.
    1 Message
    :book: This merge request adds or changes files that require a review from the Database team.

    Database Review

    The following files require a review from the Database team:

    • db/migrate/20190617154954_add_root_id_to_namespace.rb
    • db/migrate/20190617181054_schedule_populate_namespace_root_id.rb
    • db/schema.rb
    • lib/gitlab/background_migration/populate_namespace_root_id_column.rb

    To make sure these changes are reviewed, take the following steps:

    1. Edit your merge request, and add gl-database to the list of Group approvers.
    2. Mention @gl-database in a separate comment, and explain what needs to be reviewed by the team. Please don't mention the team until your changes are ready for review.

    Reviewer roulette

    Changes that require review have been detected! A merge request is normally reviewed by both a reviewer and a maintainer in its primary category (e.g. frontend or backend), and by a maintainer in all other categories.

    To spread load more evenly across eligible reviewers, Danger has randomly picked a candidate for each review slot. Feel free to override this selection if you think someone else would be better-suited, or the chosen person is unavailable.

    Once you've decided who will review this merge request, mention them as you normally would! Danger does not (yet?) automatically notify them for you.

    Category Reviewer Maintainer
    database Toon Claes (@toon) Andreas Brandl (@abrandl)
    backend Reuben Pereira (@rpereira2) Jen-Shin Lin (@godfat)

    Generated by :no_entry_sign: Danger

    Edited by 🤖 GitLab Bot 🤖
  • Mayra Cabrera added 1 commit

    added 1 commit

    • 2c36802e - Migrate root_id column on Namespaces table

    Compare with previous version

  • Mayra Cabrera added 272 commits

    added 272 commits

    Compare with previous version

  • Mayra Cabrera added 1 commit

    added 1 commit

    • a338b50f - Migrate root_id column on Namespaces table

    Compare with previous version

  • 46 46 ldap_group_links: %w[group_id],
    47 47 members: %w[source_id created_by_id],
    48 48 merge_requests: %w[last_edited_by_id state_id],
    49 namespaces: %w[owner_id parent_id],
    49 namespaces: %w[owner_id parent_id root_id],
  • 1 # frozen_string_literal: true
    2
    3 class AddRootIdToNamespace < ActiveRecord::Migration[5.1]
    4 include Gitlab::Database::MigrationHelpers
    5
    6 DOWNTIME = false
    7
    8 def change
    9 add_column :namespaces, :root_id, :integer
  • 47 namespaces.each do |child_namespace_id, root_namespace_id|
    48 statement = "WHEN #{child_namespace_id} THEN #{root_namespace_id}"
    49 namespaces_information << statement
    50 end
    51 end
    52 end
    53
    54 def update_sql_query(case_statements:, namespace_ids:)
    55 <<~SQL
    56 UPDATE namespaces
    57 SET root_id = CASE id
    58 #{case_statements.join("\n")}
    59 END
    60 WHERE id IN (#{namespace_ids.join(",")})
    61 SQL
    62 end
    • This would basically generate something like

      UPDATE namespaces
        SET root_id = CASE id
          WHEN 1 THEN 1
          WHEN 2 THEN 1
          WHEN 5 THEN 1
          WHEN 7 THEN 1
          WHEN 9 THEN 1
          WHEN 11 THEN 1
          WHEN 13 THEN 1
          WHEN 15 THEN 1
          WHEN 17 THEN 1
          WHEN 19 THEN 1
          WHEN 21 THEN 1
          WHEN 3 THEN 3
          WHEN 4 THEN 3
          WHEN 6 THEN 3
          WHEN 8 THEN 3
          WHEN 10 THEN 3
          WHEN 12 THEN 3
          WHEN 14 THEN 3
          WHEN 16 THEN 3
          WHEN 18 THEN 3
          WHEN 20 THEN 3
          WHEN 22 THEN 3
        END
      WHERE id IN (1,2,5,7,9,11,13,15,17,19,21,3,4,6,8,10,12,14,16,18,20,22)

      But with 10k statements. Unsure if this is the best update strategy :thinking:

      Edited by Mayra Cabrera
    • Please register or sign in to reply
  • 11 root_namespaces = root_namespaces_between(from_id: from_id, to_id: to_id)
    12 return if root_namespaces.empty?
    13
    14 namespaces_information = associate_children_with_root_namespaces(root_namespaces)
    15 sql_query = build_update_namespaces_sql(namespaces_information)
    16
    17 execute(sql_query)
    18 end
    19
    20 private
    21
    22 def root_namespaces_between(from_id:, to_id:)
    23 Namespace
    24 .where('parent_id IS NULL')
    25 .where(id: from_id..to_id)
    26 end
    • From Chatops:

      > SELECT namespaces.* FROM namespaces WHERE (namespaces.parent_id IS NOT NULL) AND (namespaces.id BETWEEN 1 AND 10000)
      
      Index Scan using index_namespaces_on_parent_id_and_id on namespaces  (cost=0.43..4758.31 rows=447 width=303) (actual time=26.683..26.683 rows=0 loops=1)
        Index Cond: ((parent_id IS NOT NULL) AND (id >= 1) AND (id <= 10000))
        Buffers: shared hit=792 read=15
        I/O Timings: read=3.449
      Planning time: 3.271 ms
      Execution time: 26.726 ms
    • Please register or sign in to reply
  • assigned to @nolith

  • closed

  • Please register or sign in to reply
    Loading