
Investigate: Rails 6.0 and Rails 6.1 native database sharding

Investigate how GitLab works with Rails 6.0 and Rails 6.1 native database sharding.

See https://guides.rubyonrails.org/active_record_multiple_databases.html

/cc @fzimmer

Goals

  1. Add multiple databases
  2. Create a new shard with a top-level namespace
  3. Show a group from the new shard
  4. Create a new sub-group in the new shard
  5. Show a group's members
  6. Create a new project in the new shard
  7. Run a pipeline
  8. Investigate dynamically adding new databases
  9. Report back

Findings

Summary: Rails 6.1 is viable for the horizontal sharding POC.

  1. I was able to run migrations in a shard, show the group page for a group, create a sub-group and show the group page for a sub-group.

  2. It is possible to create multiple DBs with a static configuration.

  3. There are some hints in https://github.com/rails/rails/pull/36560 that may help with generating config/database.yml dynamically.

  4. Once multiple DBs are set up, we can run migrations for each DB (see the tasks listed by bin/rails -T). rails db:migrate runs migrations for all databases, which is nice.

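  5. The main mechanism is declaring connects_to(shards: ...) on the sharded models (via NamespaceShard in the POC). You then wrap a query in connected_to(role: :reading, shard: :shard_one) { query }. Note that the role is required. Inside connected_to, non-sharded models are not affected. For example, Namespace.count below returns the shard_one count (10), while User.count still returns the main database count (98):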

    gitlabhq_development=# select count(1) from users;
     count 
    -------
        98
    gitlabhq_development_shard_one=# select count(1) from users;
     count 
    -------
         1
    gitlabhq_development=# select count(1) from namespaces;
     count 
    -------
       107
    gitlabhq_development_shard_one=# select count(1) from namespaces;
     count 
    -------
        10
    
    [2] pry(main)> NamespaceShard.connected_to(role: :reading, shard: :shard_one) { puts Namespace.count; puts User.count }
       (10.4ms)  SELECT COUNT(*) FROM "namespaces" /*application:console,line:(pry):2:in `block in <main>'*/
    10
       (2.1ms)  SELECT COUNT(*) FROM "users" /*application:console,line:(pry):2:in `block in <main>'*/
    98

  6. More research is needed on how to set up connects_to(shards: ...) with dynamic shards.

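  7. Rails associations are not shard-aware. Here the routes association gets confused. When the route is preloaded with with_route, it is loaded together with the group and is correct. When it is loaded lazily (by full_path, after sharded_find has returned), the query appears to run against the default database and returns the route of a different namespace that has the same ID.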

    # Good
    [8] pry(main)> Group.with_route.sharded_find 2
      Group Load (2.5ms)  SELECT "namespaces".* FROM "namespaces" WHERE "namespaces"."type" = 'Group' AND "namespaces"."id" = 2 LIMIT 1 /*application:console,line:/app/models/namespace.rb:138:in `block in get'*/
      Route Load (0.6ms)  SELECT "routes".* FROM "routes" WHERE "routes"."source_type" = 'Namespace' AND "routes"."source_id" = 2 /*application:console,line:/app/models/namespace.rb:138:in `block in get'*/
    => #<Group id:2 @lost-and-found>
    
    # Bad: the only mitigation is to ensure IDs are unique across shards
    [9] pry(main)> Group.sharded_find 2
      Group Load (0.6ms)  SELECT "namespaces".* FROM "namespaces" WHERE "namespaces"."type" = 'Group' AND "namespaces"."id" = 2 LIMIT 1 /*application:console,line:/app/models/namespace.rb:138:in `block in get'*/
      Route Load (1.0ms)  SELECT "routes".* FROM "routes" WHERE "routes"."source_id" = 2 AND "routes"."source_type" = 'Namespace' LIMIT 1 /*application:console,line:/app/models/concerns/routable.rb:103:in `full_path'*/
    => #<Group id:2 @bethany_mccullough>
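
  8. The same problem occurs with the parent association: the parent is loaded from the default database, so namespace 2 resolves to the wrong group.

    Group.new(parent_id: 2).parent.path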
      Namespace Load (0.7ms)  SELECT "namespaces".* FROM "namespaces" WHERE "namespaces"."id" = 2 LIMIT 1 /*application:console,line:(pry):12:in `<main>'*/
    => "bethany_mccullough"

  9. The Rails logs do not show which shard a query is executed on. A potential observability improvement is to annotate queries with ActiveRecord::Base.current_shard.

  10. while_preventing_writes (https://github.com/rails/rails/blob/8b83793549fed994bf0231aff444aa74648b3c35/activerecord/lib/active_record/connection_handling.rb#L219) and the prevent_writes: option of connected_to also look useful.

  11. See !60088 (closed) for more detailed notes, and POC code

  12. I was able to dynamically construct a shard-aware model using ActiveRecord::Base.configurations.configs_for(env_name: Rails.env). This seems to work with two DBs (three-tier database.yml), one 'primary' DB (three-tier database.yml), and one DB (two-tier database.yml).

  13. Because the shards are built from the configuration, if a record was created in a shard and that shard is later withdrawn from the configuration, we will see errors like "No connection pool for 'NamespaceShard' found for the 'shard_one' shard." This is much like Gitaly shards.
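
Findings 5, 6 and 12 describe declaring shards with connects_to and deriving them from ActiveRecord::Base.configurations. Below is a minimal sketch, not the POC code, of the pure-Ruby part: turning a list of database config names into the hash that Rails 6.1's connects_to(shards: ...) takes. It assumes that the config named "primary" backs the :default shard and that each shard uses the same database for both the writing and reading roles. The helper name shards_from_db_names is hypothetical. The Rails wiring needs a running app, so it appears only in comments and is untested.

```ruby
# Hypothetical helper: builds the `shards:` hash for Rails 6.1's `connects_to`
# from database config names (for example the `name` of each entry returned by
# ActiveRecord::Base.configurations.configs_for(env_name: Rails.env)).
#
# Assumptions (not taken from the POC):
#   * the config named "primary" backs the :default shard;
#   * every shard uses one database for both the :writing and :reading roles.
def shards_from_db_names(db_names)
  db_names.each_with_object({}) do |name, shards|
    db = name.to_sym
    shard = db == :primary ? :default : db
    shards[shard] = { writing: db, reading: db }
  end
end

# Hypothetical Rails wiring (left as comments because it needs a Rails 6.1 app):
#
#   class NamespaceShard < ApplicationRecord
#     self.abstract_class = true
#
#     db_names = ActiveRecord::Base.configurations
#                  .configs_for(env_name: Rails.env)
#                  .map(&:name)
#     connects_to shards: shards_from_db_names(db_names)
#   end
#
#   # The role must be passed together with the shard:
#   NamespaceShard.connected_to(role: :reading, shard: :shard_one) { Namespace.count }
```

With a single two-tier config, the helper yields only the :default shard, so the same class definition would work whether or not extra shards are configured.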
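
Findings 7 and 8 note that the only mitigation found is to keep IDs unique across shards. One common way to do that, offered here as an assumption rather than something the POC implemented, is to interleave the ID sequences. With N shards, shard k (0-based) issues k + 1, k + 1 + N, k + 1 + 2N, and so on. In PostgreSQL this would correspond to a per-shard sequence with START WITH k + 1 and INCREMENT BY N. The pure-Ruby sketch below models that allocation. The class name InterleavedIdAllocator is hypothetical.

```ruby
# Hypothetical model of interleaved ID sequences: with `shard_count` shards,
# shard `shard_index` (0-based) starts at shard_index + 1 and adds
# shard_count each time, so no two shards ever produce the same ID.
# The PostgreSQL equivalent would be a per-shard sequence created with
# START WITH shard_index + 1 and INCREMENT BY shard_count.
class InterleavedIdAllocator
  def initialize(shard_index:, shard_count:)
    raise ArgumentError, "shard_count must be positive" unless shard_count.positive?
    unless (0...shard_count).cover?(shard_index)
      raise ArgumentError, "shard_index must be in 0...#{shard_count}"
    end

    @shard_count = shard_count
    @next_id = shard_index + 1
  end

  # Returns the next ID for this shard and advances by the stride.
  def next_id
    id = @next_id
    @next_id += @shard_count
    id
  end

  # Recovers which shard issued an ID, so that a lookup by ID
  # (such as Group.sharded_find 2) could be routed to the right shard.
  def self.shard_for(id, shard_count:)
    (id - 1) % shard_count
  end
end
```

A design caveat: the stride is fixed when the sequences are created, so it would need to be over-provisioned if more shards might be added later.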
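
Finding 9 suggests annotating queries with the current shard. Here is a minimal sketch, assuming the shard name is appended as a SQL comment in the same /*key:value*/ style as the application/line annotations in the logs above. The helper annotate_with_shard is hypothetical. The commented-out wiring subscribes to ActiveSupport::Notifications' sql.active_record event and reads NamespaceShard.current_shard. It only changes what is logged, not the SQL sent to the database, and it has not been tried against the POC.

```ruby
# Hypothetical helper: appends the shard name to a SQL string as a comment,
# in the same /*key:value*/ style as the existing log annotations, so a log
# line shows which shard the query ran on.
def annotate_with_shard(sql, shard)
  # Drop any "*/" in the shard name so it cannot close the comment early.
  safe_shard = shard.to_s.gsub("*/", "")
  "#{sql} /*shard:#{safe_shard}*/"
end

# Hypothetical wiring in a Rails 6.1 initializer (left as comments because it
# needs a running app). NamespaceShard is the POC's shard-aware class, and
# current_shard returns :default outside a connected_to block.
#
#   ActiveSupport::Notifications.subscribe("sql.active_record") do |_name, _start, _finish, _id, payload|
#     Rails.logger.debug(annotate_with_shard(payload[:sql], NamespaceShard.current_shard))
#   end
```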
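
Finding 13 notes that a record living in a withdrawn shard surfaces as a low-level "No connection pool ... found" error. One hypothetical mitigation is a guard that checks the shard name against the configured shards before switching, and fails with a clearer error. The sketch assumes the configured shard names are available, for example as the keys of the hash passed to connects_to. ShardNotConfiguredError and with_configured_shard are hypothetical names, and the real connected_to call is shown only as a comment.

```ruby
# Hypothetical error raised when a record points at a shard that is no
# longer present in the database configuration.
class ShardNotConfiguredError < StandardError; end

# Hypothetical guard: yields the shard (as a symbol) only if it is one of
# `configured_shards`, for example the keys of the hash given to
# `connects_to shards:`. Otherwise it raises a descriptive error instead of
# the lower-level "No connection pool ... found for the ... shard" error.
def with_configured_shard(shard, configured_shards)
  unless configured_shards.include?(shard.to_sym)
    raise ShardNotConfiguredError,
          "shard #{shard.inspect} is not configured (configured: #{configured_shards.join(', ')})"
  end

  # In a Rails app the block would wrap the real switch, e.g.
  #   NamespaceShard.connected_to(role: :reading, shard: shard.to_sym) { ... }
  yield shard.to_sym
end
```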

Due date

Estimate 2021-04-30, confidence: 50%

Edited by Thong Kuah