Reduce object allocations for large merge request

Tan Le requested to merge reduce-commit-collection-object-allocations into master

🧩 What does this MR do?

Reduce object allocations when processing commits from Gitaly. The investigation of #281574 found that the enrich! method allocates a large number of objects when processing a large MR with 13,796 commits.

I have attached the full profiling of this MR on staging: profiling-large-commit-gitaly.txt

Implementation

A simplified example illustrates the change.

Suppose we want to convert the following data structure:

raw_commits = [
    [0] OpenStruct {
          :id => 1,
        :name => "tom-1"
    },
    [1] OpenStruct {
          :id => 2,
        :name => "tom-2"
    },
    [2] OpenStruct {
          :id => 3,
        :name => "tom-3"
    },
    [3] OpenStruct {
          :id => 4,
        :name => "tom-4"
    },
    [4] OpenStruct {
          :id => 5,
        :name => "tom-5"
    }
]

to

enriched_commits = {
     1 => "tom-1",
     2 => "tom-2",
     3 => "tom-3",
     4 => "tom-4",
     5 => "tom-5"
}

The conversion can be written in two ways:

# Existing approach
enriched_commits = Hash[raw_commits.map { |o| [o.id, o.name] }].compact

# New approach
enriched_commits = raw_commits.each_with_object({}) { |o, result| result[o.id] = o.name }.compact
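As a self-contained sketch of the two approaches above, the following builds the five-element sample with OpenStruct and checks that both produce the same hash. The variable names `existing` and `updated` are introduced here for illustration only and are not part of the MR.

```ruby
require "ostruct"

# Sample input mirroring the five-element structure shown above.
raw_commits = (1..5).map { |n| OpenStruct.new(id: n, name: "tom-#{n}") }

# Existing approach: map builds one temporary [id, name] array per element,
# then Hash[] turns the resulting array of pairs into a hash.
existing = Hash[raw_commits.map { |o| [o.id, o.name] }].compact

# New approach: write each entry straight into a single accumulator hash,
# so no per-element pair array is created.
updated = raw_commits.each_with_object({}) { |o, result| result[o.id] = o.name }.compact

# Both approaches should yield the same id => name mapping.
puts existing == updated
puts updated.inspect
```

The results are identical; the difference lies only in the intermediate objects created along the way.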

To compare the two approaches at scale, we can generate a much larger array of 10,000 elements:

raw_commits = (1..10000).map { |n| OpenStruct.new(id: n, name: "tom-#{n}") }
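The allocation difference can also be approximated without a profiler by reading Ruby's cumulative allocation counter before and after each approach. This is a minimal sketch, not the profiling setup used in this MR: the `allocations` helper and the `Commit` struct are hypothetical names introduced for illustration, and a plain Struct is swapped in for OpenStruct so that OpenStruct's own internals do not affect the counts. Exact numbers may vary by Ruby version.

```ruby
# Hypothetical stand-in for the OpenStruct objects used above (an assumption
# made to keep the counts free of OpenStruct's internal allocations).
Commit = Struct.new(:id, :name)

# Hypothetical helper: returns how many objects were allocated while the
# block ran, using the cumulative counter exposed by GC.stat.
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

raw_commits = (1..10_000).map { |n| Commit.new(n, "tom-#{n}") }

# Existing approach: one temporary pair array per element.
map_allocations = allocations do
  Hash[raw_commits.map { |o| [o.id, o.name] }].compact
end

# New approach: entries are written directly into one accumulator hash.
each_allocations = allocations do
  raw_commits.each_with_object({}) { |o, result| result[o.id] = o.name }.compact
end

puts "Hash[map]:        #{map_allocations} objects"
puts "each_with_object: #{each_allocations} objects"
```

The first count should grow with the size of the input, since each element needs its own pair array, while the second should stay small regardless of input size.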

Before

Total allocations: 10003
pry(main)> profile = RubyProf.profile { enriched_commits = Hash[raw_commits.map { |o| [o.id, o.name] }].compact }
pry(main)> RubyProf::FlatPrinter.new(profile).print(STDOUT, min_percent: 2)
Measure Mode: allocations
Thread ID: 23720
Fiber ID: 86880
Total: 10003.000000
Sort by: self_time

 %self      total      self      wait     child     calls  name                           location
 99.98  10001.000 10001.000     0.000     0.000        1   Array#map                      

* recursively called methods

Columns are:

  %self     - The percentage of time spent in this method, derived from self_time/total_time.
  total     - The time spent in this method and its children.
  self      - The time spent in this method.
  wait      - The amount of time this method waited for other threads.
  child     - The time spent in this method's children.
  calls     - The number of times this method was called.
  name      - The name of the method.
  location  - The location of the method.

The interpretation of method names is:

  * MyObject#test - An instance method "test" of the class "MyObject"
  * <Object:MyObject>#test - The <> characters indicate a method on a singleton class.
[
    [0] #<RubyProf::Thread:0x00007f99b9070238>
]
Total time: 0.001640 secs
Measure Mode: wall_time
Thread ID: 23720
Fiber ID: 86880
Total: 0.001640
Sort by: self_time

 %self      total      self      wait     child     calls  name                           location
  7.76      0.000     0.000     0.000     0.000        1   <Object::OpenStruct>#id        /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
  7.03      0.000     0.000     0.000     0.000        1   <Class::Hash>#[]
  6.47      0.002     0.000     0.000     0.002        1   [global]#                      (pry):185
  6.41      0.001     0.000     0.000     0.001        1   Array#map
  5.44      0.000     0.000     0.000     0.000        1   <Object::OpenStruct>#id        /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
  5.28      0.000     0.000     0.000     0.000        1   <Object::OpenStruct>#id        /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
  5.25      0.000     0.000     0.000     0.000        1   <Object::OpenStruct>#name      /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
  5.09      0.000     0.000     0.000     0.000        1   <Object::OpenStruct>#id        /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
  4.73      0.000     0.000     0.000     0.000        1   <Object::OpenStruct>#name      /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
  4.34      0.000     0.000     0.000     0.000        1   <Object::OpenStruct>#id        /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
  4.22      0.000     0.000     0.000     0.000        1   <Object::OpenStruct>#name      /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
  4.15      0.000     0.000     0.000     0.000        1   <Object::OpenStruct>#name      /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
  3.56      0.000     0.000     0.000     0.000        1   <Object::OpenStruct>#name      /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
  3.32      0.000     0.000     0.000     0.000        1   <Object::OpenStruct>#name      /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
  3.28      0.000     0.000     0.000     0.000        1   <Object::OpenStruct>#name      /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
  3.19      0.000     0.000     0.000     0.000        1   <Object::OpenStruct>#id        /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
  2.88      0.000     0.000     0.000     0.000        1   <Object::OpenStruct>#name      /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
  2.78      0.000     0.000     0.000     0.000        1   Hash#compact
  2.78      0.000     0.000     0.000     0.000        1   <Object::OpenStruct>#id        /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
  2.64      0.000     0.000     0.000     0.000        1   <Object::OpenStruct>#name      /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
  2.52      0.000     0.000     0.000     0.000        1   <Object::OpenStruct>#name      /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
  2.36      0.000     0.000     0.000     0.000        1   <Object::OpenStruct>#id        /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
  2.32      0.000     0.000     0.000     0.000        1   <Object::OpenStruct>#id        /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
  2.21      0.000     0.000     0.000     0.000        1   <Object::OpenStruct>#id        /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190

After

Total allocations: 4
pry(main)> profile = RubyProf.profile { enriched_commits = raw_commits.each_with_object({}) { |o, result| result[o.id] = o.name }.compact }
pry(main)> RubyProf::FlatPrinter.new(profile).print(STDOUT, min_percent: 2)
Measure Mode: allocations
Thread ID: 23720
Fiber ID: 86880
Total: 4.000000
Sort by: self_time

 %self      total      self      wait     child     calls  name                           location
 75.00      3.000     3.000     0.000     0.000        1   BasicObject#method_missing     
 25.00      4.000     1.000     0.000     3.000        1   [global]#                      (pry):128

[
    [0] #<RubyProf::Thread:0x00007f99fe534180>]
Total time: 0.000545 secs
Measure Mode: wall_time
Thread ID: 23720
Fiber ID: 86880
Total: 0.000545
Sort by: self_time

 %self      total      self      wait     child     calls  name                           location
 13.07      0.000     0.000     0.000     0.000        1   Array#each                     
  9.57      0.000     0.000     0.000     0.000        1   <Object::OpenStruct>#name      /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
  7.37      0.000     0.000     0.000     0.000        1   Hash#compact                   
  6.32      0.000     0.000     0.000     0.000        1   <Object::OpenStruct>#name      /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
  5.31      0.000     0.000     0.000     0.000        1   <Object::OpenStruct>#name      /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
  5.05      0.000     0.000     0.000     0.000        1   <Object::OpenStruct>#name      /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
  4.78      0.000     0.000     0.000     0.000        1   Enumerable#each_with_object    
  4.19      0.000     0.000     0.000     0.000        1   <Object::OpenStruct>#id        /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
  4.18      0.000     0.000     0.000     0.000        1   <Object::OpenStruct>#name      /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
  4.07      0.000     0.000     0.000     0.000        1   <Object::OpenStruct>#id        /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
  3.97      0.000     0.000     0.000     0.000        1   <Object::OpenStruct>#id        /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
  3.89      0.000     0.000     0.000     0.000        1   <Object::OpenStruct>#id        /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
  3.78      0.000     0.000     0.000     0.000        1   <Object::OpenStruct>#id        /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
  3.67      0.000     0.000     0.000     0.000        1   <Object::OpenStruct>#id        /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
  2.76      0.000     0.000     0.000     0.000        1   <Object::OpenStruct>#id        /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
  2.48      0.000     0.000     0.000     0.000        1   <Object::OpenStruct>#id        /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
  2.44      0.000     0.000     0.000     0.000        1   <Object::OpenStruct>#id        /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
  2.32      0.000     0.000     0.000     0.000        1   <Object::OpenStruct>#name      /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
  2.29      0.000     0.000     0.000     0.000        1   <Object::OpenStruct>#name      /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
  2.28      0.000     0.000     0.000     0.000        1   <Object::OpenStruct>#name      /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
  2.14      0.000     0.000     0.000     0.000        1   <Object::OpenStruct>#name      /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
  2.06      0.001     0.000     0.000     0.001        1   [global]#                      (pry):187

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • [-] Label as security and @ mention @gitlab-com/gl-security/appsec
  • [-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • [-] Security reports checked/validated by a reviewer from the AppSec team
Edited by Tan Le