
Add scripts to collect GC setting results

Matthias Käppler requested to merge 289838-explore-gc-settings into master

What does this MR do?

In #289838 (closed) we are looking to tune Ruby GC settings to better fit the needs of our application. However, we weren't (and to an extent still aren't) sure which settings are most appropriate. To help us find the most impactful levers in terms of speed/memory trade-offs, we wrote a script that loads GitLab under a variety of settings and value ranges and measures how each one affects GC behavior, memory usage, and startup time.
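Because Ruby reads the RUBY_GC_* variables only when the interpreter boots, each setting has to be measured in a fresh process. The following is a minimal sketch of such a driver loop, not the actual collect_gc_stats.rb: the child workload (an allocation loop standing in for booting GitLab), the measure helper, and the EXPERIMENTS value ranges are all hypothetical.

```ruby
require "json"
require "open3"
require "rbconfig"

# Hypothetical stand-in for booting GitLab: the child keeps a rolling window
# of strings alive, then reports a few GC counters as JSON on stdout.
# Single-quoted heredoc, so interpolation happens in the child, not here.
CHILD_SCRIPT = <<~'RUBY'
  require "json"
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  window = []
  200_000.times do |i|
    window << "object-#{i}" * 4
    window.shift if window.size > 50_000
  end
  stats = GC.stat
  report = {
    minor_gc_count: stats[:minor_gc_count],
    major_gc_count: stats[:major_gc_count],
    heap_live_slots: stats[:heap_live_slots],
    real_time_s: Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  }
  puts JSON.generate(report)
RUBY

# Runs the workload in a new Ruby process with at most one GC variable set.
# The variable must be in the child's environment at boot; changing the
# parent's ENV at runtime would have no effect on the parent's GC.
def measure(setting, value)
  env = setting ? { setting => value.to_s } : {}
  out, status = Open3.capture2(env, RbConfig.ruby, "-e", CHILD_SCRIPT)
  raise "child failed for #{setting}=#{value}" unless status.success?

  JSON.parse(out).merge("setting" => setting || "DEFAULTS", "value" => value)
end

# Hypothetical value ranges; the real script explores more settings.
EXPERIMENTS = {
  "RUBY_GC_HEAP_GROWTH_FACTOR" => [1.2, 1.8],
  "RUBY_GC_MALLOC_LIMIT_MAX" => [8_388_608, 16_777_216]
}.freeze

results = [measure(nil, nil)] # baseline run with no overrides
EXPERIMENTS.each do |setting, values|
  values.each { |value| results << measure(setting, value) }
end

results.each do |r|
  puts [r["setting"], r["value"], r["minor_gc_count"], r["major_gc_count"]].join(",")
end
```

The first line printed is a DEFAULTS baseline with no variable overridden, mirroring the DEFAULTS row in the sample output below.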

The script is still fairly crude, but we can evolve it over time, since this exercise will likely have to be repeated for different workloads or simply as GitLab evolves.

The script can be run manually or by triggering the memory-gc-stats job in CI.

Screenshots

Here is some abridged output from a single run:

git@666a578c3722:~/gitlab$ PAR=8 scripts/perf/gc/collect_gc_stats.rb 2>/dev/null | tee tmp/gc_settings_exp/out.csv
setting,value,minor_gc_count,major_gc_count,heap_live_slots,heap_free_slots,total_allocated_pages,total_freed_pages,malloc_increase_bytes,malloc_increase_bytes_limit,oldmalloc_increase_bytes,oldmalloc_increase_bytes_limit,RSS,gc_time_s,cpu_utime_s,cpu_stime_s,real_time_s
RUBY_GC_OLDMALLOC_LIMIT_GROWTH_FACTOR,1.4,71,18,2740550,1382310,10115,0,240288,30330547,25185472,124091833,781864960,3.1718633570002166,42.249831,3.30261,45.941048101998604
RUBY_GC_MALLOC_LIMIT_MAX,16777216,93,23,2740223,871440,8861,0,240288,16777216,1261952,117948573,723836928,3.664328899000056,42.982287,3.3738940000000004,46.76500607699927
RUBY_GC_HEAP_GROWTH_FACTOR,1.2,85,27,2741422,763087,8598,0,240288,29723936,240288,117948576,720515072,3.6808885479992,43.497158000000006,3.194905,47.0727705270001
RUBY_GC_HEAP_FREE_SLOTS_MAX_RATIO,0.2,67,21,2740249,1078591,9369,0,240288,29723936,19374104,131586007,762191872,3.1032002909987404,44.088045,3.347735,47.832934510002815
RUBY_GC_HEAP_FREE_SLOTS_MIN_RATIO,0.1,82,24,2740220,615121,8232,0,240288,29723936,19501496,131586007,746119168,3.154390969000753,44.608787,3.308356,48.253922302999854
DEFAULTS,,66,22,2740238,1102576,9428,0,240288,29723936,20193304,131586007,777318400,3.08532557200009,45.013158000000004,3.1998949999999997,48.56302219200006
RUBY_GC_OLDMALLOC_LIMIT,8388608,69,25,2740244,896808,8923,0,240288,29723936,21487376,131586007,772857856,3.0437161360004104,45.226611,3.3958470000000003,48.99548633300219
RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR,2.5,67,21,2740240,1078522,9369,0,240288,29723936,19803616,131586007,754880512,3.0493695329997896,45.733331,3.30059,49.42939641499834
RUBY_GC_OLDMALLOC_LIMIT_GROWTH_FACTOR,1,67,21,2740300,1078107,9368,0,240288,29723936,19375432,131586007,763305984,3.1334343840004877,43.393683,3.056368,46.80562402199939
RUBY_GC_MALLOC_LIMIT_MAX,8388608,132,18,2740223,912649,8962,0,240288,16777216,37354880,72435166,695050240,4.876963488000024,42.185849,3.241408,45.80949771100131
RUBY_GC_HEAP_GROWTH_FACTOR,1,67,21,2740241,1078581,9369,0,240304,29723936,19368032,131586007,752566272,3.064799008999742,43.330758,3.123677,46.827482058000896
RUBY_GC_HEAP_INIT_SLOTS,100000,45,17,2740324,1078452,9369,0,240288,29723936,19316640,131586007,755642368,3.0575755560003612,43.235668,3.171762,46.73985183200057
RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR,1.5,67,21,2740361,1078333,9369,0,240288,29723936,19379064,131586007,751550464,3.020581143000563,43.222015,2.997365,46.66422984499877
RUBY_GC_HEAP_FREE_SLOTS_MAX_RATIO,0.02,67,21,2740700,1078118,9369,0,240288,29723936,19398608,131586007,756346880,3.066689913998789,44.736999,3.1816969999999998,48.279837124999176
RUBY_GC_HEAP_FREE_SLOTS_MIN_RATIO,0.01,113,25,2740205,299285,7457,0,240288,27975929,20212936,131586007,706641920,3.3894339299997305,45.140169,3.132812,48.64417254500222
RUBY_GC_OLDMALLOC_LIMIT,4194304,68,28,2740267,856353,8824,0,240288,30330547,20488952,126789218,764297216,2.976002941000086,45.601689,3.103582,49.05113073800021
...
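Since every row shares the same columns, one quick way to read the results is to compare each run against the DEFAULTS row. This is a sketch assuming Ruby's standard csv library; the three rows are copied from the sample output above, and the choice of RSS, gc_time_s and real_time_s as the metrics of interest is illustrative.

```ruby
require "csv"

# Header plus three rows copied verbatim from the sample output.
SAMPLE = <<~ROWS
  setting,value,minor_gc_count,major_gc_count,heap_live_slots,heap_free_slots,total_allocated_pages,total_freed_pages,malloc_increase_bytes,malloc_increase_bytes_limit,oldmalloc_increase_bytes,oldmalloc_increase_bytes_limit,RSS,gc_time_s,cpu_utime_s,cpu_stime_s,real_time_s
  DEFAULTS,,66,22,2740238,1102576,9428,0,240288,29723936,20193304,131586007,777318400,3.08532557200009,45.013158000000004,3.1998949999999997,48.56302219200006
  RUBY_GC_MALLOC_LIMIT_MAX,8388608,132,18,2740223,912649,8962,0,240288,16777216,37354880,72435166,695050240,4.876963488000024,42.185849,3.241408,45.80949771100131
  RUBY_GC_HEAP_INIT_SLOTS,100000,45,17,2740324,1078452,9369,0,240288,29723936,19316640,131586007,755642368,3.0575755560003612,43.235668,3.171762,46.73985183200057
ROWS

# :numeric turns counters into Integers and timings into Floats.
rows = CSV.parse(SAMPLE, headers: true, converters: :numeric)
baseline = rows.find { |r| r["setting"] == "DEFAULTS" }

# Percentage change of one metric relative to the DEFAULTS run.
def pct_change(row, baseline, column)
  ((row[column] - baseline[column]) / baseline[column].to_f * 100).round(1)
end

report = rows.reject { |r| r["setting"] == "DEFAULTS" }.map do |r|
  {
    label: "#{r["setting"]}=#{r["value"]}",
    rss_pct: pct_change(r, baseline, "RSS"),
    gc_time_pct: pct_change(r, baseline, "gc_time_s"),
    real_time_pct: pct_change(r, baseline, "real_time_s")
  }
end

# Lowest memory first, to surface the settings with the largest RSS savings.
report.sort_by { |e| e[:rss_pct] }.each do |e|
  puts format("%-40s RSS %+6.1f%%  GC time %+6.1f%%  wall %+6.1f%%",
              e[:label], e[:rss_pct], e[:gc_time_pct], e[:real_time_pct])
end
```

For a full run you would load tmp/gc_settings_exp/out.csv instead, for example with CSV.read(path, headers: true, converters: :numeric). Each setting is measured once, so small differences may be within run-to-run noise.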

Stats are collected as CSV so they can be easily imported into a spreadsheet. We also print other diagnostic output to stderr, such as the full GC stats and a GC::Profiler report.
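As a rough illustration of how one row and its stderr diagnostics could be produced, the sketch below wraps a workload with GC::Profiler and reads the GC.stat keys that appear as CSV columns. The measure_row and rss_bytes helpers, the SETTING/VALUE variables and the workload are hypothetical. RSS is read from /proc and is therefore Linux-only, and the timings cover only the wrapped block, which may differ from how the real script measures them.

```ruby
require "csv"

# GC.stat keys matching the sample output's columns.
GC_KEYS = %i[
  minor_gc_count major_gc_count heap_live_slots heap_free_slots
  total_allocated_pages total_freed_pages malloc_increase_bytes
  malloc_increase_bytes_limit oldmalloc_increase_bytes
  oldmalloc_increase_bytes_limit
].freeze

# Resident set size in bytes from /proc/self/status (Linux only; nil elsewhere).
def rss_bytes
  line = File.foreach("/proc/self/status").find { |l| l.start_with?("VmRSS:") }
  line && line.split[1].to_i * 1024 # VmRSS is reported in kB
rescue Errno::ENOENT
  nil
end

# Runs the given block with GC::Profiler enabled, writes diagnostics to
# stderr, and returns one row in the same column order as the sample CSV.
def measure_row(setting, value)
  GC::Profiler.clear
  GC::Profiler.enable
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  real_time = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

  stats = GC.stat
  gc_time = GC::Profiler.total_time # seconds spent in profiled GC runs
  warn stats.inspect                # full GC stats to stderr
  GC::Profiler.report($stderr)      # per-collection profiler table
  cpu = Process.times

  [setting, value, *stats.values_at(*GC_KEYS), rss_bytes, gc_time,
   cpu.utime, cpu.stime, real_time]
ensure
  GC::Profiler.disable
end

# Hypothetical workload standing in for loading the application.
row = measure_row(ENV.fetch("SETTING", "DEFAULTS"), ENV["VALUE"]) do
  Array.new(200_000) { |i| "item-#{i}" }
end
puts CSV.generate_line(row)
```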

The PAR environment variable controls the degree of parallelism. The script attempts to run PAR processes concurrently so they can be distributed across separate cores; without this, execution would be very slow (with PAR=2, the CI job takes about 30 minutes).
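One way to honor PAR is a fixed pool of PAR worker threads pulling jobs from a queue, with each worker blocking on one child process at a time, so at most PAR children run concurrently. This is a minimal sketch under that assumption; the job list and the child command are hypothetical stand-ins for the real measurement.

```ruby
require "open3"
require "rbconfig"

PAR = Integer(ENV.fetch("PAR", "2"))

# Hypothetical job list: each entry is one (setting, value) pair to try.
jobs = [
  ["RUBY_GC_HEAP_GROWTH_FACTOR", "1.2"],
  ["RUBY_GC_HEAP_GROWTH_FACTOR", "1.8"],
  ["RUBY_GC_MALLOC_LIMIT_MAX", "8388608"],
  ["RUBY_GC_MALLOC_LIMIT_MAX", "16777216"],
  ["RUBY_GC_OLDMALLOC_LIMIT", "4194304"]
]

# Stand-in child: allocates a little, then echoes its setting and GC count.
CHILD = '_objs = Array.new(50_000) { |i| i.to_s }; puts [ARGV[0], ENV[ARGV[0]], GC.count].join(",")'

queue = Queue.new
jobs.each { |job| queue << job }
queue.close # pop returns nil once the closed queue is drained

results = Queue.new
workers = Array.new(PAR) do
  Thread.new do
    while (job = queue.pop)
      setting, value = job
      out, status = Open3.capture2({ setting => value }, RbConfig.ruby, "-e", CHILD, setting)
      results << (status.success? ? out.strip : "#{setting},#{value},failed")
    end
  end
end
workers.each(&:join)

lines = []
lines << results.pop until results.empty?
puts lines.sort
```

Threads are sufficient here because the heavy work happens in the child processes: while a thread waits on its child, Ruby releases the GVL, so the children can run on separate cores.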

Example output from an earlier CI run: https://gitlab.com/gitlab-org/gitlab/-/jobs/909065522/artifacts/file/tmp/stats.csv


Edited by Matthias Käppler
