Skip to content

shared_buffers , work_mem and effective_cache_size setting review

Context

Production memory information is:

# GB
              total        used        free      shared  buff/cache   available
Mem:            409          22           9         115         377         268
Swap:             0           0           0
 
              total        used        free      shared  buff/cache   available
Mem:            409          17          14         115         377         272
Swap:             0           0           0

 shared_buffers
----------------
 112896MB

 effective_cache_size
----------------------
 338688MB

Right now, a 1/3 of the RAM is shared buffers. Question is: is it enough or we should consider increase it? Within this, we may recalculate effective_cache_size. Another improvement that may affect IO will be work_mem increase, which can lead into a high performance boost.

Symptoms

  • Shared Buffers is full for Postgres allocations, a very low percentage is dirt in the overall perspective.
  • Only a few tables have a considerable dirt percentage (index_merge_requests_on_description_trigram, index_notes_on_note_trigram, index_issues_on_description_trigram). That means that we have potential fix content in SB.
  • Most of the blocks have high usage counts, which means there is a high concurrency on few blocks. Only few relations do not fall into this group: merge_request_diff_files, index_merge_requests_on_description_trigram, index_issues_on_description_trigram.
  • There is considerable free memory available, probably this is related to the cache size estimation set in effective_cache_size (338688MB). That is , if SB isn't increase which may lead into a recalculation of the effective_cache_size.
  • It is being seeing a high concurrence per buffer (score/usagecount is 5 in most of the blocks), parallel execution may be beneficial. database#36
  • An optimization here without incurring in shared_buffers increase (non exclusive each other), will be to assign more work_mem per worker (right now at 16MB in default, 300 max_connections). From a previous inspect of the temporal file sizes of the queries above 5s, was 3.65MB in average for all the queries, meaning that by increasing this value we could get rid of unnecessary IO and allocate more in memory. The most dense queries were using around 10MB of temporal file in average. By setting this value on 32MB, we'll be in a safe range between max_connections and work_mem and the current available memory.

This might mean that we need to increase resource consumption on the server and increase buffers in memory through shared_buffers and effective_cache_size.

We need to have this information in a time-wise metric for going deeper into the shared buffers tuning.

Raw data

gitlabhq_production=# WITH agg AS (
gitlabhq_production(#     SELECT count(*) as buffer_count, sum(usagecount) as usagecount,
gitlabhq_production(#        count(case when isdirty then 1 end ) as dirty_buffers,
gitlabhq_production(#        -- round(100*(count(case when isdirty then 1 end ))/count(*)) as dirt_perc,
gitlabhq_production(#        sum(pinning_backends) as pinned_backends
gitlabhq_production(#     FROM pg_buffercache
gitlabhq_production(# )
gitlabhq_production-# SELECT *, round(((100*dirty_buffers::double precision)/buffer_count::double precision)::numeric,2)  as dirt_perc
gitlabhq_production-# FROM agg;
 buffer_count | usagecount | dirty_buffers | pinned_backends | dirt_perc
--------------+------------+---------------+-----------------+-----------
     14450688 |   45318553 |        126982 |              15 |      0.88


gitlabhq_production=#
gitlabhq_production=# WITH bufs AS (
gitlabhq_production(# SELECT  c.relname as rel, count(*) AS buffers,
gitlabhq_production(#         sum(usagecount) as score,
gitlabhq_production(#         count(Case when isdirty then 1 end )   as dirtpages,
gitlabhq_production(#         sum(pinning_backends) as pinbackend,
gitlabhq_production(#         sum(c.relpages) as relpages
gitlabhq_production(#              FROM pg_buffercache b INNER JOIN pg_class c
gitlabhq_production(#              ON b.relfilenode = pg_relation_filenode(c.oid) AND
gitlabhq_production(#                 b.reldatabase IN (0, (SELECT oid FROM pg_database
gitlabhq_production(#                                       WHERE datname = current_database()))
gitlabhq_production(#              GROUP BY c.relname
gitlabhq_production(#              ORDER BY 2 DESC
gitlabhq_production(#              LIMIT 20
gitlabhq_production(# ) -- returns the most populated relations in SB in that order
gitlabhq_production-# SELECT rel, buffers, (buffers * 8) /1024 as size_MB,
gitlabhq_production-#         score,  dirtpages, pinbackend,
gitlabhq_production-#         round(((100*dirtpages::double precision)/buffers::double precision)::numeric,2) as dirt_perc,
gitlabhq_production-#         round(((100*buffers::double precision)/relpages::double precision)::numeric,8) as perc_of_rel_in_SB
gitlabhq_production-# from bufs
gitlabhq_production-# ;

                     rel                     | buffers | size_mb |  score   | dirtpages | pinbackend | dirt_perc | perc_of_rel_in_sb
---------------------------------------------+---------+---------+----------+-----------+------------+-----------+-------------------
 ci_builds                                   | 3008573 |   23504 | 14089083 |       660 |          1 |      0.02 |        0.00000874
 merge_request_diff_files                    | 1087393 |    8495 |  2095096 |       904 |          0 |      0.08 |        0.00000198
 issues                                      |  990873 |    7741 |  4951479 |        11 |          0 |      0.00 |        0.00010095
 pg_toast_21971303                           |  716190 |    5595 |  1533467 |       307 |          0 |      0.04 |        0.00000219
 merge_requests                              |  619526 |    4840 |  3084303 |        21 |          0 |      0.00 |        0.00015928
 index_issues_on_description_trigram         |  533096 |    4164 |   821969 |       194 |          0 |      0.04 |        0.00006399
 index_notes_on_note_trigram                 |  345202 |    2696 |   613434 |       121 |          0 |      0.04 |        0.00003896
 projects                                    |  212276 |    1658 |  1061368 |      1514 |          1 |      0.71 |        0.00047124
 users                                       |  202721 |    1583 |  1013534 |      1463 |          0 |      0.72 |        0.00049343
 lfs_objects                                 |  178590 |    1395 |   892941 |         3 |          0 |      0.00 |        0.00055996
 notes                                       |  174838 |    1365 |   322835 |       284 |          0 |      0.16 |        0.00001656
 index_merge_request_diff_commits_on_sha     |  155092 |    1211 |   270680 |      3359 |          0 |      2.17 |        0.00001658
 ci_pipelines                                |  154486 |    1206 |   332578 |       240 |          0 |      0.16 |        0.00014578
 index_ci_builds_on_token                    |  132904 |    1038 |   265523 |      2233 |          1 |      1.68 |        0.00017520
 index_project_mirror_data_on_status         |  127079 |     992 |   137645 |        38 |          0 |      0.03 |        0.00078712
 index_issues_on_title_trigram               |  126162 |     985 |   229763 |        38 |          0 |      0.03 |        0.00040727
 uploads                                     |  123173 |     962 |   615855 |        12 |          0 |      0.01 |        0.00081871
 merge_requests_pkey                         |  123129 |     961 |   252232 |       299 |          0 |      0.24 |        0.00081267
 merge_request_diffs                         |  122517 |     957 |   236486 |       123 |          0 |      0.10 |        0.00014538
 index_merge_requests_on_description_trigram |  109484 |     855 |   188223 |       125 |          0 |      0.11 |        0.00032693

gitlabhq_production=# WITH viewbuf AS (
gitlabhq_production(# select c.relname as rel, usagecount, count(*) as bufcount
gitlabhq_production(# FROM pg_buffercache b INNER JOIN pg_class c
gitlabhq_production(#              ON b.relfilenode = pg_relation_filenode(c.oid) AND
gitlabhq_production(#                 b.reldatabase IN (0, (SELECT oid FROM pg_database
gitlabhq_production(#                                       WHERE datname = current_database()))
gitlabhq_production(# GROUP BY c.relname, usagecount
gitlabhq_production(# ORDER BY c.relname
gitlabhq_production(# )
gitlabhq_production-# -- Maximum 6 values per array.
gitlabhq_production-# -- We array everything cause there could be absent _states_ of the usagecount.
gitlabhq_production-# -- Both ixcount and usagecount are correlative in order, order_by_bufcount is the
gitlabhq_production-# -- one that gives you the order on the amount of buffers per each usagecount.
gitlabhq_production-# -- The highed the 5' scored blocks on a relation the better. Unfortunately there are
gitlabhq_production-# -- many scenarios of when and how the score of blocks can vary, so to give an idea
gitlabhq_production-# -- we ideally need to collect all of this information and aggregate.
gitlabhq_production-# SELECT rel, array_agg(usagecount order by usagecount desc) as ixcount , array_agg(bufcount order by usagecount desc) as usagecount,
gitlabhq_production-#       array_agg(usagecount order by bufcount desc) as order_by_bufcount
gitlabhq_production-# FROM viewbuf
gitlabhq_production-# group by rel
gitlabhq_production-# order by usagecount desc
gitlabhq_production-# limit 20;
                              rel                              |    ixcount    |                usagecount                | order_by_bufcount
---------------------------------------------------------------+---------------+------------------------------------------+-------------------
 ci_builds                                                     | {5,4,3,2,1,0} | {2721873,103259,34536,50115,28020,75078} | {5,4,0,2,3,1}
 issues                                                        | {5,4,3,2,1,0} | {990617,205,4,1,8,38}                    | {5,4,0,1,3,2}
 projects                                                      | {5,4}         | {212189,82}                              | {5,4}
 users                                                         | {5,4}         | {202364,357}                             | {5,4}
 lfs_objects                                                   | {5,4,1}       | {178563,26,1}                            | {5,4,1}
 merge_requests                                                | {5,4,3,2,1,0} | {160363,417794,38969,226,471,1723}       | {4,5,3,0,1,2}
 uploads                                                       | {5,4,3}       | {123148,16,1}                            | {5,4,3}
 merge_request_metrics                                         | {5,4}         | {103635,10}                              | {5,4}
 project_repository_states                                     | {5,1,0}       | {69108,1101,64}                          | {5,1,0}
 project_features                                              | {5,4}         | {64550,22}                               | {5,4}
 members                                                       | {5,4,3,2,1,0} | {63872,8821,10365,18681,583,135}         | {5,2,3,4,1,0}
 index_issues_on_description_trigram                           | {5,4,3,2,1,0} | {59531,38555,32991,63636,209540,129716}  | {1,0,2,5,4,3}
 routes                                                        | {5,4,3,2,1,0} | {47895,15029,12413,10508,6680,4750}      | {5,4,3,2,1,0}
 index_project_repository_states_on_project_id                 | {5,4}         | {44045,44}                               | {5,4}
 geo_event_log                                                 | {5,4}         | {39909,101}                              | {5,4}
 index_notes_on_note_trigram                                   | {5,4,3,2,1,0} | {39260,23544,15683,28980,220777,26996}   | {1,5,2,0,4,3}
 idx_projects_on_repository_storage_last_repository_updated_at | {5,4}         | {35030,35}                               | {5,4}
 protected_branch_merge_access_levels                          | {5,4}         | {34392,4}                                | {5,4}
 index_ci_builds_on_status_and_type_and_runner_id              | {5,4,3,2,1,0} | {32788,7985,1723,1925,2612,5515}         | {5,4,0,1,2,3}
 protected_branch_push_access_levels                           | {5,4}         | {32121,2}                                | {5,4}