Decommission ES cluster and create a new one with more nodes but smaller specs
This is part of the META issue https://gitlab.com/gitlab-com/infrastructure/issues/1597
As a continuation of https://gitlab.com/gitlab-com/infrastructure/issues/2157, we saw that we couldn't cope with the indexing load for a couple of reasons:
- Not all nodes are receiving the same number of requests (this is due to the use of the parent-child document relationship, which is causing an imbalance across the shard sizes of the cluster). We didn't even index 1% of what's on GitLab.com, but @nick.thomas calculated that the space distribution should be even:
  ```
  [{"count"=>"102871", "size_mib"=>"3800023.516858100891", "shard"=>"0"},
   {"count"=>"103217", "size_mib"=>"3667799.555052757263", "shard"=>"1"},
   {"count"=>"103195", "size_mib"=>"4278779.600588798523", "shard"=>"2"},
   {"count"=>"103235", "size_mib"=>"3400449.739967346191", "shard"=>"3"},
   {"count"=>"103298", "size_mib"=>"3745645.905962944031", "shard"=>"4"},
   {"count"=>"103524", "size_mib"=>"3708715.532225608826", "shard"=>"5"},
   {"count"=>"103180", "size_mib"=>"4372124.374240875244", "shard"=>"6"},
   {"count"=>"103488", "size_mib"=>"3521339.539809226990", "shard"=>"7"},
   {"count"=>"103548", "size_mib"=>"4258394.189184188843", "shard"=>"8"},
   {"count"=>"103335", "size_mib"=>"3896756.068240165710", "shard"=>"9"},
   {"count"=>"103335", "size_mib"=>"4073603.255154609680", "shard"=>"10"},
   {"count"=>"103247", "size_mib"=>"3673936.535663604736", "shard"=>"11"},
   {"count"=>"103435", "size_mib"=>"3441760.125455856323", "shard"=>"12"},
   {"count"=>"103570", "size_mib"=>"3957315.223025321960", "shard"=>"13"},
   {"count"=>"103123", "size_mib"=>"3680589.868497848511", "shard"=>"14"},
   {"count"=>"103255", "size_mib"=>"3916405.046738624573", "shard"=>"15"},
   {"count"=>"103545", "size_mib"=>"3748733.100311279297", "shard"=>"16"},
   {"count"=>"103267", "size_mib"=>"3725391.259220123291", "shard"=>"17"}]
  ```
- Not all the nodes are receiving `PUT` requests over the `index` API due to a problem in the Ruby client, which is being tracked here.
Also, 18 shards is not enough, so we need to increase the number. We can have a cheaper cluster with more (but less powerful) nodes while also cutting available disk space, as we don't need 4TB/node of available space.
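Since Elasticsearch fixes the shard count at index creation time, increasing it means creating a new index and reindexing into it. A minimal sketch of the settings body such an index could be created with; the shard count (36), replica count, and index name are assumptions for illustration, not decisions from this issue:

```ruby
require "json"

# Hypothetical settings for a replacement index. 36 shards (up from the
# current 18) and 1 replica are illustrative values only.
new_index_settings = {
  "settings" => {
    "index" => {
      "number_of_shards"   => 36,
      "number_of_replicas" => 1,
    },
  },
}

# With the Ruby client this would be applied at creation time, e.g.:
#   client.indices.create(index: "gitlab-production-v2", body: new_index_settings)
# (index name is hypothetical), followed by reindexing from the old index.
puts JSON.pretty_generate(new_index_settings)
```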
The current list of tasks is:

- Get rid of elasticsearch0[123].db.gitlab.com
- Change terraform resources:
  - From DS12_v2 -> DS11_v2
  - From 4x 1TB disks -> 4x 512GB disks
-