Elasticsearch total_shards_per_node prevents shards from moving to data_warm nodes after rollover

Here is a screenshot of available disk space going up on the data_warm nodes but going down on the data_hot nodes. At the time of writing, very little disk space is left on the data_hot nodes.

After removing the total_shards_per_node restriction, you can see disk usage start to return to normal:

We have a script, run every 10 minutes, that sets total_shards_per_node: 1. It lives here: https://gitlab.com/gitlab-com/runbooks/-/blob/master/elastic/scheduled/hot_index_shards_per_node.sh
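To see which indices currently carry the limit, one option is GET /_all/_settings/index.routing.allocation.total_shards_per_node?flat_settings=true. The Python sketch below parses a response of that shape. It assumes the script applies the full index.routing.allocation.total_shards_per_node key (not confirmed from the script itself), and the index names in the sample are hypothetical.

```python
import json


def indices_with_shard_limit(settings_response: dict) -> dict:
    """Map index name -> total_shards_per_node for indices that set it.

    Expects the body of
    GET /_all/_settings/index.routing.allocation.total_shards_per_node?flat_settings=true
    """
    key = "index.routing.allocation.total_shards_per_node"
    limits = {}
    for index_name, body in settings_response.items():
        # Indices without the setting may come back with empty settings.
        value = body.get("settings", {}).get(key)
        if value is not None:
            # Elasticsearch returns setting values as strings, e.g. "1".
            limits[index_name] = int(value)
    return limits


if __name__ == "__main__":
    # Hypothetical response; real index names will differ.
    sample = json.loads("""
    {
      "example-logs-000002": {"settings": {"index.routing.allocation.total_shards_per_node": "1"}},
      "example-logs-000001": {"settings": {}}
    }
    """)
    print(indices_with_shard_limit(sample))
```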

First, we probably need to disable this script so the problem stops recurring every 10 minutes.
Afterwards, we should investigate why this setting causes the problem. It first came up during the upgrade of the logging cluster: #6001 (comment 755191777)
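One plausible explanation, not yet confirmed: total_shards_per_node caps how many copies (primaries plus replicas) of a single index may sit on one node. When a rolled-over index is moved to the warm tier, every copy has to find a warm node. If the index has more copies than the warm tier can take under the cap, the leftover copies stay on data_hot. The sketch below checks that condition. The shard and node counts in the example are hypothetical, and it ignores other allocation rules such as disk watermarks and awareness.

```python
import math
from typing import Optional


def tier_can_hold_index(primaries: int, replicas: int, tier_nodes: int,
                        total_shards_per_node: Optional[int]) -> bool:
    """Check whether every copy of one index can be placed on a tier.

    Considers only two rules: copies of the same shard never share a node,
    and index.routing.allocation.total_shards_per_node caps how many copies
    of this index one node may hold. None means the setting is unset.
    Disk watermarks, allocation awareness, etc. are ignored.
    """
    copies = primaries * (1 + replicas)
    # Each shard's 1 + replicas copies need distinct nodes.
    nodes_needed = 1 + replicas
    if total_shards_per_node is not None:
        # With at most `total_shards_per_node` copies per node, the index
        # needs at least ceil(copies / cap) nodes.
        nodes_needed = max(nodes_needed, math.ceil(copies / total_shards_per_node))
    return tier_nodes >= nodes_needed


if __name__ == "__main__":
    # Hypothetical index: 6 primaries, 1 replica = 12 copies, 4 warm nodes.
    print(tier_can_hold_index(6, 1, tier_nodes=4, total_shards_per_node=1))     # False: needs 12 nodes
    print(tier_can_hold_index(6, 1, tier_nodes=4, total_shards_per_node=None))  # True
```

With a cap of 1, the warm tier needs at least as many nodes as the index has copies, which is easy to violate when the warm tier is smaller than the hot tier.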

Temporary workaround (it only lasts about 10 minutes, until the script runs again): remove the total_shards_per_node setting from every index in the cluster:

PUT /*/_settings
{
  "index.routing.allocation.total_shards_per_node": null
}
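The same workaround can be sent from a script. Here is a minimal Python sketch using only the standard library. It assumes the cluster is reachable at http://localhost:9200 without authentication (both hypothetical); a real cluster will likely need credentials and TLS. Setting the value to null (None in Python, serialized as JSON null) resets the setting to its default, which is unbounded.

```python
import json
import urllib.request


def clear_shard_limit_request(
    base_url: str = "http://localhost:9200",  # hypothetical cluster address
    index_pattern: str = "*",
) -> urllib.request.Request:
    """Build the PUT /<pattern>/_settings request that resets the per-node shard cap."""
    # JSON null resets the index setting to its default (unbounded).
    body = json.dumps({"index.routing.allocation.total_shards_per_node": None})
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index_pattern}/_settings",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    req = clear_shard_limit_request()
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it (needs a reachable cluster; add auth as required):
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.read().decode("utf-8"))
```

As long as the scheduled script keeps running, it will reapply the limit within about 10 minutes, so this only buys time.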

Edited Dec 13, 2021 by John Mason