Increase machine type for several services

Current Situation

This issue proposes increasing the machine type for a number of services.

Since osqueryd was rolled out across our environments, the node_schedstat_waiting component metric has been elevated on many service instances in our fleet. For more details, please refer to: https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/15110

In the absence of an immediate tuning remedy for the osqueryd performance problems, the recommendation is to increase the machine type of each of these instances and re-provision them.

Desired Outcome

The desired outcome is that saturation and resource contention on these systems go down.

Ideally, the node_schedstat_waiting component metric will decrease, as was demonstrated here: https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/15277

This component metric should decrease because, with more CPU cores, the processes competing for processor time have to share each core with fewer other processes, so fewer processes are left waiting to be scheduled for execution.
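The reasoning above can be illustrated with a deliberately simplified model: at any instant, a runnable process either occupies a core or waits for one. The function name and the load figure below are hypothetical, chosen only for illustration, and the model ignores scheduler details such as priorities and CPU affinity.

```python
def waiting_processes(runnable: int, cores: int) -> int:
    """Runnable processes that have no core available at this instant.

    Simplified model (an assumption, not how the kernel scheduler works in
    detail): each core runs exactly one process, and the rest must wait.
    """
    if runnable < 0 or cores < 1:
        raise ValueError("runnable must be >= 0 and cores must be >= 1")
    return max(0, runnable - cores)


if __name__ == "__main__":
    # Hypothetical load: 6 processes are runnable at the same instant.
    for cores in (2, 4, 8):
        print(f"{cores} cores: {waiting_processes(6, cores)} waiting")
```

With the same number of runnable processes, doubling the core count shrinks the waiting set, which is the effect the larger machine type is expected to have on the metric.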

Acceptance Criteria

  • The instances in the camoproxy service cluster in gstg and gprd are assigned a vertically larger machine_type in their terraform module, and it is observed that the node_schedstat_waiting component metric decreases after re-provisioning the cluster member instances.
  • The instances in the redis-sentinel service cluster in gstg and gprd are assigned a vertically larger machine_type in their terraform module, and it is observed that the node_schedstat_waiting component metric decreases after re-provisioning the cluster member instances.
  • The instances in the redis-cache service cluster in gstg and gprd are assigned a vertically larger machine_type in their terraform module, and it is observed that the node_schedstat_waiting component metric decreases after re-provisioning the cluster member instances.
  • The instances in the registry service cluster in gstg and gprd are assigned a vertically larger machine_type in their terraform module, and it is observed that the node_schedstat_waiting component metric decreases after re-provisioning the cluster member instances.
  • The instances in the frontend (haproxy) service cluster in gstg and gprd are assigned a vertically larger machine_type in their terraform module, and it is observed that the node_schedstat_waiting component metric decreases after re-provisioning the cluster member instances.
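
The "metric decreases after re-provisioning" check in the criteria above could be sketched as follows. This is a minimal sketch, assuming the component is derived from a cumulative counter of seconds spent waiting to be scheduled (such as node_exporter's node_schedstat_waiting_seconds_total) and that samples have already been exported as (timestamp, value) pairs. The names wait_rate and criterion_met and the sample values are hypothetical.

```python
from typing import Sequence, Tuple

# (unix timestamp in seconds, cumulative seconds spent waiting to run)
Sample = Tuple[float, float]


def wait_rate(samples: Sequence[Sample]) -> float:
    """Average seconds of scheduler wait accrued per wall-clock second.

    Uses only the first and last sample of the window. Assumes the counter
    did not reset inside the window and raises if it appears to have done so.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must be ordered by increasing timestamp")
    if v1 < v0:
        raise ValueError("counter decreased; it probably reset in this window")
    return (v1 - v0) / (t1 - t0)


def criterion_met(before: Sequence[Sample], after: Sequence[Sample]) -> bool:
    """True if the wait rate after re-provisioning is lower than before."""
    return wait_rate(after) < wait_rate(before)


if __name__ == "__main__":
    # Hypothetical samples, one minute apart, for a single instance.
    before = [(0.0, 100.0), (60.0, 112.0)]
    after = [(0.0, 5.0), (60.0, 8.0)]
    print(criterion_met(before, after))
```

Comparing rates rather than raw counter values matters here because a re-provisioned instance starts its counter from zero, so the raw value would look lower regardless of contention.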
Edited by Nels Nelson