Increase machine type for several services
Current Situation
Increase machine type for a number of services.
Since osqueryd was rolled out across our environments, the node_schedstat_waiting component metric has been elevated on many service instances in our fleet. For more details, see: https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/15110
In lieu of an immediate tuning remedy for osqueryd performance problems, it has been recommended to increase the machine type of each of these instances and re-provision them.
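For context on what the metric measures, here is a minimal sketch of reading run-queue wait time from Linux scheduler statistics. It assumes node_schedstat_waiting is derived from the per-CPU lines of /proc/schedstat, where the eighth counter is the cumulative time tasks spent runnable but waiting for a CPU, in nanoseconds. The function name and the sample text are hypothetical and not taken from our tooling.

```python
def waiting_seconds_per_cpu(schedstat_text):
    """Return {cpu_name: cumulative seconds tasks spent runnable but waiting}."""
    waiting = {}
    for line in schedstat_text.splitlines():
        fields = line.split()
        # Per-CPU lines look like "cpu0 <9 counters>"; skip version,
        # timestamp, and domain lines.
        if len(fields) < 10 or not fields[0].startswith("cpu"):
            continue
        # fields[8] is the eighth counter: run-queue wait time,
        # assumed here to be in nanoseconds.
        waiting[fields[0]] = int(fields[8]) / 1e9
    return waiting


# Hypothetical sample in the /proc/schedstat layout (values are made up).
SAMPLE = """version 15
timestamp 4295000000
cpu0 0 0 0 0 0 0 5000000000 2000000000 1000
domain0 3 0 0 0
cpu1 0 0 0 0 0 0 4000000000 500000000 800
"""

print(waiting_seconds_per_cpu(SAMPLE))  # {'cpu0': 2.0, 'cpu1': 0.5}
```

On a Linux host the same function could be fed the contents of /proc/schedstat. The values are cumulative counters, so the rate of increase is what matters, not the absolute value.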
Desired Outcome
The desired outcome is reduced saturation and resource contention on these systems.
Ideally, the node_schedstat_waiting component metric will decrease, as has been exemplified here: https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/15277
This component metric should decrease because, with more CPU cores, the processes competing for processor time have to share it with fewer other processes, so fewer processes are left waiting to be scheduled for execution.
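The reasoning above can be illustrated with a toy model. This is a sketch under simplifying assumptions, not a model of the real scheduler: at any sampled instant, each core runs at most one task, and any runnable tasks beyond the core count are waiting. The function names and sample values are hypothetical.

```python
def waiting_tasks(runnable, cores):
    """Tasks that are runnable but have no core at one instant (toy model)."""
    return max(runnable - cores, 0)


def waiting_share(runnable_samples, cores):
    """Fraction of runnable task-instants spent waiting across the samples."""
    total = sum(runnable_samples)
    if total == 0:
        return 0.0
    waited = sum(waiting_tasks(r, cores) for r in runnable_samples)
    return waited / total


# Hypothetical runnable-task counts observed at six instants.
samples = [2, 6, 3, 8, 1, 5]
for cores in (2, 4, 8):
    print(cores, round(waiting_share(samples, cores), 2))
# 2 0.56
# 4 0.28
# 8 0.0
```

With the same workload, doubling the cores from 2 to 4 halves the waiting share in this example, and at 8 cores no task waits at all. Real workloads will differ, which is why the acceptance criteria rely on observing the metric after re-provisioning.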
Acceptance Criteria
- The instances in the camoproxy service cluster in gstg and gprd are assigned a vertically larger machine_type in their terraform module, and it is observed that the node_schedstat_waiting component metric decreases after re-provisioning the cluster member instances.
  - Task issue: https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/15277
  - Completed change plan: production#6410 (closed)
- The instances in the redis-sentinal service cluster in gstg and gprd are assigned a vertically larger machine_type in their terraform module, and it is observed that the node_schedstat_waiting component metric decreases after re-provisioning the cluster member instances.
  - Task issue: https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/15402
  - Draft staging change plan: production#6534 (closed)
- The instances in the redis-cache service cluster in gstg and gprd are assigned a vertically larger machine_type in their terraform module, and it is observed that the node_schedstat_waiting component metric decreases after re-provisioning the cluster member instances.
- The instances in the registry service cluster in gstg and gprd are assigned a vertically larger machine_type in their terraform module, and it is observed that the node_schedstat_waiting component metric decreases after re-provisioning the cluster member instances.
- The instances in the frontend (haproxy) service cluster in gstg and gprd are assigned a vertically larger machine_type in their terraform module, and it is observed that the node_schedstat_waiting component metric decreases after re-provisioning the cluster member instances.
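The "metric decreases after re-provisioning" check in the criteria above could be made concrete along these lines. This is only a sketch: it assumes the metric is a cumulative counter sampled as (timestamp, value) pairs, and the function names, the min_drop threshold, and the sample numbers are all hypothetical.

```python
def average_rate(samples):
    """Average per-second increase of a monotonically increasing counter.

    samples: list of (unix_timestamp_seconds, counter_value), oldest first.
    """
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("need at least two samples with increasing timestamps")
    return (v1 - v0) / (t1 - t0)


def waiting_decreased(before, after, min_drop=0.0):
    """True if the rate after re-provisioning is lower than before.

    min_drop is the required relative reduction, e.g. 0.25 for 25 percent.
    """
    rate_before = average_rate(before)
    rate_after = average_rate(after)
    return rate_after < rate_before * (1.0 - min_drop)


# Hypothetical five-minute windows for one instance: waiting seconds
# accumulated before and after the machine type change.
before = [(0, 100.0), (300, 160.0)]  # 0.2 waiting seconds per second
after = [(0, 10.0), (300, 25.0)]     # 0.05 waiting seconds per second

print(waiting_decreased(before, after))                # True
print(waiting_decreased(before, after, min_drop=0.8))  # False
```

The two windows are kept separate because a re-provisioned instance starts its counters from zero. Comparing windows with similar traffic, for example the same time of day, would make the result more meaningful.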