Corrective action: standardize/document the Runners scale up process

Summary

During 2024-02-21: concurrent operational limits on Sa... (production#17631 - closed), one of the mitigation steps the team took was to scale up the small shard. That was not a straightforward task, although it [was easier than it was a year ago](find link).

The team followed https://gitlab.com/gitlab-com/runbooks/-/blob/master/docs/ci-runners/linux/new-shards.md and worked out many of the remaining steps while scaling the fleet (see Slack thread), but they still failed to:

- Follow up on any errors the new runner-manager might have produced.
- Update the firewall rules as needed. Missing this step eventually caused 2024-02-21: us-east1-d.ci-gateway.int.gprd.gitl... (production#17636 - closed).
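A pre-flight check could catch the missed firewall update before new runner-managers take traffic. The sketch below assumes the relevant rules can be expressed as a list of allowed source CIDR ranges and that the new runner-manager IP addresses are known; the function name `uncovered_addresses` and all addresses are hypothetical, for illustration only. It uses only Python's standard `ipaddress` module.

```python
import ipaddress


def uncovered_addresses(new_hosts, allowed_ranges):
    """Return the new runner-manager addresses not covered by any allowed range.

    Hypothetical helper: assumes firewall rules are a list of source CIDRs.
    """
    networks = [ipaddress.ip_network(r) for r in allowed_ranges]
    missing = []
    for host in new_hosts:
        addr = ipaddress.ip_address(host)
        # An address in no allowed network means a rule still needs updating.
        if not any(addr in net for net in networks):
            missing.append(host)
    return missing


if __name__ == "__main__":
    # Hypothetical values; real ones would come from inventory and firewall config.
    allowed = ["10.0.1.0/24", "10.0.2.0/24"]
    new_managers = ["10.0.1.15", "10.0.3.7"]
    missing = uncovered_addresses(new_managers, allowed)
    if missing:
        print("Firewall rules need updating for:", ", ".join(missing))
    else:
        print("All new runner-managers are covered by existing rules.")
```

Running a check like this as a required step in the scale-up procedure turns "update the firewall rules as needed" into a yes/no gate rather than something to remember.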

The steps for scaling up an existing fleet should be documented and standardized, preferably in a CR template.
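A CR template could render the scale-up steps as a GitLab task list so that no step is silently skipped. The sketch below is illustrative, not the agreed template: the step list only covers the runbook and the two gaps named in this issue, and `SCALE_UP_STEPS` and `render_checklist` are hypothetical names.

```python
# Illustrative steps drawn from this issue; not the agreed CR template.
SCALE_UP_STEPS = [
    "Follow docs/ci-runners/linux/new-shards.md for the shard being scaled",
    "Check the new runner-manager(s) for errors after they start",
    "Update firewall rules for the new runner-manager(s) (see production#17636)",
]


def render_checklist(shard, steps):
    """Render a GitLab-style task list for scaling up the given shard."""
    lines = [f"Scale up the {shard} shard", ""]
    lines += [f"- [ ] {step}" for step in steps]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_checklist("small", SCALE_UP_STEPS))
```

Keeping the steps in one list makes it easy to add the rest of the procedure as it gets documented.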

Edited Feb 22, 2024 by Rehab