[Gstg | Move LLM Sidekiq workers to Gstg Shard]
In gitlab-org/gitlab#489871 (closed) we discussed possible approaches to
improve the resiliency of AI actions to Sidekiq outages. We decided to explore creating
a new Sidekiq shard exclusive to the Llm::CompletionWorker, isolating it from other
workers. To that effect, we'll be following the Creating a Sidekiq Shard runbook.
Specifically, the tasks covered here will be:
- @nateweinshenker [ ] Modify the necessary items in [runbooks] to ensure the new shard will have its own dedicated metrics. Includes at least the following:
  - Add an entry in shards in metrics-catalog/services/lib/sidekiq-helpers.libsonnet
  - (The following doesn't seem to exist anymore.) Add a line to services in dashboards/delivery/k8s_migration_overview.dashboard.jsonnet
- @nateweinshenker [ ] Modify the necessary items in [k8s-workloads/gitlab-helmfiles] such that logging is configured for the new shard.
  - Add a new section in lib/fluentd/logging-config.yaml.
- If necessary, create a new dedicated node pool. (We don't need a new node pool.)
  - Add in terraform; currently in environments/ENV/gke-regional.tf; generally look for the other node pool definitions and duplicate/extend.
- @alejandro: Modify [k8s-workloads/gitlab-com], adding the new Sidekiq shard by adding a new section in gitlab.sidekiq.pods with the settings determined above.
  - This prepares a place for the jobs to run but does not cause anything to be routed to them just yet. The "queues" value is the list of queues (probably just one) that this shard will listen on (used in the next step).
  - Also add a new entry in the auto-deploy-image-check list.
- @alejandro: Modify global.appConfig.sidekiq.routingRules in [k8s-workloads/gitlab-com] to select the jobs you want (by name or other characteristics) in the first array value, and route them to the new queue (the second array value, which is the name of the queue that the new shard is listening on).
Note: After scalability#1682 is complete, we should move this new shard to use the urgent pgbouncer; see this comment.
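
To make the routingRules step concrete, here is a rough Python sketch of how a list of `[query, queue]` pairs could be evaluated, assuming rules are checked in order and the first match wins. It is only an illustration, not GitLab's implementation: the matcher handles a simplified subset of the query syntax (`*` and `key=value[,value]` terms joined by `&`), and the queue name `example-llm-queue` and the helper names are hypothetical placeholders, not values taken from this issue.

```python
from typing import Dict, List, Optional, Tuple

# A worker is modelled as a plain dict of attributes (hypothetical shape).
Worker = Dict[str, str]
# Each rule is (query, queue), mirroring the two array values in routingRules.
Rule = Tuple[str, str]


def term_matches(term: str, worker: Worker) -> bool:
    """Evaluate one `key=v1,v2` term: true if the attribute equals any listed value."""
    key, raw_values = term.split("=", 1)
    return worker.get(key) in raw_values.split(",")


def query_matches(query: str, worker: Worker) -> bool:
    """`*` matches every worker; otherwise all `&`-joined terms must match."""
    if query == "*":
        return True
    return all(term_matches(term, worker) for term in query.split("&"))


def route(worker: Worker, rules: List[Rule]) -> Optional[str]:
    """Return the queue of the first matching rule, or None if nothing matches."""
    for query, queue in rules:
        if query_matches(query, worker):
            return queue
    return None


# Assumed example: the LLM worker goes to its own queue, everything else to default.
rules: List[Rule] = [
    ("worker_name=Llm::CompletionWorker", "example-llm-queue"),  # hypothetical queue name
    ("*", "default"),
]

llm_queue = route({"worker_name": "Llm::CompletionWorker"}, rules)
other_queue = route({"worker_name": "SomeOtherWorker"}, rules)
print(llm_queue, other_queue)
```

Under the first-match assumption, the rule for the new shard has to sit above any catch-all rule, otherwise the catch-all would claim the job first.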
Edited by Nathan Weinshenker