
[Sidekiq] Support for podAntiAffinity with match labels of deployment name

Merged Dmytro Makovey requested to merge 2857-support-for-podantiaffinity into master
1 unresolved thread

What does this MR do?

Implements podAntiAffinity in a way that allows Sidekiq pods to be spread out across the cluster.

Based on @WarheadsSE 's suggestion: adding queue-pod-name to the list of matchLabels in the podAntiAffinity section should provide some immediate relief, as it ensures that 2 pods serving the same queue are spread out to different nodes.
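For illustration, the rendered rule would look roughly like the existing Webservice one, with the new queue-specific label added to the selector (this is a sketch only; the release and queue-pod-name values here are illustrative):

```yaml
# Sketch of the rendered podAntiAffinity for one Sidekiq Deployment
# when antiAffinity is set to hard; label values are illustrative.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - topologyKey: "kubernetes.io/hostname"
        labelSelector:
          matchLabels:
            app: sidekiq
            release: gitlab
            queue-pod-name: catch-all  # <- new Deployment-specific label
```

With only app and release in matchLabels, pods from *different* Sidekiq Deployments also repel each other; scoping the selector with queue-pod-name restricts the rule to replicas of the same queue.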

Related issues

Closes #2857 (closed)

Checklist

See Definition of done.

For anything in this list which will not be completed, please provide a reason in the MR discussion.

Required

  • Merge Request Title and Description are up to date, accurate, and descriptive
  • MR targeting the appropriate branch
  • MR has a green pipeline on GitLab.com

Expected (please provide an explanation if not completing)

  • Test plan indicating conditions for success has been posted and passes
  • Documentation created/updated
  • Tests added
  • Integration tests added to GitLab QA
  • Equivalent MR/issue for omnibus-gitlab opened
Edited by Mitchell Nielsen

Merge request reports


Activity

  • Dmytro Makovey changed milestone to %14.4


  • Dmytro Makovey added 1 commit


    • 0c7f2c80 - implement simples of changes

    Compare with previous version

  • 1 Warning
    Please check the QA job and compare with builds on master. If no new failures are reported in QA job, add 'QA:verified' label.
    1 Message
📖 Please add the workflow::ready for review label once you think the MR is ready for an initial review.

    If from a community member, ask that the Community contribution label be added as well.

    Merge requests are handled according to the workflow documented in our handbook and should receive a response within the limit documented in our First-response SLO.

    If you don't receive a response, please mention @gitlab-org/distribution, or one of our Project Maintainers

    Generated by 🚫 Danger

  • Dmytro Makovey changed title from Draft: Resolve "Support for podAntiAffinity with match labels of deployment name or other finer-grained label" to Draft: Support for podAntiAffinity with match labels of deployment name


  • Dmytro Makovey marked the checklist item Merge Request Title and Description are up to date, accurate, and descriptive as completed


  • Dmytro Makovey marked the checklist item MR targeting the appropriate branch as completed


  • Dmytro Makovey changed the description


  • Dmytro Makovey added 1 commit


    • 4d19f874 - implement simple anti-affinity rule based on queue name

    Compare with previous version

  • Dmytro Makovey marked this merge request as ready


  • Dmytro Makovey resolved all threads


  • Testing

    KinD configuration

I modified examples/kind/kind-ssl.yaml to provision 2 worker nodes so the workloads can be spread out based on the antiAffinity rules:

    diff --git a/examples/kind/kind-ssl.yaml b/examples/kind/kind-ssl.yaml
    index ef2a4993f..3c4c8d52d 100644
    --- a/examples/kind/kind-ssl.yaml
    +++ b/examples/kind/kind-ssl.yaml
    @@ -16,3 +16,5 @@ nodes:
       - containerPort: 32022
         hostPort: 32022
         listenAddress: "0.0.0.0"
    +- role: worker
    +- role: worker

    $ kind create cluster --config examples/kind/kind-ssl.yaml

    Values

    gitlab:
      sidekiq:
        antiAffinity: hard
        minReplicas: 2
        pods:
          - name: "cronjobs"
            queues: "cronjob:*"
          - name: catch-all
            queues: "*"
    global:
      antiAffinity: hard

    Node separation per Sidekiq Deployment/Queue

    $ kubectl get pods -lapp=sidekiq -owide | awk '{print $1,$7}'
    NAME                                            NODE
    gitlab-sidekiq-catch-all-v1-84475bd796-7rlz7    kind-worker2
    gitlab-sidekiq-catch-all-v1-84475bd796-85gdk    kind-worker
    gitlab-sidekiq-cronjobs-v1-7cfd448489-hx6tc     kind-worker2
    gitlab-sidekiq-cronjobs-v1-7cfd448489-mjxqh     kind-worker

    Note that each Sidekiq Deployment/Queue's pods are spread out between the 2 available nodes.

    Comparison to Webservice

    It's worth noting that we do something similar already with Webservice when there is more than one Deployment:

        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - topologyKey: "kubernetes.io/hostname"
              labelSelector:
                matchLabels:
                  app: webservice
                  release: test
                  gitlab.com/webservice-name: default # <- already injects Deployment-specific label
    Edited by Mitchell Nielsen
  • Mitchell Nielsen marked the checklist item MR has a green pipeline on GitLab.com as completed


  • Mitchell Nielsen approved this merge request


  • Thanks @dmakovey! Approved and up for maintainer review.

  • Mitchell Nielsen requested review from @twk3


  • Mitchell Nielsen changed title from Support for podAntiAffinity with match labels of deployment name to [Sidekiq] Support for podAntiAffinity with match labels of deployment name


  • Mitchell Nielsen requested review from @mnielsen


  • 🤖 GitLab Bot 🤖 changed milestone to %14.5


  • Mitchell Nielsen removed review request for @twk3


  • Author Developer

@WarheadsSE, @mnielsen : based on !2227 (comment 711759199), our "transition" option is to use the Recreate strategy. After that, everything settles in and further updates do not require Recreate. So my question is: should we advise in the release notes to use this as a transitional step for those who implement hard anti-affinity, rather than codifying that change as a permanent solution?
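For anyone opting into hard anti-affinity during the upgrade, the transitional values might look like the sketch below. This is an illustration only; it assumes the Sidekiq chart honors a deployment.strategy override, so verify against the chart's values before relying on it:

```yaml
# Hypothetical transitional values for the label-change upgrade.
# Assumes the Sidekiq chart exposes deployment.strategy (verify first).
gitlab:
  sidekiq:
    antiAffinity: hard
    deployment:
      strategy:
        type: Recreate  # transitional: old pods exit before new ones schedule
global:
  antiAffinity: hard
```

Once the first upgrade with the new labels settles, the strategy override could be dropped again so later updates roll normally.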

    Edited by Dmytro Makovey
  • Author Developer

As a side note: while I was testing, I noticed HPA bumped replicas to 4 for the catch-all queue, to which the only response is to inform HPA to scale down. But I think that's a problem anyone would face, whether with the 5.4.0 behaviour or the modified one.

  • Author Developer

    Note on bumped pods per label change:

    $ k get pods
    Default context: gke-dmitry-olm in gke_cloud-native-182609_us-central1-b_dmitry-olm cluster
    Default namespace: gitlab
    NAME                                                    READY   STATUS      RESTARTS   AGE
    ...
    gitlab-issuer-4-85rb6                                   1/1     Running     0          2m6s
    gitlab-migrations-4-5pmwp                               0/1     Completed   0          2m6s
    ...
    gitlab-minio-create-buckets-4-pflp7                     0/1     Completed   0          2m6s
    ...
    gitlab-sidekiq-catch-all-v1-5d4c68dc5c-kprw5            1/1     Running     0          2m4s
    gitlab-sidekiq-cronjobs-v1-75475974b4-6zs8r             1/1     Running     0          2m4s
    gitlab-sidekiq-github-v1-5677f66749-tcst7               1/1     Running     0          2m4s
    ...

In other words, it seems we're just dealing with jobs re-running (an OK thing) and the relevant pod recreation (expected).

  • Dmytro Makovey added 1 commit


    • c95f57e2 - bump sidekiq queue name with v2

    Compare with previous version

  • Dmytro Makovey added 38 commits


    Compare with previous version

  • Mitchell Nielsen resolved all threads


    • Resolved by Mitchell Nielsen

      Testing

      Cluster setup

      diff --git a/examples/kind/kind-ssl.yaml b/examples/kind/kind-ssl.yaml
      index ef2a4993f..3c4c8d52d 100644
      --- a/examples/kind/kind-ssl.yaml
      +++ b/examples/kind/kind-ssl.yaml
      @@ -16,3 +16,5 @@ nodes:
         - containerPort: 32022
           hostPort: 32022
           listenAddress: "0.0.0.0"
      +- role: worker
      +- role: worker

      kind create cluster --config examples/kind/kind-ssl.yaml

      Values

      gitlab:
        sidekiq:
          antiAffinity: hard
          minReplicas: 2
          pods:
            # list of queues: https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/workers/all_queues.yml
            - name: "cronjobs"
              queues: "cronjob:*"
            - name: catch-all
              queues: "*"
      global:
        antiAffinity: hard

      Results

      1. Install from master

      Pods came up as expected, with each queue's second pod stuck in Pending:

      $ k get pods -o json -l app=sidekiq  | jq '[.items[] | {name: .metadata.name, labels: .metadata.labels, containerNodes: .spec.nodeName}]'
      [
        {
          "name": "gitlab-sidekiq-catch-all-v1-7669568687-7t9r7",
          "labels": {
            "app": "sidekiq",
            "chart": "sidekiq-5.4.1",
            "heritage": "Helm",
            "pod-template-hash": "7669568687",
            "queue-pod-name": "catch-all",
            "release": "gitlab"
          },
          "containerNodes": null
        },
        {
          "name": "gitlab-sidekiq-catch-all-v1-7669568687-szg4q",
          "labels": {
            "app": "sidekiq",
            "chart": "sidekiq-5.4.1",
            "heritage": "Helm",
            "pod-template-hash": "7669568687",
            "queue-pod-name": "catch-all",
            "release": "gitlab"
          },
          "containerNodes": "kind-worker"
        },
        {
          "name": "gitlab-sidekiq-cronjobs-v1-56bbd567c-9kpql",
          "labels": {
            "app": "sidekiq",
            "chart": "sidekiq-5.4.1",
            "heritage": "Helm",
            "pod-template-hash": "56bbd567c",
            "queue-pod-name": "cronjobs",
            "release": "gitlab"
          },
          "containerNodes": null
        },
        {
          "name": "gitlab-sidekiq-cronjobs-v1-56bbd567c-kg787",
          "labels": {
            "app": "sidekiq",
            "chart": "sidekiq-5.4.1",
            "heritage": "Helm",
            "pod-template-hash": "56bbd567c",
            "queue-pod-name": "cronjobs",
            "release": "gitlab"
          },
          "containerNodes": "kind-worker2"
        }
      ]

      2. Install from MR branch

      As mentioned, -v1 Sidekiq pods were immediately terminated.

      The -v2 Sidekiq pods came up, with each queue's pods on separate nodes:

      $ k get pods -o json -l app=sidekiq  | jq '[.items[] | {name: .metadata.name, labels: .metadata.labels, containerNodes: .spec.nodeName}]'
      [
        {
          "name": "gitlab-sidekiq-catch-all-v2-585b57bb7-ltkn7",
          "labels": {
            "app": "sidekiq",
            "chart": "sidekiq-5.4.0",
            "heritage": "Helm",
            "pod-template-hash": "585b57bb7",
            "queue-pod-name": "catch-all",
            "release": "gitlab"
          },
          "containerNodes": "kind-worker"
        },
        {
          "name": "gitlab-sidekiq-catch-all-v2-585b57bb7-qrmfl",
          "labels": {
            "app": "sidekiq",
            "chart": "sidekiq-5.4.0",
            "heritage": "Helm",
            "pod-template-hash": "585b57bb7",
            "queue-pod-name": "catch-all",
            "release": "gitlab"
          },
          "containerNodes": "kind-worker2"
        },
        {
          "name": "gitlab-sidekiq-cronjobs-v2-7c95588f8d-d5gt9",
          "labels": {
            "app": "sidekiq",
            "chart": "sidekiq-5.4.0",
            "heritage": "Helm",
            "pod-template-hash": "7c95588f8d",
            "queue-pod-name": "cronjobs",
            "release": "gitlab"
          },
          "containerNodes": "kind-worker2"
        },
        {
          "name": "gitlab-sidekiq-cronjobs-v2-7c95588f8d-rfz2j",
          "labels": {
            "app": "sidekiq",
            "chart": "sidekiq-5.4.0",
            "heritage": "Helm",
            "pod-template-hash": "7c95588f8d",
            "queue-pod-name": "cronjobs",
            "release": "gitlab"
          },
          "containerNodes": "kind-worker"
        }
      ]

      No more pods stuck in Pending

      Note that this migration bounced all of the following:

      • Gitaly
      • Gitlab-Runner
      • GitLab-Shell
      • TaskRunner
      • Webservice

      This is likely because this branch needs to be rebased with more recent changes from master, but it's worth testing to double-check that.

  • Mitchell Nielsen added 15 commits


    • d4d85e1e...fac2f5c4 - 14 commits from branch master
    • ff1f5923 - Merge branch 'master' of gitlab.com:gitlab-org/charts/gitlab into 2857-support-for-podantiaffinity

    Compare with previous version

  • Mitchell Nielsen resolved all threads


    • Resolved by Mitchell Nielsen

      Functionally, this MR seems to be in a good state. As such, I'm going to approve and hand up for maintainer review.

      Notes for the maintainer: as for implementation, !876 (merged) was a similar change (it added -v1), and it had two blockers:

      1. Documentation for the change: !1110 (merged)
      2. Waiting for the 3.x release

Would it be best to follow a similar path for this as well? My 2 cents: adding the documentation is straightforward, but waiting for the next major release means this MR would be delayed for quite some time. Open to your thoughts on this one based on past experience 👍

  • requested review from @WarheadsSE

  • Mitchell Nielsen removed review request for @WarheadsSE


  • Dmytro Makovey added 2 commits


    • f05d8b30 - document changes to sidekiq hpa/deployment/pods names
    • 81ceb5ee - Merge branch '2857-support-for-podantiaffinity' of...

    Compare with previous version

  • Dmytro Makovey added 1 commit


    Compare with previous version

  • Mitchell Nielsen resolved all threads


  • Mitchell Nielsen marked the checklist item Documentation created/updated as completed


  • Mitchell Nielsen marked the checklist item Tests added as completed


  • Thanks again @dmakovey 👍

    Up for maintainer review.

  • requested review from @WarheadsSE

  • Dmytro Makovey resolved all threads


  • Dmytro Makovey added 1 commit


    • 30914c04 - Apply 1 suggestion(s) to 1 file(s)

    Compare with previous version

  • Jason Plum approved this merge request


  • Jason Plum enabled an automatic merge when the pipeline for 30914c04 succeeds


  • merged

  • Jason Plum mentioned in commit 14ebba0f


  • Jason Plum mentioned in issue #2300

