Many defunct processes on Sidekiq/Webservice Pods running on Kubernetes Infrastructure
Problem Statement
While researching Sidekiq-related memory usage, I stumbled across an odd behavior that is present on the entire catchall fleet of Sidekiq Pods running GitLab.com. Over time, we appear to accumulate a rather large number of defunct processes. Here's an example:
F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
0 Z git 98 1 0 80 0 - 0 - 14:43 ? 00:00:00 [gpgconf] <defunct>
4 Z git 102 1 0 80 0 - 0 - 14:43 ? 00:00:00 [gpg] <defunct>
4 Z git 104 1 0 80 0 - 0 - 14:43 ? 00:00:00 [gpgsm] <defunct>
5 Z git 146 1 0 80 0 - 0 - 14:45 ? 00:00:00 [gpg-agent] <defunct>
Here's the full process listing from a single Pod:
F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
4 S git 1 0 0 80 0 - 38785 do_sys 14:42 ? 00:00:01 ruby /srv/gitlab/bin/sidekiq-cluster -r /srv/gitlab -e production --min-concurrency 15 --max-concurrency 15 -t 25 default,mailers,project_import_schedule
4 S git 32 1 0 80 0 - 176474 futex_ 14:42 ? 00:00:33 /usr/local/bin/gitlab-logger /var/log/gitlab
4 S git 37 1 43 80 0 - 1205987 do_sel 14:42 ? 00:33:58 sidekiq 6.4.0 queues:default,mailers,project_import_schedule [0 of 15 busy]
5 S git 39 1 1 80 0 - 66971 do_sys 14:42 ? 00:01:18 sidekiq_exporter
0 Z git 98 1 0 80 0 - 0 - 14:43 ? 00:00:00 [gpgconf] <defunct>
4 Z git 100 1 0 80 0 - 0 - 14:43 ? 00:00:00 [gpgconf] <defunct>
4 Z git 102 1 0 80 0 - 0 - 14:43 ? 00:00:00 [gpg] <defunct>
4 Z git 104 1 0 80 0 - 0 - 14:43 ? 00:00:00 [gpgsm] <defunct>
0 Z git 106 1 0 80 0 - 0 - 14:43 ? 00:00:00 [gpgconf] <defunct>
0 Z git 108 1 0 80 0 - 0 - 14:43 ? 00:00:00 [gpg] <defunct>
4 Z git 110 1 0 80 0 - 0 - 14:43 ? 00:00:00 [gpg] <defunct>
0 Z git 112 1 0 80 0 - 0 - 14:43 ? 00:00:00 [gpg] <defunct>
0 Z git 133 1 0 80 0 - 0 - 14:45 ? 00:00:00 [gpg] <defunct>
4 Z git 135 1 0 80 0 - 0 - 14:45 ? 00:00:00 [gpg] <defunct>
4 Z git 137 1 0 80 0 - 0 - 14:45 ? 00:00:00 [gpg] <defunct>
5 Z git 139 1 0 80 0 - 0 - 14:45 ? 00:00:00 [gpg-agent] <defunct>
0 Z git 142 1 0 80 0 - 0 - 14:45 ? 00:00:00 [gpg] <defunct>
4 Z git 144 1 0 80 0 - 0 - 14:45 ? 00:00:00 [gpg] <defunct>
5 Z git 146 1 0 80 0 - 0 - 14:45 ? 00:00:00 [gpg-agent] <defunct>
4 Z git 149 1 0 80 0 - 0 - 14:45 ? 00:00:00 [gpg] <defunct>
0 Z git 151 1 0 80 0 - 0 - 14:45 ? 00:00:00 [gpg] <defunct>
4 Z git 153 1 0 80 0 - 0 - 14:45 ? 00:00:00 [gpg] <defunct>
0 Z git 155 1 0 80 0 - 0 - 14:45 ? 00:00:00 [gpg] <defunct>
0 Z git 209 1 0 80 0 - 0 - 14:49 ? 00:00:00 [gpg] <defunct>
4 Z git 211 1 0 80 0 - 0 - 14:49 ? 00:00:00 [gpg] <defunct>
4 Z git 213 1 0 80 0 - 0 - 14:49 ? 00:00:00 [gpg] <defunct>
5 Z git 215 1 0 80 0 - 0 - 14:49 ? 00:00:00 [gpg-agent] <defunct>
0 Z git 218 1 0 80 0 - 0 - 14:49 ? 00:00:00 [gpg] <defunct>
4 Z git 220 1 0 80 0 - 0 - 14:49 ? 00:00:00 [gpg] <defunct>
5 Z git 222 1 0 80 0 - 0 - 14:49 ? 00:00:00 [gpg-agent] <defunct>
4 Z git 225 1 0 80 0 - 0 - 14:49 ? 00:00:00 [gpg] <defunct>
0 Z git 227 1 0 80 0 - 0 - 14:49 ? 00:00:00 [gpg] <defunct>
4 Z git 229 1 0 80 0 - 0 - 14:49 ? 00:00:00 [gpg] <defunct>
0 Z git 231 1 0 80 0 - 0 - 14:49 ? 00:00:00 [gpg] <defunct>
0 Z git 233 1 0 80 0 - 0 - 14:49 ? 00:00:00 [gpg] <defunct>
4 Z git 235 1 0 80 0 - 0 - 14:49 ? 00:00:00 [gpg] <defunct>
4 Z git 237 1 0 80 0 - 0 - 14:49 ? 00:00:00 [gpg] <defunct>
5 Z git 239 1 0 80 0 - 0 - 14:49 ? 00:00:00 [gpg-agent] <defunct>
0 Z git 242 1 0 80 0 - 0 - 14:49 ? 00:00:00 [gpg] <defunct>
4 Z git 244 1 0 80 0 - 0 - 14:49 ? 00:00:00 [gpg] <defunct>
5 Z git 246 1 0 80 0 - 0 - 14:49 ? 00:00:00 [gpg-agent] <defunct>
4 Z git 249 1 0 80 0 - 0 - 14:49 ? 00:00:00 [gpg] <defunct>
0 Z git 251 1 0 80 0 - 0 - 14:49 ? 00:00:00 [gpg] <defunct>
4 Z git 253 1 0 80 0 - 0 - 14:49 ? 00:00:00 [gpg] <defunct>
0 Z git 255 1 0 80 0 - 0 - 14:49 ? 00:00:00 [gpg] <defunct>
0 Z git 311 1 0 80 0 - 0 - 14:53 ? 00:00:00 [gpg] <defunct>
4 Z git 313 1 0 80 0 - 0 - 14:53 ? 00:00:00 [gpg] <defunct>
4 Z git 315 1 0 80 0 - 0 - 14:53 ? 00:00:00 [gpg] <defunct>
5 Z git 317 1 0 80 0 - 0 - 14:53 ? 00:00:00 [gpg-agent] <defunct>
0 Z git 320 1 0 80 0 - 0 - 14:53 ? 00:00:00 [gpg] <defunct>
4 Z git 322 1 0 80 0 - 0 - 14:53 ? 00:00:00 [gpg] <defunct>
5 Z git 324 1 0 80 0 - 0 - 14:53 ? 00:00:00 [gpg-agent] <defunct>
4 Z git 327 1 0 80 0 - 0 - 14:53 ? 00:00:00 [gpg] <defunct>
0 Z git 329 1 0 80 0 - 0 - 14:53 ? 00:00:00 [gpg] <defunct>
4 Z git 331 1 0 80 0 - 0 - 14:53 ? 00:00:00 [gpg] <defunct>
0 Z git 333 1 0 80 0 - 0 - 14:53 ? 00:00:00 [gpg] <defunct>
0 Z git 335 1 0 80 0 - 0 - 14:53 ? 00:00:00 [gpg] <defunct>
4 Z git 337 1 0 80 0 - 0 - 14:53 ? 00:00:00 [gpg] <defunct>
4 Z git 339 1 0 80 0 - 0 - 14:53 ? 00:00:00 [gpg] <defunct>
5 Z git 341 1 0 80 0 - 0 - 14:53 ? 00:00:00 [gpg-agent] <defunct>
0 Z git 344 1 0 80 0 - 0 - 14:53 ? 00:00:00 [gpg] <defunct>
4 Z git 346 1 0 80 0 - 0 - 14:53 ? 00:00:00 [gpg] <defunct>
5 Z git 348 1 0 80 0 - 0 - 14:53 ? 00:00:00 [gpg-agent] <defunct>
4 Z git 351 1 0 80 0 - 0 - 14:53 ? 00:00:00 [gpg] <defunct>
0 Z git 353 1 0 80 0 - 0 - 14:53 ? 00:00:00 [gpg] <defunct>
4 Z git 355 1 0 80 0 - 0 - 14:53 ? 00:00:00 [gpg] <defunct>
0 Z git 357 1 0 80 0 - 0 - 14:53 ? 00:00:00 [gpg] <defunct>
0 Z git 359 1 0 80 0 - 0 - 14:53 ? 00:00:00 [gpg] <defunct>
4 Z git 361 1 0 80 0 - 0 - 14:53 ? 00:00:00 [gpg] <defunct>
4 Z git 363 1 0 80 0 - 0 - 14:53 ? 00:00:00 [gpg] <defunct>
5 Z git 365 1 0 80 0 - 0 - 14:53 ? 00:00:00 [gpg-agent] <defunct>
0 Z git 368 1 0 80 0 - 0 - 14:53 ? 00:00:00 [gpg] <defunct>
4 Z git 370 1 0 80 0 - 0 - 14:53 ? 00:00:00 [gpg] <defunct>
5 Z git 372 1 0 80 0 - 0 - 14:53 ? 00:00:00 [gpg-agent] <defunct>
4 Z git 375 1 0 80 0 - 0 - 14:53 ? 00:00:00 [gpg] <defunct>
0 Z git 377 1 0 80 0 - 0 - 14:53 ? 00:00:00 [gpg] <defunct>
4 Z git 379 1 0 80 0 - 0 - 14:53 ? 00:00:00 [gpg] <defunct>
0 Z git 382 1 0 80 0 - 0 - 14:53 ? 00:00:00 [gpg] <defunct>
0 Z git 386 1 0 80 0 - 0 - 14:53 ? 00:00:00 [gpg] <defunct>
4 Z git 388 1 0 80 0 - 0 - 14:53 ? 00:00:00 [gpg] <defunct>
4 Z git 390 1 0 80 0 - 0 - 14:53 ? 00:00:00 [gpg] <defunct>
5 Z git 392 1 0 80 0 - 0 - 14:53 ? 00:00:00 [gpg-agent] <defunct>
0 Z git 395 1 0 80 0 - 0 - 14:53 ? 00:00:00 [gpg] <defunct>
4 Z git 397 1 0 80 0 - 0 - 14:53 ? 00:00:00 [gpg] <defunct>
5 Z git 399 1 0 80 0 - 0 - 14:53 ? 00:00:00 [gpg-agent] <defunct>
4 Z git 402 1 0 80 0 - 0 - 14:53 ? 00:00:00 [gpg] <defunct>
0 Z git 404 1 0 80 0 - 0 - 14:53 ? 00:00:00 [gpg] <defunct>
4 Z git 406 1 0 80 0 - 0 - 14:53 ? 00:00:00 [gpg] <defunct>
0 Z git 408 1 0 80 0 - 0 - 14:53 ? 00:00:00 [gpg] <defunct>
0 Z git 423 1 0 80 0 - 0 - 14:54 ? 00:00:00 [gpg] <defunct>
4 Z git 425 1 0 80 0 - 0 - 14:54 ? 00:00:00 [gpg] <defunct>
4 Z git 427 1 0 80 0 - 0 - 14:54 ? 00:00:00 [gpg] <defunct>
5 Z git 429 1 0 80 0 - 0 - 14:54 ? 00:00:00 [gpg-agent] <defunct>
0 Z git 432 1 0 80 0 - 0 - 14:54 ? 00:00:00 [gpg] <defunct>
4 Z git 434 1 0 80 0 - 0 - 14:54 ? 00:00:00 [gpg] <defunct>
5 Z git 436 1 0 80 0 - 0 - 14:54 ? 00:00:00 [gpg-agent] <defunct>
4 Z git 439 1 0 80 0 - 0 - 14:54 ? 00:00:00 [gpg] <defunct>
0 Z git 441 1 0 80 0 - 0 - 14:54 ? 00:00:00 [gpg] <defunct>
4 Z git 443 1 0 80 0 - 0 - 14:54 ? 00:00:00 [gpg] <defunct>
0 Z git 445 1 0 80 0 - 0 - 14:54 ? 00:00:00 [gpg] <defunct>
0 Z git 955 1 0 80 0 - 0 - 15:30 ? 00:00:00 [gpg] <defunct>
4 Z git 957 1 0 80 0 - 0 - 15:30 ? 00:00:00 [gpg] <defunct>
4 Z git 959 1 0 80 0 - 0 - 15:30 ? 00:00:00 [gpg] <defunct>
5 Z git 961 1 0 80 0 - 0 - 15:30 ? 00:00:00 [gpg-agent] <defunct>
0 Z git 964 1 0 80 0 - 0 - 15:30 ? 00:00:00 [gpg] <defunct>
4 Z git 966 1 0 80 0 - 0 - 15:30 ? 00:00:00 [gpg] <defunct>
5 Z git 968 1 0 80 0 - 0 - 15:30 ? 00:00:00 [gpg-agent] <defunct>
4 Z git 971 1 0 80 0 - 0 - 15:30 ? 00:00:00 [gpg] <defunct>
0 Z git 973 1 0 80 0 - 0 - 15:30 ? 00:00:00 [gpg] <defunct>
4 Z git 975 1 0 80 0 - 0 - 15:30 ? 00:00:00 [gpg] <defunct>
0 Z git 977 1 0 80 0 - 0 - 15:30 ? 00:00:00 [gpg] <defunct>
0 Z git 1022 1 0 80 0 - 0 - 15:33 ? 00:00:00 [gpg] <defunct>
4 Z git 1024 1 0 80 0 - 0 - 15:33 ? 00:00:00 [gpg] <defunct>
4 Z git 1026 1 0 80 0 - 0 - 15:33 ? 00:00:00 [gpg] <defunct>
5 Z git 1028 1 0 80 0 - 0 - 15:33 ? 00:00:00 [gpg-agent] <defunct>
0 Z git 1031 1 0 80 0 - 0 - 15:33 ? 00:00:00 [gpg] <defunct>
4 Z git 1033 1 0 80 0 - 0 - 15:33 ? 00:00:00 [gpg] <defunct>
5 Z git 1035 1 0 80 0 - 0 - 15:33 ? 00:00:00 [gpg-agent] <defunct>
4 Z git 1038 1 0 80 0 - 0 - 15:33 ? 00:00:00 [gpg] <defunct>
0 Z git 1040 1 0 80 0 - 0 - 15:33 ? 00:00:00 [gpg] <defunct>
4 Z git 1042 1 0 80 0 - 0 - 15:33 ? 00:00:00 [gpg] <defunct>
0 Z git 1044 1 0 80 0 - 0 - 15:33 ? 00:00:00 [gpg] <defunct>
0 Z git 1048 1 0 80 0 - 0 - 15:33 ? 00:00:00 [gpg] <defunct>
4 Z git 1050 1 0 80 0 - 0 - 15:33 ? 00:00:00 [gpg] <defunct>
4 Z git 1052 1 0 80 0 - 0 - 15:33 ? 00:00:00 [gpg] <defunct>
5 Z git 1054 1 0 80 0 - 0 - 15:33 ? 00:00:00 [gpg-agent] <defunct>
0 Z git 1057 1 0 80 0 - 0 - 15:33 ? 00:00:00 [gpg] <defunct>
4 Z git 1059 1 0 80 0 - 0 - 15:33 ? 00:00:00 [gpg] <defunct>
5 Z git 1061 1 0 80 0 - 0 - 15:33 ? 00:00:00 [gpg-agent] <defunct>
4 Z git 1064 1 0 80 0 - 0 - 15:33 ? 00:00:00 [gpg] <defunct>
0 Z git 1066 1 0 80 0 - 0 - 15:33 ? 00:00:00 [gpg] <defunct>
4 Z git 1068 1 0 80 0 - 0 - 15:33 ? 00:00:00 [gpg] <defunct>
0 Z git 1070 1 0 80 0 - 0 - 15:33 ? 00:00:00 [gpg] <defunct>
4 R git 1454 0 0 80 0 - 2146 - 16:00 ? 00:00:00 ps -efl
At the time of this writing, catchall is utilizing the following configuration for routing Sidekiq jobs:
- https://gitlab.com/gitlab-com/gl-infra/k8s-workloads/gitlab-com/-/blob/031b9c78d41552a8de371bd0ab5f21a4f3370855/releases/gitlab/values/gprd.yaml.gotmpl#L699-712
- And sidekiq starts up like so:
sidekiq 6.4.0 queues:default,mailers,project_import_schedule [0 of 15 busy]
While defunct processes are normally safe to ignore, I wonder whether Sidekiq may be doing something wrong, and whether we are potentially hiding errors or disregarding a subprocess that was spawned. This should be investigated to determine whether this is indeed harmless, or whether something bad is happening that we need to manage better. It is common for us to run into occasional OOMKill events, which could lead to this, but we do not trigger enough of those events to explain the high number of defunct processes.
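For context, here is a minimal, Linux-only sketch of how a defunct entry arises in general: a child that has exited stays in state Z until its parent calls wait() on it. This is not Sidekiq or GitLab code, the helper names `process_state` and `wait_for_state` are hypothetical, and it does not establish which process spawned the gpg children listed above.

```python
import subprocess
import sys
import time


def process_state(pid):
    """Return the one-letter state from /proc/<pid>/stat, or None if the PID is gone."""
    try:
        with open(f"/proc/{pid}/stat") as fh:
            stat = fh.read()
    except (FileNotFoundError, ProcessLookupError):
        return None
    # The state letter is the first field after the parenthesised command name.
    return stat.rsplit(")", 1)[1].split()[0]


def wait_for_state(pid, wanted, timeout=5.0):
    """Poll until the process reports the wanted state or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if process_state(pid) == wanted:
            return True
        time.sleep(0.01)
    return False


# Start a child that exits almost immediately.
child = subprocess.Popen([sys.executable, "-c", "pass"])

# The parent has not called wait() yet, so the exited child lingers as a zombie.
became_zombie = wait_for_state(child.pid, "Z")

# Reaping the child removes its entry from the process table.
child.wait()
state_after_reap = process_state(child.pid)

print("zombie before wait():", became_zombie)
print("state after wait():", state_after_reap)
```

In the listing above every defunct entry has PPID 1, so whichever process runs as PID 1 in the container is the one that would need to perform the equivalent of `child.wait()` for those entries to disappear.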
For scale, the ps -efl listing above is from a single Pod. At the time I performed this investigation, we were running 188 Pods of this workload, and across all of them there were 41,783 defunct processes:
% grep -c PID output.txt
188
% grep -c defunct output.txt
41783
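To see which commands account for the defunct entries, the same collected output could be broken down by command name. The following is a rough sketch, assuming output.txt is simply the ps -efl output of every Pod concatenated together; the function name `summarize` is hypothetical and the sample lines are copied from the short example in this issue.

```python
import re
from collections import Counter

# ps prints an exited-but-unreaped process as "[name] <defunct>".
DEFUNCT_RE = re.compile(r"\[([^\]]+)\]\s+<defunct>")


def summarize(lines):
    """Return (pod_count, Counter of defunct processes keyed by command name)."""
    pods = 0
    by_command = Counter()
    for line in lines:
        # Each Pod's listing is assumed to start with one ps -efl header row.
        if line.split()[:4] == ["F", "S", "UID", "PID"]:
            pods += 1
            continue
        by_command.update(DEFUNCT_RE.findall(line))
    return pods, by_command


# Five lines copied from the short example earlier in this issue.
sample = [
    "F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD",
    "0 Z git 98 1 0 80 0 - 0 - 14:43 ? 00:00:00 [gpgconf] <defunct>",
    "4 Z git 102 1 0 80 0 - 0 - 14:43 ? 00:00:00 [gpg] <defunct>",
    "4 Z git 104 1 0 80 0 - 0 - 14:43 ? 00:00:00 [gpgsm] <defunct>",
    "5 Z git 146 1 0 80 0 - 0 - 14:45 ? 00:00:00 [gpg-agent] <defunct>",
]
pods, by_command = summarize(sample)
print(pods, sum(by_command.values()), by_command.most_common())
```

Against the real file, `summarize(open("output.txt"))` would take the place of the sample list. Unlike grep -c, which counts matching lines, this counts every defunct entry, so the totals can differ slightly if two rows ever share a line.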