Potential bug with mailroom Pod
Summary
When misconfigured, the mailroom process does not appear to run, yet the Pod remains in a state that indicates otherwise. I worry that our scripts lack the monitoring needed to report such failures to Kubernetes.
Steps to reproduce
Provide a minimal configuration to the mailroom component. For example, purposely set a hostname that may be valid but does not respond on the IMAP TCP port.
Observe that the Pod is in a ready state:
NAME                          READY   STATUS    RESTARTS   AGE
a-mailroom-7c996fd78c-28sd9   1/1     Running   0          92m
Observe that the ruby process responsible for mailroom is not running:
% kubectl exec -it a-mailroom-7c996fd78c-28sd9 /bin/sh
$ ps -efl
F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
4 S git 1 0 0 80 0 - 1070 wait 17:33 ? 00:00:00 /bin/sh -c /scripts/process-wrapper
4 S git 7 1 0 80 0 - 4933 wait 17:33 ? 00:00:00 /bin/bash /scripts/process-wrapper
0 S git 9 7 0 80 0 - 1493 hrtime 17:33 ? 00:00:00 tail -f /var/log/gitlab/mail_room.log
4 S git 5336 0 0 80 0 - 1070 wait 19:01 pts/0 00:00:00 /bin/sh
0 R git 5341 5336 0 80 0 - 9596 - 19:01 pts/0 00:00:00 ps -efl
Observe the logs, which show the connection failure:
% k logs a-mailroom-7c996fd78c-28sd9
+ /scripts/set-config /etc /etc
+ exec /bin/sh -c /scripts/process-wrapper
Begin parsing .erb files from /etc
Starting Mailroom
/usr/lib/ruby/2.6.0/net/imap.rb:1136:in `rescue in tcp_socket': Timeout to open TCP connection to 172.16.1.1.xip.io:993 (exceeds 30 seconds) (Net::OpenTimeout)
from /usr/lib/ruby/2.6.0/net/imap.rb:1131:in `tcp_socket'
from /usr/lib/ruby/2.6.0/net/imap.rb:1089:in `initialize'
from /usr/lib/ruby/gems/2.6.0/gems/mail_room-0.9.1/lib/mail_room/connection.rb:74:in `new'
from /usr/lib/ruby/gems/2.6.0/gems/mail_room-0.9.1/lib/mail_room/connection.rb:74:in `imap'
from /usr/lib/ruby/gems/2.6.0/gems/mail_room-0.9.1/lib/mail_room/connection.rb:84:in `log_in'
from /usr/lib/ruby/gems/2.6.0/gems/mail_room-0.9.1/lib/mail_room/connection.rb:68:in `setup'
from /usr/lib/ruby/gems/2.6.0/gems/mail_room-0.9.1/lib/mail_room/connection.rb:8:in `initialize'
from /usr/lib/ruby/gems/2.6.0/gems/mail_room-0.9.1/lib/mail_room/mailbox_watcher.rb:57:in `new'
from /usr/lib/ruby/gems/2.6.0/gems/mail_room-0.9.1/lib/mail_room/mailbox_watcher.rb:57:in `connection'
from /usr/lib/ruby/gems/2.6.0/gems/mail_room-0.9.1/lib/mail_room/mailbox_watcher.rb:28:in `run'
from /usr/lib/ruby/gems/2.6.0/gems/mail_room-0.9.1/lib/mail_room/coordinator.rb:19:in `each'
from /usr/lib/ruby/gems/2.6.0/gems/mail_room-0.9.1/lib/mail_room/coordinator.rb:19:in `run'
from /usr/lib/ruby/gems/2.6.0/gems/mail_room-0.9.1/lib/mail_room/cli.rb:52:in `start'
from /usr/lib/ruby/gems/2.6.0/gems/mail_room-0.9.1/bin/mail_room:5:in `<top (required)>'
from /usr/bin/mail_room:23:in `load'
from /usr/bin/mail_room:23:in `<main>'
/usr/lib/ruby/2.6.0/socket.rb:61:in `connect_internal': Connection timed out - user specified timeout (Errno::ETIMEDOUT)
from /usr/lib/ruby/2.6.0/socket.rb:137:in `connect'
from /usr/lib/ruby/2.6.0/socket.rb:641:in `block in tcp'
from /usr/lib/ruby/2.6.0/socket.rb:227:in `each'
from /usr/lib/ruby/2.6.0/socket.rb:227:in `foreach'
from /usr/lib/ruby/2.6.0/socket.rb:631:in `tcp'
from /usr/lib/ruby/2.6.0/net/imap.rb:1132:in `tcp_socket'
from /usr/lib/ruby/2.6.0/net/imap.rb:1089:in `initialize'
from /usr/lib/ruby/gems/2.6.0/gems/mail_room-0.9.1/lib/mail_room/connection.rb:74:in `new'
from /usr/lib/ruby/gems/2.6.0/gems/mail_room-0.9.1/lib/mail_room/connection.rb:74:in `imap'
from /usr/lib/ruby/gems/2.6.0/gems/mail_room-0.9.1/lib/mail_room/connection.rb:84:in `log_in'
from /usr/lib/ruby/gems/2.6.0/gems/mail_room-0.9.1/lib/mail_room/connection.rb:68:in `setup'
from /usr/lib/ruby/gems/2.6.0/gems/mail_room-0.9.1/lib/mail_room/connection.rb:8:in `initialize'
from /usr/lib/ruby/gems/2.6.0/gems/mail_room-0.9.1/lib/mail_room/mailbox_watcher.rb:57:in `new'
from /usr/lib/ruby/gems/2.6.0/gems/mail_room-0.9.1/lib/mail_room/mailbox_watcher.rb:57:in `connection'
from /usr/lib/ruby/gems/2.6.0/gems/mail_room-0.9.1/lib/mail_room/mailbox_watcher.rb:28:in `run'
from /usr/lib/ruby/gems/2.6.0/gems/mail_room-0.9.1/lib/mail_room/coordinator.rb:19:in `each'
from /usr/lib/ruby/gems/2.6.0/gems/mail_room-0.9.1/lib/mail_room/coordinator.rb:19:in `run'
from /usr/lib/ruby/gems/2.6.0/gems/mail_room-0.9.1/lib/mail_room/cli.rb:52:in `start'
from /usr/lib/ruby/gems/2.6.0/gems/mail_room-0.9.1/bin/mail_room:5:in `<top (required)>'
from /usr/bin/mail_room:23:in `load'
from /usr/bin/mail_room:23:in `<main>'
Configuration used
incomingEmail:
  enabled: false
  address: ""
  host: "172.16.1.1.xip.io"
  port: 993
  ssl: true
  startTls: false
  user: ""
  password:
    secret: "imap-creds"
    key: password
  mailbox: inbox
  idleTimeout: 60
Current behavior
The Pod does not crash. It stays in the Running state even though the mailroom process has exited.
Expected behavior
The Pod should crash. For comparison, in a working mailroom configuration there is a running ruby process:
% k exec -it b-mailroom-5f87b4695-98bjs /bin/sh
$ ps -efl
F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
4 S git 1 0 0 80 0 - 1070 wait 16:51 ? 00:00:00 /bin/sh -c /scripts/process-wrapper
4 S git 7 1 0 80 0 - 4930 wait 16:51 ? 00:00:00 /bin/bash /scripts/process-wrapper
4 S git 8 7 0 80 0 - 63823 poll_s 16:51 ? 00:00:01 /usr/bin/ruby /usr/bin/mail_room -c /var/opt
0 S git 9 7 0 80 0 - 1493 hrtime 16:51 ? 00:00:00 tail -f /var/log/gitlab/mail_room.log
4 S git 8502 0 0 80 0 - 1070 wait 19:12 pts/0 00:00:00 /bin/sh
0 R git 8507 8502 0 80 0 - 9596 - 19:12 pts/0 00:00:00 ps -efl
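A check that tells the two process listings apart has to match the ruby command line itself rather than any command that mentions mail_room, since the tail on mail_room.log is present in both. The following is a minimal sketch of such a check, not the chart's actual probe. The function name mailroom_running is hypothetical, and the pattern assumes the interpreter and script paths shown in the listing above.

```shell
#!/bin/sh
# Hypothetical helper: reads one command line per row on stdin (for example
# the output of `ps -eo args`) and succeeds only if a row is the ruby
# mail_room process itself.
mailroom_running() {
  # Anchor on the interpreter and the script path so that commands which
  # merely mention mail_room in an argument (such as the tail on
  # mail_room.log) do not match.
  grep -Eq '^(/usr/bin/)?ruby[^ ]* +/usr/bin/mail_room( |$)'
}

# Example usage against the live process table:
#   ps -eo args | mailroom_running && echo "mail_room is up"
```

Fed the command column of the failing Pod, the function returns non-zero. Fed the command column of the working Pod, it returns zero.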
Versions
- Chart: 9df0f92e27b3106c2fb4c41da8c6357fdc8d02bd
- Platform:
  - Cloud: GKE
- Kubernetes:
  - Client: 1.14.7
  - Server: 1.14.7
- Helm:
  - Client: 2.14.2
  - Server: n/a
Potential Root Cause
Our wrapper script https://gitlab.com/gitlab-org/build/CNG/blob/0a9e17d9308d0c8056f0cc212097613544b23b4d/gitlab-mailroom/scripts/process-wrapper#L8 performs a tail which will never exit, and we never reach our wait command. So when the ruby process dies the wrapper script is still running due to the tail. Our healthchecks are also not valid: https://gitlab.com/gitlab-org/charts/gitlab/blob/master/charts/gitlab/charts/mailroom/templates/deployment.yaml#L86-97. The use of --full in pgrep will look for anything that contains mail_room in it, which the tail command matches. So Kubernetes never finds out that the ruby process responsible for Mailroom can sometimes die.
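Assuming the wrapper blocks on a foreground tail as described above, one possible shape for a fix is to push the tail into the background and wait on the mailroom process instead, so that the wrapper exits with the same status once that process dies. This is only a sketch under that assumption, not the CNG script. The function name run_and_follow and its arguments are hypothetical.

```shell
#!/bin/sh
# Sketch only: run_and_follow is a hypothetical name, not the CNG script.
# Usage: run_and_follow LOGFILE COMMAND [ARGS...]
run_and_follow() {
  log_file=$1
  shift

  # Make sure the log exists, then stream it in the background so the
  # tail cannot block the wrapper.
  touch "$log_file"
  tail -f "$log_file" &
  tail_pid=$!

  # Start the supervised process and block on it, not on the tail.
  "$@" &
  main_pid=$!
  status=0
  wait "$main_pid" || status=$?

  # The main process is gone: stop following the log and hand its exit
  # status back to the caller.
  kill "$tail_pid" 2>/dev/null || true
  wait "$tail_pid" 2>/dev/null || true
  return "$status"
}
```

A wrapper built this way would end with something like `run_and_follow /var/log/gitlab/mail_room.log <mail_room command>; exit $?`. Once the wrapper exits, the container's main process terminates and Kubernetes can restart the container, which would surface the failure instead of leaving the Pod in Running.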