Gitter Email notifications broken since 2020-05-18
This is a screenshot from the Mandrill dashboard.
It seems that our notifications stopped going out on 2020-05-18 at 11:00 UTC.
Cause of the outage
None of the webapp servers had the Group_primary-email-notification-server or Group_secondary-email-notification-server tag.
As a result, the Ansible task that adds the cron job for notifications didn't run. The schedule is only set for hosts that belong to one of these two groups:
- set_fact:
    notification_email_schedule: "0,10,20,30,40,50"
  when: "'primary-email-notification-server' in group_names"
- set_fact:
    notification_email_schedule: "5,15,25,35,45,55"
  when: "'secondary-email-notification-server' in group_names"
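The statement that no live server carried either tag can be checked from the AWS CLI. The following is a minimal sketch and not part of our infrastructure code: the function names count_live_tagged and check_notification_tags are hypothetical. It counts only pending or running instances, on the assumption that terminated instances can still show up in the API for some time.

```shell
#!/usr/bin/env bash
# Hypothetical check, not part of the original infrastructure code.

# Prints how many pending/running instances carry the given tag key.
count_live_tagged() {
  local region="$1" tag_key="$2"
  aws --region "$region" ec2 describe-instances \
    --filters \
      "Name=tag-key,Values=$tag_key" \
      "Name=instance-state-name,Values=pending,running" \
    --query 'length(Reservations[].Instances[])' \
    --output text
}

# Returns non-zero when either notification tag has no live holder.
# An empty or non-numeric count (for example a failed API call) is
# treated as "no holder" so that errors are not silently ignored.
check_notification_tags() {
  local region="$1" tag_key count status=0
  for tag_key in Group_primary-email-notification-server \
                 Group_secondary-email-notification-server; do
    count=$(count_live_tagged "$region" "$tag_key")
    echo "$tag_key: ${count:-0} live instance(s)"
    if ! [ "${count:-0}" -gt 0 ] 2>/dev/null; then
      status=1
    fi
  done
  return "$status"
}
```

Running `check_notification_tags us-east-1` prints one line per tag and exits non-zero if either tag is missing, so it could be wired into a periodic check.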
Why wasn't there any tagged notification server?
This code is in user-data.sh for every webapp instance:
# Ensure there is at least one instance that is sending out emails
# Looks for any existing instances with the `Group_primary-email-notification-server` tag and if not, adds to the current new server
# This is probably a little problematic. If two servers start up at the same time, then potentially both could get the tag
does_primary_notification_email_server_exist=$(aws --region "$region" ec2 describe-instances --filters "Name=tag:Group_primary-email-notification-server,Values=" "Name=tag:aws:autoscaling:groupName,Values=$as_name" | jq '.Reservations[]')
does_secondary_notification_email_server_exist=$(aws --region "$region" ec2 describe-instances --filters "Name=tag:Group_secondary-email-notification-server,Values=" "Name=tag:aws:autoscaling:groupName,Values=$as_name" | jq '.Reservations[]')
if [ -z "$does_primary_notification_email_server_exist" ]; then
  aws --region "$region" ec2 create-tags --resources "$instance_id" --tags Key=Group_primary-email-notification-server,Value=
elif [ -z "$does_secondary_notification_email_server_exist" ]; then
  aws --region "$region" ec2 create-tags --resources "$instance_id" --tags Key=Group_secondary-email-notification-server,Value=
fi
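The race mentioned in the script comment (two servers starting at once and both taking the tag) could be handled with a tie-break after tagging. This is only a sketch under assumptions: resolve_duplicate_tag is a hypothetical function, it would have to run before anything consumes the tag, and the tag lookup may lag slightly behind create-tags. It only considers pending or running instances, and the instance whose ID sorts first keeps the tag.

```shell
#!/usr/bin/env bash
# Hypothetical tie-break, not part of the original user-data.sh.
# After an instance has tagged itself, it re-reads the live holders of the
# tag and keeps the tag only if its own instance ID sorts first.
resolve_duplicate_tag() {
  local region="$1" as_name="$2" tag_key="$3" instance_id="$4"
  local holders winner

  holders=$(aws --region "$region" ec2 describe-instances \
    --filters \
      "Name=tag:$tag_key,Values=" \
      "Name=tag:aws:autoscaling:groupName,Values=$as_name" \
      "Name=instance-state-name,Values=pending,running" \
    --query 'Reservations[].Instances[].InstanceId' \
    --output text)

  # Text output separates the IDs with whitespace, so $holders is left
  # unquoted on purpose to split it into one ID per line.
  winner=$(printf '%s\n' $holders | LC_ALL=C sort | head -n 1)

  if [ -n "$winner" ] && [ "$winner" != "$instance_id" ]; then
    # Another live instance wins the tie-break: give the tag back.
    aws --region "$region" ec2 delete-tags \
      --resources "$instance_id" --tags "Key=$tag_key"
    echo "released"
  else
    echo "kept"
  fi
}
```

A call such as `resolve_duplicate_tag "$region" "$as_name" Group_primary-email-notification-server "$instance_id"` would go right after the create-tags call.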
The comment already says that this could be a little problematic, but it gets much worse. The lookup does not filter on instance state, and describe-instances keeps returning a terminated instance for a while after termination. A recently terminated instance that still carries the email-notification-server tag is therefore reported as an existing notification server.
You can see this in the following log from webapp-01, which found the terminated webapp-03 as the primary-email-notification-server. Unfortunately, it also found webapp-04 as the secondary, so webapp-01 didn't initialize itself as an email server.
aws --region us-east-1 ec2 describe-instances --filters Name=tag:Group_primary-email-notification-server,Values= Name=tag:aws:autoscaling:groupName,Values=webapp-servers
+ does_primary_notification_email_server_exist='{
  "Groups": [],
  "Instances": [
    {
      "StateReason": {
        "Code": "Client.UserInitiatedShutdown",
        "Message": "Client.UserInitiatedShutdown: User initiated shutdown"
      },
      "Tags": [
        {
          "Value": "prod",
          "Key": "Env"
        },
        {
          "Value": "",
          "Key": "Group_webapp-servers"
        },
        {
          "Value": "webapp-servers",
          "Key": "aws:autoscaling:groupName"
        },
        {
          "Value": "webapp-03",
          "Key": "Name"
        },
        {
          "Value": "",
          "Key": "Group_primary-email-notification-server"
        },
      ],
      "State": {
        "Code": 48,
        "Name": "terminated"
      },
      "StateTransitionReason": "User initiated (2020-05-18 11:36:09 GMT)",
    }
  ]
}'
I remember terminating the last 4 instances at once because it was early morning in Europe and I knew that the remaining 4 instances would easily handle the load for the 5 minutes it takes the new instances to boot up.
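One way to stop terminated instances from counting as notification servers is to restrict the lookup to pending and running instances with the instance-state-name filter. The sketch below shows the idea only and is not the fix that was actually deployed: live_tag_holders and claim_notification_role are hypothetical names, and --query with --output text is used here in place of jq.

```shell
#!/usr/bin/env bash
# Sketch of a state-aware lookup; function names are hypothetical.

# Prints the IDs of pending/running instances in the Auto Scaling group
# that carry the given tag key. Prints nothing if there are none.
live_tag_holders() {
  local region="$1" as_name="$2" tag_key="$3"
  aws --region "$region" ec2 describe-instances \
    --filters \
      "Name=tag:$tag_key,Values=" \
      "Name=tag:aws:autoscaling:groupName,Values=$as_name" \
      "Name=instance-state-name,Values=pending,running" \
    --query 'Reservations[].Instances[].InstanceId' \
    --output text
}

# Tags this instance with the first role that has no live holder and
# prints the tag key it claimed. Prints nothing if both roles are taken.
# Note: as in the original script, an empty lookup result is read as
# "no holder", so a failed API call would also lead to a claim.
claim_notification_role() {
  local region="$1" as_name="$2" instance_id="$3" tag_key
  for tag_key in Group_primary-email-notification-server \
                 Group_secondary-email-notification-server; do
    if [ -z "$(live_tag_holders "$region" "$as_name" "$tag_key")" ]; then
      aws --region "$region" ec2 create-tags \
        --resources "$instance_id" --tags "Key=$tag_key,Value="
      echo "$tag_key"
      return 0
    fi
  done
  return 0
}
```

With this lookup, the terminated webapp-03 from the log above would no longer match, so `claim_notification_role "$region" "$as_name" "$instance_id"` on webapp-01 would have claimed the primary tag.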
Original issue
https://gitlab.com/gitlab-org/gitter/webapp/-/issues/2532
Discovered by @jeremyVignelles
I don't get email notifications anymore
I'm really disappointed with Gitter's notification system. I have the Android app, but I never managed to make it send me any notifications. That's beside the point here, and I managed to survive for two years with only e-mail notifications, sent 1 hour after the message.
Now, a few weeks ago, Gitter suddenly stopped sending me e-mail notifications, but the "all notifications" checkbox in the room's settings is still checked.
More info:
- I'm logging in with my GitHub account
- I'm using Gitter on 3 different browsers, but never at the same time.
- I have the app installed, but I'm never using it
- Several messages, on different channels were sent during the weekend, and I didn't get any notification.

