Gitter Email notifications broken since 2020-05-18

These are screenshots from the Mandrill dashboard:

Screenshot_2020-06-01_at_12.01.18_PM

Screenshot_2020-06-01_at_12.07.58_PM

It seems that our notifications stopped going out on 2020-05-18 at 11:00 UTC.

Cause of the outage

None of the webapp servers had the Group_primary-email-notification-server or Group_secondary-email-notification-server tag.

As a result, the Ansible task that adds the cron job for notifications didn't run. The schedule is only set for hosts that belong to one of these groups:

- set_fact:
    notification_email_schedule: "0,10,20,30,40,50"
  when: "'primary-email-notification-server' in group_names"

- set_fact:
    notification_email_schedule: "5,15,25,35,45,55"
  when: "'secondary-email-notification-server' in group_names"

Why wasn't there any tagged notification server?

This code is in user-data.sh for every webapp instance:

# Ensure there is at least one instance that is sending out emails
# Looks for any existing instances with the `Group_primary-email-notification-server` tag and if not, adds to the current new server
# This probably a little problematic. If two servers startup at the same time, then potentially both could get the tag
does_primary_notification_email_server_exist=$(aws --region "$region" ec2 describe-instances --filters "Name=tag:Group_primary-email-notification-server,Values=" "Name=tag:aws:autoscaling:groupName,Values=$as_name" | jq '.Reservations[]')
does_secondary_notification_email_server_exist=$(aws --region "$region" ec2 describe-instances --filters "Name=tag:Group_secondary-email-notification-server,Values=" "Name=tag:aws:autoscaling:groupName,Values=$as_name" | jq '.Reservations[]')
if [ -z "$does_primary_notification_email_server_exist" ]; then
  aws --region "$region" ec2 create-tags --resources "$instance_id" --tags Key=Group_primary-email-notification-server,Value=
elif [ -z "$does_secondary_notification_email_server_exist" ]; then
  aws --region "$region" ec2 create-tags --resources "$instance_id" --tags Key=Group_secondary-email-notification-server,Value=
fi

The comment already says that it could be a little problematic, but it gets much worse. This code will falsely report an existing EC2 instance with the email-notification-server tag if that instance was terminated only recently, because describe-instances keeps returning terminated instances for a while after they are shut down.

The following log from webapp-01 shows that it found the terminated webapp-03 as the primary-email-notification-server. Unfortunately, it also found webapp-04 as the secondary, so webapp-01 didn't initialize itself as an email server.

aws --region us-east-1 ec2 describe-instances --filters Name=tag:Group_primary-email-notification-server,Values= Name=tag:aws:autoscaling:groupName,Values=webapp-servers
+ does_primary_notification_email_server_exist='{
  "Groups": [],
  "Instances": [
    {
      "StateReason": {
        "Code": "Client.UserInitiatedShutdown",
        "Message": "Client.UserInitiatedShutdown: User initiated shutdown"
      },
      "Tags": [
        {
          "Value": "prod",
          "Key": "Env"
        },
        {
          "Value": "",
          "Key": "Group_webapp-servers"
        },
        {
          "Value": "webapp-servers",
          "Key": "aws:autoscaling:groupName"
        },
        {
          "Value": "webapp-03",
          "Key": "Name"
        },
        {
          "Value": "",
          "Key": "Group_primary-email-notification-server"
        },
      ],
      "State": {
        "Code": 48,
        "Name": "terminated"
      },
      "StateTransitionReason": "User initiated (2020-05-18 11:36:09 GMT)",
    }
  ]
}'

I remember terminating the last 4 instances at once because it was early morning in Europe and I knew that the remaining 4 instances would easily handle the load for the 5 minutes until the new instances booted up.
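
A minimal sketch of how the lookup in user-data.sh could skip terminated instances, assuming the same $region, $as_name and $instance_id variables are set. The helper names live_tagged_instance_exists and claim_notification_role are hypothetical and not part of the current script. The only real change is the extra instance-state-name filter on describe-instances; the race between two servers starting at the same time, mentioned in the original comment, is not addressed here.

```shell
#!/bin/sh
# Sketch only: assumes $region, $as_name and $instance_id are set as in user-data.sh.

# Hypothetical helper: succeeds only if a pending or running instance in the
# autoscaling group carries the given tag key. Terminated, stopping and
# stopped instances are excluded by the instance-state-name filter.
live_tagged_instance_exists() {
  key="$1"
  live_ids=$(aws --region "$region" ec2 describe-instances \
    --filters "Name=tag:$key,Values=" \
              "Name=tag:aws:autoscaling:groupName,Values=$as_name" \
              "Name=instance-state-name,Values=pending,running" \
    --query 'Reservations[].Instances[].InstanceId' \
    --output text)
  [ -n "$live_ids" ]
}

# Hypothetical wrapper mirroring the original if/elif: take the primary role
# if no live primary exists, otherwise the secondary role if that is free.
claim_notification_role() {
  for role in primary secondary; do
    tag_key="Group_${role}-email-notification-server"
    if ! live_tagged_instance_exists "$tag_key"; then
      aws --region "$region" ec2 create-tags --resources "$instance_id" \
        --tags "Key=$tag_key,Value="
      return 0
    fi
  done
}
```

With this filter in place, calling claim_notification_role from user-data.sh on webapp-01 would not have counted the terminated webapp-03 as an existing primary.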

Original issue

https://gitlab.com/gitlab-org/gitter/webapp/-/issues/2532

Discovered by @jeremyVignelles

I don't get email notifications anymore

I'm really disappointed with the gitter's notification system. I have the android app, but I never managed to make it send me any notification. That's beyond the point here, and I managed to survive for two years with only e-mail notifications, sent 1 hour after the message.

Now, a few weeks ago, gitter suddenly stopped sending me e-mail notifications, but the "all notifications" checkbox on the room's settings is still checked.

More info:

  • I'm logging in with my GitHub account
  • I'm using gitter on 3 different browsers, but never at the same time.
  • I have the app installed, but I'm never using it
  • Several messages, on different channels were sent during the weekend, and I didn't get any notification.
Edited by Tomas Vik