emails for batch requests were not processed for a while
This issue is written as a log of an incident for later reference.
On November 7, 2021, a few pro users received their first daily update in about 2-3 months. When we checked the relevant requests, it appeared that many of them had received email updates during that period, but this had not triggered any event in Alaveteli, nor any email notification to the users.
After investigation, we found that:
- emails were correctly received by postfix (postfix logs show reception of said emails with correct timestamps)
- between August 13 and November 7, no `info_request_events` were created for batch requests; during that same period, non-batch requests were processed normally
- we have no syslog entries covering that period (logrotate had already removed them), so we cannot tell what happened; in particular, we don't know what happened with dovecot
- on November 7, I upgraded the server and rebooted it, after which about 200 info request events were created within a few minutes
- postfix logs show the same warning on all incoming batch emails: `(delivered via alaveteli service (/ is not writable. Bundler will use /tmp/bundler/home/unknown' as your home directory temporarily))`. This does not seem to have affected postfix, but it might have caused an error further down the line.
It looks like some step of the processing was not working during that period, but we received no notification of it.
Action points:
- understand why/where batch vs non-batch requests are treated differently
- set up monitoring on the various steps of the process, so that we are notified if one of them fails
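
As a starting point for the monitoring action point, here is a minimal sketch of a staleness check in Python. It is not part of Alaveteli: the function name `pipeline_stalled`, its parameters, and the one-hour grace period are all assumptions made for illustration. The idea is to compare the time of the last email accepted by postfix with the time of the last `info_request_events` row, and to raise an alert when mail has arrived but no event followed. How the two timestamps are obtained (postfix logs, a database query) is left to the caller.

```python
from datetime import datetime, timedelta
from typing import Optional


def pipeline_stalled(
    last_mail_received: Optional[datetime],
    last_event_created: Optional[datetime],
    grace: timedelta = timedelta(hours=1),  # assumed threshold, tune as needed
    now: Optional[datetime] = None,
) -> bool:
    """Return True if mail arrived but no event followed within `grace`.

    Hypothetical helper: both timestamps must come from the caller and
    use the same timezone convention (all naive or all aware).
    """
    if now is None:
        now = datetime.now()
    if last_mail_received is None:
        # No mail seen at all, so there is nothing to process.
        return False
    if last_event_created is not None and last_event_created >= last_mail_received:
        # An event was created at or after the latest mail: looks healthy.
        return False
    # Mail is newer than the latest event; alert once the grace period is over.
    return now - last_mail_received > grace


if __name__ == "__main__":
    # Values loosely modelled on this incident (exact times are made up).
    stalled = pipeline_stalled(
        last_mail_received=datetime(2021, 11, 6, 9, 0),
        last_event_created=datetime(2021, 8, 13, 0, 0),
        now=datetime(2021, 11, 7, 12, 0),
    )
    print("ALERT: mail received but no events created" if stalled else "ok")
```

Since non-batch requests were processed normally during the incident, a check like this would need to run separately for batch and non-batch requests; a single global check would have stayed green.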