Skip to content

Gracefully track errors raised by sending CI minutes notifications

Fabio Pitino requested to merge rescue-ci-minutes-email-notification-errors into master

What does this MR do and why?

Related to #331785 (closed)

Also related to https://gitlab.com/gitlab-org/gitlab/-/issues/335885 (but this represents a quick fix for now)

While testing the fix for #331785 (closed) I noticed that a Sidekiq job was failing due to a missing project record in Ci::Minutes::EmailNotificationService, causing the job to be retried numerous times with backoff. Since the error was consistent, the backoff + the number of retries caused the job to keep retrying after 12 hours (which is the TTL used for idempotency). This means that after 12 hours UpdateProjectAndNamespaceUsageService loses the idempotency.

Given that UpdateProjectAndNamespaceUsageService is required to be fully idempotent, we want it to be as lean as possible. As a quick fix I've wrapped the method responsible for calculating and sending the email notification to the user with a rescue block and tracking the exception in Sentry so we don't lose visibility.

As this is also related to https://gitlab.com/gitlab-org/gitlab/-/issues/335885#note_696254385, I've left a TODO linked to that issue so we can extract the email notification into a dedicated worker, which doesn't impact the idempotency of UpdateProjectAndNamespaceUsageService.

Screenshots or screen recordings

These are strongly recommended to assist reviewers and reduce the time to merge your change.

How to set up and validate locally

Numbered steps to set up and validate the change are strongly suggested.

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Fabio Pitino

Merge request reports