Draft: Catch and return message for status page AWS error

Sarah Yasonik requested to merge sy-fix-uncaught-aws-error into master

What does this MR do and why?

Handles client errors such as Aws::S3::Errors::InvalidAccessKeyId that are raised when publishing content to Status Page, so that the error is logged appropriately rather than counting against group::respond's error budget.
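The handling described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this MR: `StubAws`, `publish_to_bucket`, and `publish_with_error_handling` are hypothetical names, introduced so the example runs without the AWS SDK or GitLab loaded.

```ruby
require 'logger'
require 'stringio'

# Hypothetical stand-ins for the AWS SDK error hierarchy, so the sketch
# runs without the aws-sdk gem. These are NOT the real SDK classes.
module StubAws
  class ServiceError < StandardError; end
  class InvalidAccessKeyId < ServiceError; end
end

# Hypothetical upload call that fails the way a stale credential might.
def publish_to_bucket(_content)
  raise StubAws::InvalidAccessKeyId, 'The access key provided does not exist.'
end

# Rescue the client error, log it, and return a message to the caller
# instead of letting the exception escape.
def publish_with_error_handling(content, logger:)
  publish_to_bucket(content)
  { status: :success }
rescue StubAws::ServiceError => e
  logger.warn("Status Page publish failed: #{e.class}: #{e.message}")
  { status: :error, message: "Failed to publish: #{e.message}" }
end

log_output = StringIO.new
result = publish_with_error_handling('incident details', logger: Logger.new(log_output))
puts result[:status]
puts result[:message]
```

Rescuing the parent error class rather than one specific error is an assumption of this sketch; it means other client-side failures would take the same logged path.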

How to set up and validate locally

  1. With developer+ permissions, create an incident issue
    • Monitor > Incidents > Create incident (UI) or via rails console
  2. Find the incident in the console
    incident = Issue.incident.last
  3. Set up a faux Status Page setting
    • In the rails console, run:
      ::StatusPage::ProjectSetting.create!(
        project_id: incident.project_id,
        enabled: true,
        aws_s3_bucket_name: "bucket-name",
        aws_region: "us-east-1",
        aws_access_key: "FFFFFFFFFFFFFFFFFFFF",
        aws_secret_key: "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
        status_page_url: "https://status.gitlab.com"
      )
    • We validate the bucket details on save, but a user could delete the AWS S3 bucket later, and we would have no way of knowing to invalidate the record on our side. Creating a valid record with no real corresponding S3 bucket mimics that scenario.
  4. Run the worker for the incident/project & your user
    • Run:
      ::StatusPage::PublishWorker.new.perform(User.first.id, incident.project_id, incident.id)
    • If you hit IncidentCounter load errors, run load 'ee/lib/gitlab/status_page/usage_data_counters/incident_counter.rb' in the console before running the worker. I double-checked, and we are still recording usage data for publishing/unpublishing incidents, so I'm not entirely sure what causes this. It is the only usage data counter that isn't in the Gitlab::UsageDataCounters module, so my guess is that it's related to that, to load order, or to something similar.
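
The scenario that step 3 mimics (a setting that passed validation on save, followed by the bucket disappearing) can be illustrated in isolation. The sketch below uses hypothetical in-memory stand-ins, `StubBucketStore` and `StubSetting`, in place of S3 and the Status Page setting; it only shows why a valid record can still fail at publish time.

```ruby
require 'set'

# Hypothetical in-memory stand-in for S3; not the real AWS SDK.
class StubBucketStore
  class NoSuchBucket < StandardError; end

  def initialize(*names)
    @names = Set.new(names)
  end

  def exists?(name)
    @names.include?(name)
  end

  def delete(name)
    @names.delete(name)
  end

  def upload(name, _content)
    raise NoSuchBucket, "bucket #{name} does not exist" unless exists?(name)

    :uploaded
  end
end

# Hypothetical setting that checks the bucket only when it is created.
class StubSetting
  attr_reader :bucket_name

  def initialize(bucket_name, store)
    raise ArgumentError, 'bucket not found' unless store.exists?(bucket_name)

    @bucket_name = bucket_name
  end
end

store = StubBucketStore.new('bucket-name')
setting = StubSetting.new('bucket-name', store) # passes validation at save time
store.delete('bucket-name')                     # bucket removed later; the setting is unaware

begin
  store.upload(setting.bucket_name, 'incident details')
rescue StubBucketStore::NoSuchBucket => e
  puts "publish failed: #{e.message}"
end
```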

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Sarah Yasonik
