Remediate usage reporting following mid-April 2024 power outage

Problem / Opportunity Statement

Following a multi-day power outage and the resulting network and Ceph problems, we want to bill Jetstream2 allocations fairly for usage, i.e., not charge for SUs consumed while the service was unusable or severely degraded.

I paused the Rundeck job schedule this morning, meaning we will report no further usage until we log into Rundeck and re-enable it.

Looking backward at what we have reported:

  • 2024-04-14 midnight UTC to 2024-04-15 midnight UTC, which is Saturday 8 PM EDT to Sunday 8 PM EDT.
  • 2024-04-12 midnight UTC to 2024-04-13 midnight UTC, which is Thursday 8 PM EDT to Friday 8 PM EDT.
  • Every day prior.
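
The UTC-to-EDT conversions above can be checked with a short sketch. The helper name `window_in_edt` is made up for illustration and is not part of the reporting code; the fixed UTC-4 offset is an assumption that holds for April 2024 but is not a general US Eastern time zone.

```python
from datetime import date, datetime, time, timedelta, timezone

# Assumption: a fixed UTC-4 offset stands in for EDT (correct in April 2024).
EDT = timezone(timedelta(hours=-4), "EDT")


def window_in_edt(utc_day: date) -> tuple[datetime, datetime]:
    """Return the EDT start and end of the 24-hour window that begins at midnight UTC on utc_day."""
    start = datetime.combine(utc_day, time.min, tzinfo=timezone.utc)
    end = start + timedelta(days=1)
    return start.astimezone(EDT), end.astimezone(EDT)


start, end = window_in_edt(date(2024, 4, 14))
# Prints: Saturday 08 PM EDT to Sunday 08 PM EDT
print(start.strftime("%A %I %p %Z"), "to", end.strftime("%A %I %p %Z"))
```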

(It looks like there's a small bug in the logic. Every day just after midnight UTC, we intend to calculate and report usage for the previous UTC day, but we actually calculate for the 1-day period starting 2 days in the past, so our usage reporting lags a day further behind than it needs to. @cmart to fix later.)
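
A minimal sketch of the difference between the current and intended windows, assuming the job runs shortly after midnight UTC. The function `reporting_window` and its `days_back` parameter are hypothetical and do not reflect the actual job code; the run time in the example is also made up.

```python
from datetime import datetime, timedelta, timezone


def reporting_window(now: datetime, days_back: int) -> tuple[datetime, datetime]:
    """Return the 24-hour UTC window that starts days_back days before the most recent midnight UTC."""
    midnight = now.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    start = midnight - timedelta(days=days_back)
    return start, start + timedelta(days=1)


# Hypothetical run time: five minutes after midnight UTC on 2024-04-16.
now = datetime(2024, 4, 16, 0, 5, tzinfo=timezone.utc)

# Behavior described above: the window starts 2 days back (04-14 to 04-15).
print(reporting_window(now, days_back=2))

# Intended behavior: the window is the previous UTC day (04-15 to 04-16).
print(reporting_window(now, days_back=1))
```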

Resolution

So, when we've resolved the outage, we want to:

  • Report the missing usage for 2024-04-13 midnight UTC to 2024-04-14 midnight UTC.
  • Consider issuing a credit for some or all of the usage we billed for 2024-04-14 midnight UTC to 2024-04-15 midnight UTC, which partially overlaps with the outage period.
  • Resume usage reporting starting at the first 24-hour period that doesn't overlap with the end of the outage.
  • Fix the daily reporting window so that it covers the day ending just before the calculation runs (not the period starting 2 days prior and ending 1 day prior).
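
As a rough sketch of how the credit and resume decisions above could be worked out, the code below computes what fraction of each daily UTC window overlaps an outage and finds the first window that starts at or after the outage end. The function names are hypothetical, and the outage start and end times in the example are placeholders only; the real boundaries still need to be determined.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc
ONE_DAY = timedelta(days=1)


def daily_overlaps(first_start, n_days, outage_start, outage_end):
    """List (window_start, window_end, fraction_overlapping_outage) for consecutive UTC days."""
    rows = []
    for i in range(n_days):
        start = first_start + timedelta(days=i)
        end = start + ONE_DAY
        overlap = min(end, outage_end) - max(start, outage_start)
        fraction = max(overlap, timedelta(0)) / ONE_DAY
        rows.append((start, end, fraction))
    return rows


def first_clean_window(outage_end):
    """Return the first midnight-to-midnight UTC window starting at or after outage_end (given in UTC)."""
    midnight = outage_end.replace(hour=0, minute=0, second=0, microsecond=0)
    start = midnight if midnight == outage_end else midnight + ONE_DAY
    return start, start + ONE_DAY


# Placeholder outage boundaries, NOT the real ones.
outage_start = datetime(2024, 4, 14, 12, 0, tzinfo=UTC)
outage_end = datetime(2024, 4, 17, 6, 0, tzinfo=UTC)

for start, end, fraction in daily_overlaps(
    datetime(2024, 4, 13, tzinfo=UTC), 5, outage_start, outage_end
):
    print(start.date(), "to", end.date(), f"overlap={fraction:.2f}")

print("resume at:", first_clean_window(outage_end)[0])
```

A window with a nonzero overlap fraction is a candidate for a full or partial credit; whether to prorate by that fraction or to waive the whole day is a policy decision this sketch does not make.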