Release Monitoring for Usage Billing - Group Provision

Project Objective

This issue is for monitoring the release of Usage Billing for Group Provision, following the successful UAT process completed in #14494.

Pre-requisites

  • Successful completion of Provision UAT for Usage Billing
  • All critical bugs identified during UAT have been resolved
  • Feature flags are properly configured for production

Monitoring Plan

Because this is a new feature with complex interactions between multiple systems, we need to monitor its behavior in production closely to confirm that everything works as expected. Monitoring sources:

  1. CustomersDot Production - Sentry
  2. GCP logging - Production
  3. GCP logging - Staging
  4. Kibana logging

Key Monitoring Areas

1. Consumer Creation and Wallet Management

GCP Query for Consumer Creation:

Consumer creation failures (GCP log):

severity:"ERROR"
jsonPayload.class:"Api::V1::Consumers::ResolveController"

AIGW requests rejected with 403 (HEAD /api/v1/consumers/resolve):

jsonPayload.controller:"Api::V1::Consumers::ResolveController"
jsonPayload.status:403
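
When the same filters are run repeatedly during the monitoring window, it can help to generate them with a lower time bound. The following is a minimal sketch in plain Ruby. The helper name `gcp_log_filter`, its `since` parameter, and the example date are illustrative assumptions and are not part of CustomersDot. It relies only on the Logging query language behavior that clauses on separate lines are ANDed.

```ruby
require "time"

# Hypothetical helper: joins field/value pairs into a Logging query,
# one clause per line (clauses on separate lines are implicitly ANDed).
def gcp_log_filter(fields, since: nil)
  clauses = fields.map do |key, value|
    # Quote strings; leave numbers (e.g. an HTTP status) bare.
    formatted = value.is_a?(Numeric) ? value.to_s : %("#{value}")
    "#{key}:#{formatted}"
  end
  # Optional lower bound on the log timestamp, in UTC.
  clauses << %(timestamp>="#{since.utc.iso8601}") if since
  clauses.join("\n")
end

# Example: the 403 query above, limited to entries after an arbitrary date.
puts gcp_log_filter(
  { "jsonPayload.controller" => "Api::V1::Consumers::ResolveController",
    "jsonPayload.status" => 403 },
  since: Time.utc(2025, 1, 1)
)
```

The printed text can be pasted into the Logs Explorer query box as-is.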

Rails Console Verification:

To verify consumer creation and wallet management, teleport to the CustomersDot Production Rails console:

# For GitLab.com consumer
consumer = Consumer.find_by(entity_id: <user_id>, subscription_name: <subscription_name>, gl_namespace_id: <root_namespace_id>)

# For Self-managed consumer
consumer = Consumer.find_by(entity_id: <user_id>, subscription_name: <subscription_name>, self_managed_instance_activation: <self_managed_instance_activation>)

# Check wallet balance and transactions
consumer.wallet.balance.to_i
consumer.wallet.transactions
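
To compare a wallet before and after an event (for example, a credit allocation), a small snapshot helper can be pasted into the console. This is a sketch under stated assumptions: `wallet_snapshot` is a hypothetical name, and it uses only the `balance` and `transactions` calls shown above. The `FakeWallet` struct is a stand-in so the example runs outside the Rails console.

```ruby
# Hypothetical helper: works with any object exposing `balance` and
# `transactions`, as the wallet calls above do.
def wallet_snapshot(wallet)
  {
    balance: wallet.balance.to_i,
    transaction_count: wallet.transactions.count
  }
end

# Stand-in object (not a CustomersDot model) so the sketch is runnable.
FakeWallet = Struct.new(:balance, :transactions)
wallet = FakeWallet.new(150.0, [:credit, :debit])
p wallet_snapshot(wallet)
```

In the console, `wallet_snapshot(consumer.wallet)` could be called before and after the event and the two hashes compared.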

CustomersDot Admin UI:

2. Monthly Commitments

Rails Console Verification:

# Find subscription
sub = Zuora::Local::Subscription.find_by(name: <subscription_name>)

# Check subscription wallet and transactions
sub.wallet.balance.to_i
sub.wallet.transactions

CustomersDot Admin UI:

3. Monthly Waivers

Rails Console Verification:

# Find subscription
sub = Zuora::Local::Subscription.find_by(name: <subscription_name>)

# Find the subscription's otc wallet and transactions
otc_wallet = sub.wallets.otc.first
otc_wallet.balance.to_i
otc_wallet.transactions

CustomersDot Admin UI:

4. Monthly Resets

GCP Query for Monthly Resets:

severity:"ERROR"
jsonPayload.class:"Billing::Usage::MonthlyCreditResetService"

Rails Console Verification:

To manually trigger a monthly reset for testing:

# Find subscription
sub = Zuora::Local::Subscription.find_by(name: <subscription_name>)

# Trigger reset job
Billing::MonthlyCreditResetJob.perform_now(sub.name)

5. DAP Events and Quota Checks

Debugging Usage Quota Checks:

When debugging usage quota checks, keep in mind that the AI Gateway caches requests for the usage cut-offs workflow for 1 hour when they share the same event context. To bypass the cache, send requests with a different context (for example, from a new namespace or a different user).
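
The caching behavior described above can be illustrated with a toy model. This is not the AI Gateway's implementation: the class name `ContextCache`, the context keys, and the in-memory storage are all assumptions made for illustration. It only shows why a repeated context returns a cached result for an hour while a new context triggers a fresh check.

```ruby
# Illustrative only: a tiny in-memory cache keyed by event context with a
# one-hour TTL. This is NOT the AI Gateway's implementation.
class ContextCache
  TTL_SECONDS = 3600

  def initialize
    @entries = {}
  end

  # Returns the cached value for this context if it is still fresh;
  # otherwise runs the block, stores its result, and returns it.
  def fetch(context, now: Time.now)
    key = context.sort # same key/value pairs => same cache key
    entry = @entries[key]
    return entry[:value] if entry && now - entry[:stored_at] < TTL_SECONDS

    value = yield
    @entries[key] = { value: value, stored_at: now }
    value
  end
end

cache = ContextCache.new
calls = 0
check = -> { calls += 1 } # stands in for a real quota check
t0 = Time.utc(2025, 1, 1, 12, 0, 0)

cache.fetch({ user_id: 1, namespace_id: 10 }, now: t0, &check)        # computed
cache.fetch({ user_id: 1, namespace_id: 10 }, now: t0 + 600, &check)  # served from cache
cache.fetch({ user_id: 2, namespace_id: 10 }, now: t0 + 600, &check)  # new context, computed
cache.fetch({ user_id: 1, namespace_id: 10 }, now: t0 + 3600, &check) # expired, computed
puts calls # 3 quota checks for 4 requests
```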

Troubleshooting Common Issues

1. Consumer Not Created as Expected

Potential causes:

  • Cache issues with the AI Gateway
  • Subscription status issues (expired, grace period, etc.)
  • Missing or incorrect parameters in the DAP event

Verification steps:

  1. Check if the subscription is active and not expired
  2. Verify the DAP event contains all required parameters
  3. Try triggering the event from a different context to bypass cache

2. Credits Not Allocated Correctly

Potential causes:

  • Issues with the Zuora callout
  • Problems with the wallet creation or transaction recording

Verification steps:

  1. Check the subscription details in Zuora
  2. Verify wallet creation and transaction records
  3. Check for any errors in the GCP logs related to credit allocation

3. Monthly Reset Issues

Potential causes:

  • Existing non-expired wallet transactions preventing reset
  • Subscription status issues

Verification steps:

  1. Check if there are any non-expired wallet transactions
  2. Verify the subscription status
  3. Check for any errors in the GCP logs related to monthly resets
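
The first verification step can be sketched as a small filter. This is an assumption-heavy illustration: the attribute name `expires_at`, the treatment of `nil` as "never expires", and the helper name `non_expired_transactions` are all hypothetical and should be checked against the real wallet transaction model.

```ruby
# Hypothetical helper. Assumes each transaction responds to `expires_at`
# (a Time, or nil for "never expires"); verify against the real model.
def non_expired_transactions(transactions, now: Time.now)
  transactions.select { |txn| txn.expires_at.nil? || txn.expires_at > now }
end

# Stand-in data (not CustomersDot models) so the sketch is runnable.
Txn = Struct.new(:id, :expires_at)
now = Time.utc(2025, 1, 1)
txns = [
  Txn.new(1, now - 86_400), # expired one day earlier
  Txn.new(2, now + 86_400), # expires one day later
  Txn.new(3, nil)           # no expiry recorded
]
p non_expired_transactions(txns, now: now).map(&:id) # => [2, 3]
```

If the assumption about `expires_at` holds, the same helper could be called in the console as `non_expired_transactions(sub.wallet.transactions)`.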

Monitoring Schedule

Date (in UTC) | Note | Available

Issues Identified During Monitoring

Issue | Description | Resolution | MR | DRI
[Issue link] | Description of the issue | Resolution steps or status | [MR link if applicable] | @username
