Draft: Add exploration of Geo-related Prometheus metrics to Geo bootcamp

During the recent Geo Deep Dive (August 2020), we learned about some Geo-specific Prometheus metrics. This MR proposes adding Prometheus-related tasks to the Geo bootcamp.

Overview of Proposed Tasks

  • Cause a synchronization failure and observe how select metrics change
  • Familiarize yourself with the list of Geo-related metrics

Cause a synchronization failure and observe how select metrics change

This new section asks bootcamp participants to:

  • Record the current values of the geo_repositories, geo_repositories_synced, and geo_repositories_failed metrics
  • Intentionally cause a repository sync to fail
  • Record the new values of the geo_repositories, geo_repositories_synced, and geo_repositories_failed metrics
  • Fix the sync failure
  • Confirm that the geo_repositories, geo_repositories_synced, and geo_repositories_failed metrics have returned to their expected values

This section teaches bootcamp participants how to use Prometheus to understand the current state of Geo. It is also useful practice because customers are known to rely on this information.
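The before/after comparison in the steps above could be scripted against the Prometheus HTTP API instead of being read off the UI. The following is a minimal sketch, not part of the bootcamp itself. It assumes Prometheus is reachable at http://localhost:9090 (adjust for your instance), and the helper names (build_query_url, parse_instant_vector, snapshot, diff_snapshots) are hypothetical. Only the /api/v1/query endpoint and its response shape come from Prometheus.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Prometheus for this GitLab instance listens here; change as needed.
PROMETHEUS_URL = "http://localhost:9090"

# The three metrics named in the task above.
GEO_REPO_METRICS = (
    "geo_repositories",
    "geo_repositories_synced",
    "geo_repositories_failed",
)


def build_query_url(base_url, metric):
    """Return the instant-query URL for a single metric name."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": metric})


def parse_instant_vector(payload):
    """Extract the sample values from an instant-query response body.

    Returns one float per returned series, because a metric may have
    more than one series (for example, one per label combination).
    """
    if payload.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {payload}")
    return [float(series["value"][1]) for series in payload["data"]["result"]]


def snapshot(base_url=PROMETHEUS_URL, metrics=GEO_REPO_METRICS):
    """Query each metric once and return {metric_name: [values]}."""
    result = {}
    for metric in metrics:
        url = build_query_url(base_url, metric)
        with urllib.request.urlopen(url, timeout=10) as resp:
            result[metric] = parse_instant_vector(json.load(resp))
    return result


def diff_snapshots(before, after):
    """Return {metric_name: (old, new)} for metrics whose values changed."""
    return {
        name: (before.get(name), after[name])
        for name in after
        if before.get(name) != after[name]
    }
```

Calling snapshot() before breaking the sync and again afterwards, then passing both results to diff_snapshots, would show which of the three metrics moved. Prometheus only updates on its scrape interval, so the second snapshot may need to wait a short while.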

Familiarize yourself with the list of Geo-related metrics

GitLab Metrics

There are a few dozen Geo-specific metrics, listed below. Look through the list and familiarize yourself with them, then observe their values in the Prometheus instance on your GitLab instance.

  • geo_attachments
  • geo_attachments_failed
  • geo_attachments_synced
  • geo_attachments_synced_missing_on_primary
  • geo_cursor_last_event_id
  • geo_cursor_last_event_timestamp
  • geo_db_replication_lag_seconds
  • geo_job_artifacts_synced_missing_on_primary
  • geo_last_event_id
  • geo_last_event_timestamp
  • geo_last_successful_status_check_timestamp
  • geo_lfs_objects
  • geo_lfs_objects_failed
  • geo_lfs_objects_synced
  • geo_lfs_objects_synced_missing_on_primary
  • geo_repositories
  • geo_repositories_checked_count
  • geo_repositories_checked_failed_count
  • geo_repositories_checksum_failed_count
  • geo_repositories_checksum_mismatch_count
  • geo_repositories_checksummed_count
  • geo_repositories_failed
  • geo_repositories_retrying_verification_count
  • geo_repositories_synced
  • geo_repositories_verification_failed_count
  • geo_repositories_verified_count
  • geo_status_failed_total
  • geo_wikis_checksum_failed_count
  • geo_wikis_checksum_mismatch_count
  • geo_wikis_checksummed_count
  • geo_wikis_retrying_verification_count
  • geo_wikis_verification_failed_count
  • geo_wikis_verified_count
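To check which of the metrics listed above a given Prometheus instance actually exposes, the metric names can be pulled from the label-values endpoint and filtered by the geo_ prefix. This is a rough sketch under the same assumption as before: Prometheus is reachable at http://localhost:9090. The function names are hypothetical.

```python
import json
import urllib.request

# Assumption: Prometheus for this GitLab instance listens here; change as needed.
PROMETHEUS_URL = "http://localhost:9090"


def geo_metric_names(payload, prefix="geo_"):
    """Return the sorted metric names in a label-values response that start with prefix."""
    if payload.get("status") != "success":
        raise RuntimeError(f"Prometheus request failed: {payload}")
    return sorted(name for name in payload["data"] if name.startswith(prefix))


def fetch_geo_metric_names(base_url=PROMETHEUS_URL):
    """Ask Prometheus for every known metric name and keep the Geo-related ones."""
    # /api/v1/label/__name__/values lists all values of the __name__ label,
    # which is how Prometheus stores metric names.
    url = f"{base_url}/api/v1/label/__name__/values"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return geo_metric_names(json.load(resp))
```

Comparing the output of fetch_geo_metric_names() with the list above is a quick way to spot metrics that your instance does not report.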

Action Items

  • How do I make a sync fail? Will chown root:root do the trick?
  • Assemble an MR for this