Draft: Add exploration of Geo-related Prometheus metrics to Geo bootcamp
During the recent Geo Deep Dive (August 2020), we learned about some Geo-specific Prometheus metrics. This MR proposes adding Prometheus-related tasks to the Geo bootcamp.
Overview of Proposed Tasks
- Cause a synchronization failure and observe how select metrics change
- Familiarize yourself with the available Geo-related metrics
Cause a synchronization failure and observe how select metrics change
This new section will ask those who are completing the bootcamp to:
- Identify the current values of the `geo_repositories`, `geo_repositories_synced`, and `geo_repositories_failed` metrics
- Intentionally cause a repository sync to fail
- Identify the new values of the `geo_repositories`, `geo_repositories_synced`, and `geo_repositories_failed` metrics
- Fix the sync failure
- Confirm that the values of the `geo_repositories`, `geo_repositories_synced`, and `geo_repositories_failed` metrics have returned to the expected values
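The before/after checks above could also be scripted against the Prometheus HTTP API (`/api/v1/query`). This is a minimal sketch, not part of the proposed bootcamp text. It assumes Prometheus is reachable at `http://localhost:9090` (the default for the Prometheus bundled with Omnibus GitLab; adjust for your setup) and that each metric returns exactly one series, as it would with a single secondary site. The helper names (`snapshot`, `diff`, and so on) are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: bundled Prometheus listening on its default address.
PROMETHEUS_URL = "http://localhost:9090"

# The three metrics the bootcamp task asks participants to watch.
REPO_METRICS = ("geo_repositories", "geo_repositories_synced", "geo_repositories_failed")


def parse_instant_value(payload: dict) -> float:
    """Extract the single sample value from an /api/v1/query response body."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    result = payload["data"]["result"]
    if len(result) != 1:
        # Assumption: one secondary site, so exactly one series per metric.
        raise ValueError(f"expected 1 series, got {len(result)}")
    # Each instant-vector sample is [unix_timestamp, "value as a string"].
    return float(result[0]["value"][1])


def query_metric(name: str, base_url: str = PROMETHEUS_URL) -> float:
    """Run an instant query for one metric and return its current value."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": name})
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_instant_value(json.load(resp))


def snapshot(base_url: str = PROMETHEUS_URL) -> dict:
    """Capture the current value of each repository metric."""
    return {name: query_metric(name, base_url) for name in REPO_METRICS}


def diff(before: dict, after: dict) -> dict:
    """Return metric -> (before, after) for every metric whose value changed."""
    return {k: (before[k], after.get(k)) for k in before if before[k] != after.get(k)}
```

A participant might call `before = snapshot()` before breaking the sync, `after = snapshot()` once the failure has been recorded, and then inspect `diff(before, after)`; one would expect `geo_repositories_failed` to go up. The values may take a little while to update after the failure, so a snapshot taken immediately may show no change yet.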
This section teaches bootcamp participants how to use Prometheus to understand the current state of Geo. It is also useful because customers are known to rely on these metrics.
Familiarize yourself with the list of Geo-related metrics
There are a few dozen Geo-specific metrics. Look through the list below and familiarize yourself with them, then observe their current values in the Prometheus instance that monitors your GitLab instance.
- `geo_attachments`
- `geo_attachments_failed`
- `geo_attachments_synced`
- `geo_attachments_synced_missing_on_primary`
- `geo_cursor_last_event_id`
- `geo_cursor_last_event_timestamp`
- `geo_db_replication_lag_seconds`
- `geo_job_artifacts_synced_missing_on_primary`
- `geo_last_event_id`
- `geo_last_event_timestamp`
- `geo_last_successful_status_check_timestamp`
- `geo_lfs_objects`
- `geo_lfs_objects_failed`
- `geo_lfs_objects_synced`
- `geo_lfs_objects_synced_missing_on_primary`
- `geo_repositories`
- `geo_repositories_checked_count`
- `geo_repositories_checked_failed_count`
- `geo_repositories_checksum_failed_count`
- `geo_repositories_checksum_mismatch_count`
- `geo_repositories_checksummed_count`
- `geo_repositories_failed`
- `geo_repositories_retrying_verification_count`
- `geo_repositories_synced`
- `geo_repositories_verification_failed_count`
- `geo_repositories_verified_count`
- `geo_status_failed_total`
- `geo_wikis_checksum_failed_count`
- `geo_wikis_checksum_mismatch_count`
- `geo_wikis_checksummed_count`
- `geo_wikis_retrying_verification_count`
- `geo_wikis_verification_failed_count`
- `geo_wikis_verified_count`
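To check which of these metrics a given instance actually exposes, one could ask Prometheus for every known metric name (`/api/v1/label/__name__/values`) and keep those with the `geo_` prefix. This is a minimal sketch, assuming Prometheus is reachable at its default `http://localhost:9090` address; the function names are hypothetical. Which Geo metrics appear may depend on which site that Prometheus scrapes.

```python
from __future__ import annotations

import json
import urllib.request

# Assumption: bundled Prometheus listening on its default address.
PROMETHEUS_URL = "http://localhost:9090"


def geo_metric_names(all_names: list[str], prefix: str = "geo_") -> list[str]:
    """Keep only metric names that start with the Geo prefix, sorted."""
    return sorted(n for n in all_names if n.startswith(prefix))


def fetch_metric_names(base_url: str = PROMETHEUS_URL) -> list[str]:
    """List every metric name the Prometheus server currently knows about."""
    url = f"{base_url}/api/v1/label/__name__/values"
    with urllib.request.urlopen(url, timeout=10) as resp:
        payload = json.load(resp)
    if payload.get("status") != "success":
        raise RuntimeError(f"request failed: {payload.get('error')}")
    return payload["data"]


def compare(expected: list[str], found: list[str]) -> tuple[list[str], list[str]]:
    """Return (listed but not exposed, exposed but not listed), both sorted."""
    exp, got = set(expected), set(found)
    return sorted(exp - got), sorted(got - exp)
```

For example, `print("\n".join(geo_metric_names(fetch_metric_names())))` prints the Geo metric names on the local instance, and passing the list above as `expected` to `compare` highlights any differences between it and that instance's GitLab version.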
Action Items
- How do I make a sync fail? Will `chown root:root` do the trick?
- Assemble an MR for this