
feat: add metrics for number of total and applied migrations

What does this MR do?

This MR adds Prometheus metrics to track the total number of database migrations in the container registry. It introduces two new gauge metrics:

  • registry_database_total_migrations{migration_type="pre_deployment"} - Total count of pre-deployment migrations
  • registry_database_total_migrations{migration_type="post_deployment"} - Total count of post-deployment migrations

The "total" metrics described above are set once at application startup, when the registry serve command runs. This is sufficient because the total number of migrations can only change with a deployment.

Additionally, it extends the existing row count collector to track applied migrations:

  • registry_database_rows{query_name="applied_pre_migrations"} - Number of applied pre-deployment migrations
  • registry_database_rows{query_name="applied_post_migrations"} - Number of applied post-deployment migrations

Row count metrics are updated every 10 seconds by default, so the applied migration counts are refreshed at the same frequency.

For each migration type, the number of pending migrations can be calculated from the "total" metric and the matching "applied" metric as total - applied.
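The calculation can be sketched in Go as follows. This is only an illustration of the arithmetic, not code from this MR: the MigrationCounts type and Pending method are hypothetical names, the applied value in main is made up, and clamping at zero is an assumption to cover the case where the applied count is momentarily higher than the total known to the running binary.

```go
package main

import "fmt"

// MigrationCounts is a hypothetical holder for the two values exposed for
// one migration type (pre_deployment or post_deployment).
type MigrationCounts struct {
	Total   int // value of registry_database_total_migrations
	Applied int // value of registry_database_rows for the applied_* query
}

// Pending returns total - applied. The result is clamped at zero, which is
// an assumption of this sketch and not behaviour described in the MR.
func (c MigrationCounts) Pending() int {
	if c.Applied > c.Total {
		return 0
	}
	return c.Total - c.Applied
}

func main() {
	// Total matches the pre-deployment count from the test log in this MR;
	// the applied value is invented for the example.
	pre := MigrationCounts{Total: 178, Applied: 175}
	fmt.Println("pending pre-deployment migrations:", pre.Pending())
}
```

In a monitoring setup the same subtraction would normally be done in the query layer over the two series, one expression per migration type.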

These metrics will help monitor migration status and provide visibility into the migration state across different environments, supporting the infrastructure delivery requirements outlined in the related issue.

Key changes:

  • New migration_count.go file with "total migration count" metrics
  • Extended row count collector to include "applied migration count" queries
  • Integration in app initialization to set the "total" metric on startup
  • Comprehensive test coverage for new functionality

Related to gitlab-com/gl-infra/delivery#21256 (closed).

Testing

I ran the container registry locally against a database to check that the total count metrics are set correctly. The last line of the following logs shows that they were (total_pre_migrations=178, total_post_migrations=5). I also verified the values on the /metrics endpoint.

➜  container-registry git:(rp-add-migration-metrics) ✗ ./bin/registry serve config.yml     
WARN[0000] No HTTP secret provided - generated random secret. This may cause problems with uploads if multiple registries are behind a load-balancer. To provide a shared secret, fill in http.secret in the configuration file or set the REGISTRY_HTTP_SECRET environment variable.  go_version=go1.23.7 instance_id=1752ac84-af3c-4944-96a5-a59e2e0ce6ea service=registry version=v4.27.0-gitlab-49-g0fc3844c0
WARN[0000] rate-limiter is disabled                      go_version=go1.23.7 instance_id=1752ac84-af3c-4944-96a5-a59e2e0ce6ea service=registry version=v4.27.0-gitlab-49-g0fc3844c0
INFO[0000] storage backend redirection enabled           go_version=go1.23.7 instance_id=1752ac84-af3c-4944-96a5-a59e2e0ce6ea service=registry version=v4.27.0-gitlab-49-g0fc3844c0
INFO[0000] using the metadata database                   go_version=go1.23.7 instance_id=1752ac84-af3c-4944-96a5-a59e2e0ce6ea service=registry version=v4.27.0-gitlab-49-g0fc3844c0
INFO[0000] Starting upload purge in 32m0s                go_version=go1.23.7 instance_id=1752ac84-af3c-4944-96a5-a59e2e0ce6ea service=registry version=v4.27.0-gitlab-49-g0fc3844c0
INFO[0000] setting total migration count metrics         go_version=go1.23.7 version=v4.27.0-gitlab-49-g0fc3844c0
INFO[0000] total migration count metrics set             go_version=go1.23.7 total_post_migrations=5 total_pre_migrations=178 version=v4.27.0-gitlab-49-g0fc3844c0
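
With the totals from the last log line, a scrape of /metrics would be expected to contain one sample per migration type. The sketch below formats such samples in the Prometheus text exposition format using only the standard library. It is an illustration under stated assumptions: formatSample is a hypothetical helper, not part of the registry or of any Prometheus client library, and the real endpoint output also carries HELP and TYPE lines that are omitted here.

```go
package main

import (
	"fmt"
	"strconv"
)

// formatSample renders a single-label sample as: name{label="value"} sample.
// Hypothetical helper for illustration only; it assumes the label value
// needs no escaping beyond Go's %q quoting.
func formatSample(name, label, labelValue string, v float64) string {
	return fmt.Sprintf("%s{%s=%q} %s", name, label, labelValue,
		strconv.FormatFloat(v, 'f', -1, 64))
}

func main() {
	const metric = "registry_database_total_migrations"
	// Values taken from the log line above.
	fmt.Println(formatSample(metric, "migration_type", "pre_deployment", 178))
	fmt.Println(formatSample(metric, "migration_type", "post_deployment", 5))
}
```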

Author checklist

  • Assign one of conventional-commit prefixes to the MR.
    • fix: Indicates a bug fix, triggers a patch release.
    • feat: Signals the introduction of a new feature, triggers a minor release.
    • perf: Focuses on performance improvements that don't introduce new features or fix bugs, triggers a patch release.
    • docs: Updates or changes to documentation. Does not trigger a release.
    • style: Changes that do not affect the code's functionality. Does not trigger a release.
    • refactor: Modifications to the code that do not fix bugs or add features but improve code structure or readability. Does not trigger a release.
    • test: Changes related to adding or modifying tests. Does not trigger a release.
    • chore: Routine tasks that don't affect the application, such as updating build processes, package manager configs, etc. Does not trigger a release.
    • build: Changes that affect the build system or external dependencies. May trigger a release.
    • ci: Modifications to continuous integration configuration files and scripts. Does not trigger a release.
    • revert: Reverts a previous commit. It could result in a patch, minor, or major release.
  • Feature flags
    • This change does not require a feature flag
    • Added feature flag: ( Add the Feature flag tracking issue link here )
  • Unit-tests
    • Unit-tests are not required
    • I added unit tests
  • Documentation:
  • Database changes including schema/background migrations:
    • Change does not introduce database changes
    • MR includes DB changes
      • Do not include code that depends on the schema migrations in the same commit. Split the MR into two or more.
      • Do not include code that depends on background migrations in the same release.
      • Manually run up and down migrations in a postgres.ai production database clone and post a screenshot of the result here.
      • If adding new schema migrations make sure the REGISTRY_SELF_MANAGED_RELEASE_VERSION CI variable in migrate.yml is pointing to the latest GitLab self-managed released registry version. Find the correct registry version here. Make sure to select the branch of the latest GitLab release.
      • If adding new queries, extract a query plan from postgres.ai and post the link here. If changing existing queries, also extract a query plan for the current version for comparison.
        • I do not have access to postgres.ai and have made a comment on this MR asking for these to be run on my behalf.
      • If adding new background migration, follow the guide for performance testing new background migrations and add a report/summary to the MR with your analysis.
  • Ensured this change is safe to deploy to individual stages in the same environment (cny -> prod). State-related changes can be troublesome due to having parts of the fleet processing (possibly related) requests in different ways.
  • If the change contains a breaking change, apply the breaking change label.
  • If the change is considered high risk, apply the label high-risk-change.
  • Changes cannot be rolled back
    • Change can be safely rolled back
    • Change can't be safely rolled back
      • Apply the label cannot-rollback.
      • Add a section to the MR description that includes the following details:
        • The reasoning behind why a release containing the presented MR cannot be rolled back (e.g. schema migrations or changes to the FS structure).
        • Detailed steps to revert/disable a feature introduced by the same change where a migration cannot be rolled back. (note: ideally MRs containing schema migrations should not contain feature changes.)
        • Ensure this MR does not add code that depends on these changes that cannot be rolled back.

Documentation/resources

Code review guidelines

Go Style guidelines

Reviewer checklist

  • Ensure the commit and MR title are still accurate.
  • If the change contains a breaking change, verify the breaking change label.
  • If the change is considered high risk, verify the label high-risk-change.
  • Identify if the change can be rolled back safely. (note: all other reasons for not being able to rollback will be sufficiently captured by major version changes).
Edited by João Pereira
