Use repository recent objects size for project statistics

Background

As part of Start using RepositoryInfo for repository size ... (#402988 - closed), a new Gitaly RPC was exposed to supply a more granular set of repository size information (#418243 (comment 1489947936)).

Gitlab::Git::Repository#size is currently used for project statistics, and returns the complete repository size reported by either RepositorySize or RepositoryInfo depending on a feature flag (#418243 (closed)).

Initially, we thought we didn't need to make any further changes (#402988 (comment 1465486583)) to utilise the new size for usage quotas/billing, but a recent discussion has shown that not to be the case (#418243 (comment 1489947936)).

RepositorySize

RepositorySize is the old RPC. It includes unreachable objects that take a while to be cleaned up, which can be frustrating for customers who clean up storage but don't immediately see the result in GitLab.

The size returned is in kilobytes.

RepositoryInfo

RepositoryInfo is the new RPC. It returns a more detailed set of information, breaking down the repository size for different contexts.

For project statistics and billing, the recent objects size is what we'd want to use from this new RPC (#418243 (comment 1489947936)).

The size returned from this is in bytes.
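Because the two RPCs report sizes in different units, any code that switches between them needs to normalise the values first. A minimal sketch of that conversion follows. The helper names are hypothetical and not part of GitLab or Gitaly, and it assumes "kilobytes" means 1024-byte units:

```ruby
# Hypothetical helpers for illustration only; not part of GitLab or Gitaly.

# Assumption: the kilobytes reported by RepositorySize are 1024-byte units.
BYTES_PER_KILOBYTE = 1024

# RepositorySize reports kilobytes, so convert the value to bytes.
def repository_size_rpc_to_bytes(size_in_kilobytes)
  size_in_kilobytes * BYTES_PER_KILOBYTE
end

# RepositoryInfo already reports bytes, so the value is returned unchanged.
def repository_info_rpc_to_bytes(size_in_bytes)
  size_in_bytes
end

# Example values are made up; both results are now in bytes and comparable.
old_rpc_bytes = repository_size_rpc_to_bytes(2048)
new_rpc_bytes = repository_info_rpc_to_bytes(1_500_000)
```

Normalising to bytes in one place avoids silently mixing units when the feature flag switches the source of the size.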

Proposal

To give customers a more useful repository size, one that excludes unreachable objects and does not depend on waiting for scheduled housekeeping tasks, we should:

  1. expose the recent size, e.g.:
      # lib/gitlab/git/repository.rb

      # Recent objects size, in bytes, as reported by the RepositoryInfo RPC.
      def recent_size
        gitaly_repository_client.repository_info.objects.recent_size
      end
  2. change project_statistics.update_repository_size to use project.repository.recent_size
  3. decide whether we need to backfill existing statistics (and perhaps handle that in a separate issue)
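
The first two steps of the proposal could look roughly like the sketch below. Every class here is a stand-in written for illustration (the `Fake*` classes, the structs, and the sample numbers are all hypothetical); only the `repository_info.objects.recent_size` call chain and the `update_repository_size` / `recent_size` method names come from the proposal itself:

```ruby
# Stand-ins for illustration only; the real classes live in the GitLab codebase.
# The `size` field and the sample values are assumptions.
ObjectsInfo = Struct.new(:size, :recent_size, keyword_init: true)
RepositoryInfoResponse = Struct.new(:objects, keyword_init: true)

# Hypothetical client that mimics the shape of a RepositoryInfo response.
class FakeGitalyRepositoryClient
  def repository_info
    RepositoryInfoResponse.new(
      objects: ObjectsInfo.new(size: 5_000_000, recent_size: 3_000_000)
    )
  end
end

# Hypothetical stand-in for Gitlab::Git::Repository.
class FakeRepository
  def initialize(gitaly_repository_client)
    @gitaly_repository_client = gitaly_repository_client
  end

  # Step 1: expose the recent objects size (in bytes).
  def recent_size
    @gitaly_repository_client.repository_info.objects.recent_size
  end
end

# Hypothetical stand-in for the project statistics model.
class FakeProjectStatistics
  attr_reader :repository_size

  def initialize(repository)
    @repository = repository
  end

  # Step 2: store the recent size instead of the complete repository size.
  def update_repository_size
    @repository_size = @repository.recent_size
  end
end

repository = FakeRepository.new(FakeGitalyRepositoryClient.new)
statistics = FakeProjectStatistics.new(repository)
statistics.update_repository_size
```

In this sketch the stored statistic follows `recent_size` rather than the complete object size, which is the behaviour change the proposal is after.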
Edited by Vijay Hawoldar