
Use repository recent objects size for stats

Vijay Hawoldar requested to merge vij-use-recent-objects-size into master

What does this MR do and why?

Use repository recent objects size for project statistics.

Project statistics for repository storage size can be misleading, as we currently use the full disk size of the repository when calling project.repository.size.

That means the reported size often includes objects that are no longer reachable by a user, and it only becomes accurate after housekeeping tasks have completed.

As part of #402988 (closed), a new RPC, RepositoryInfo, was made available. It gives a more granular breakdown of a repository's size and, in particular, exposes the recent objects size, which is what we should be using for the project statistics repository size (confirmation here: #418243 (comment 1490026720)).

To start using the new size, this MR:

  1. adds a method that returns the recent objects size to both the raw repository (Gitlab::Git::Repository) and the model (Repository)
  2. adds a feature flag, recent_objects_for_project_statistics
  3. conditionally uses the new size for project_statistics#repository_size, depending on the status of the new feature flag
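
A minimal sketch of the flag-gated selection described in step 3, runnable outside Rails. Everything except the flag name is an assumption for illustration: `FakeFeature` stands in for the `Feature` class, `FakeRepository` and `repository_size_for_statistics` are hypothetical names, and both sizes are treated as plain byte counts. It is not the code in this MR.

```ruby
# Stand-in for GitLab's Feature class; only the calls used here are modelled.
module FakeFeature
  @enabled = {}

  def self.enable(name)
    @enabled[name] = true
  end

  def self.disable(name)
    @enabled[name] = false
  end

  def self.enabled?(name)
    @enabled.fetch(name, false)
  end
end

# Hypothetical repository double exposing both sizes, in bytes (assumption).
FakeRepository = Struct.new(:size, :recent_objects_size)

# Choose which size to record as the repository size, depending on the flag.
def repository_size_for_statistics(repository, feature: FakeFeature)
  if feature.enabled?(:recent_objects_for_project_statistics)
    repository.recent_objects_size
  else
    repository.size
  end
end

repo = FakeRepository.new(2_097_152, 2_873)

FakeFeature.disable(:recent_objects_for_project_statistics)
puts repository_size_for_statistics(repo) # full disk size

FakeFeature.enable(:recent_objects_for_project_statistics)
puts repository_size_for_statistics(repo) # recent objects size
```

Keeping the old size on the disabled path means the flag can be turned off again without any data migration; the next statistics refresh simply records the full disk size.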

Refs #419903 (closed)

How to set up and validate locally

Basic statistics refresh:

  1. In a rails console, enable the feature flag:
    Feature.enable(:recent_objects_for_project_statistics)
  2. Trigger a refresh of a project statistics record:
    statistics = ProjectStatistics.last
    
    statistics.refresh!

Recent objects (new) vs total disk size (old) test:

  1. With the feature flag disabled, create a project and upload a file that consumes a noticeable amount of storage (e.g. 2MiB). You can generate a file on macOS with:
      dd if=/dev/urandom bs=2M count=1 of=2_mib_file_name
  2. Navigate to the usage quotas page for the project (/your-group/your-project/-/usage_quotas#storage-quota-tab)
  3. Hit Recalculate repository usage
  4. Refresh the usage quotas page and you should see the 2MiB storage usage
  5. Confirm repository size stored in project statistics:
      project = Project.find(id-of-your-project)
      project.statistics.repository_size
      # Value should be something like 2097152 (depending on your project/files added previously)
  6. Clone the project and delete the file you previously uploaded
      git clone ssh://git@gdk.test:2222/your-group/your-project.git && cd your-project
    
      rm 2_mib_file_name
    
      git add .; git commit -m 'remove the file'; git push
  7. Rewrite the repo history to remove the file entirely, as described in these docs: https://docs.gitlab.com/ee/user/project/repository/reducing_the_repo_size_using_git.html#purge-files-from-repository-history (warning: this is a lengthy process, but you do not need to wait the 30 minutes described in step 14)
  8. Check the statistics repository size, either on the Usage Quotas page or in a rails console as above. It will still report 2MiB:
      statistics = project.statistics
      statistics.refresh!
      statistics.repository_size
      => 2097152
  9. Enable the feature flag (Feature.enable(:recent_objects_for_project_statistics))
  10. Refresh the statistics and check again - it should now be lower, without the 2MiB file 🎉
      statistics.refresh!
      statistics.repository_size
    
      => 2873
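
To relate the byte values in the console output to the 2MiB shown on the usage quotas page, here is a small conversion sketch. The helper `bytes_to_mib` is a hypothetical name used only for illustration, and the two byte counts are the example values from steps 8 and 10; your own project will report different numbers.

```ruby
BYTES_PER_MIB = 1024 * 1024

# Convert a byte count to MiB, rounded to three decimal places.
def bytes_to_mib(bytes)
  (bytes.to_f / BYTES_PER_MIB).round(3)
end

before = 2_097_152 # repository_size with the flag disabled (step 8)
after  = 2_873     # repository_size with the flag enabled (step 10)

puts bytes_to_mib(before) # 2.0
puts bytes_to_mib(after)  # 0.003
puts before - after       # 2094279 bytes no longer counted
```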

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Vijay Hawoldar
