Skip to content

SPIKE:Performance and Sorting of removing ProjectStatistics#storage_size

What does this MR do and why?

Scope of Validation:

  1. Performance Testing:
    • Measure and compare query execution times to ensure that enabling the method does not significantly impact performance.
  2. Sorting Validation:
    • Ensure that the sorting order remains consistent with our requirements, and current implementation.

========================= Findings =========================

Sorting Analysis:

All specs are passing and sorting is still working as expected:

  • ee/spec/finders/ee/namespaces/projects_finder_spec.rb:
  • ee/spec/models/ee/project_spec.rb:

Database performance analysis:

Current Query plan, Before (LIMIT 20): - Cold run: Time: 22.392 s - Hot run: Time: 69.374 ms

========================= New query plan after using the method (LIMIT 20): =========================

  • Without any indexes (upload_size included):

  • Without any indexes (upload_size is not included):

Index optimizations Option 1: (Recommended)

CREATE INDEX idx_project_statistics_composite_project_id
ON project_statistics (project_id, repository_size, wiki_size, lfs_objects_size, build_artifacts_size, packages_size, snippets_size, uploads_size);

CREATE INDEX idx_project_statistics_composite_project_id_without_uploads
ON project_statistics (project_id, repository_size, wiki_size, lfs_objects_size, build_artifacts_size, packages_size, snippets_size);


VACUUM ANALYZE namespaces;
VACUUM ANALYZE projects;
VACUUM ANALYZE project_statistics;
  • With one new index (idx_project_statistics_composite_project_id) on the sum components (upload_size included):

  • With one new index (idx_project_statistics_composite_project_id_without_uploads) on the sum components (upload_size not included):

Index optimizations 2nd approach

CREATE INDEX idx_project_statistics_composite_namespace_project_id 
ON project_statistics (project_id, namespace_id, repository_size, wiki_size, lfs_objects_size, build_artifacts_size, packages_size, snippets_size, uploads_size);

CREATE INDEX idx_project_statistics_namespace_size_sum_without_uploads 
ON project_statistics (project_id, namespace_id, repository_size, wiki_size, lfs_objects_size, build_artifacts_size, packages_size, snippets_size);

VACUUM ANALYZE namespaces;
VACUUM ANALYZE projects;
VACUUM ANALYZE project_statistics;
  • With one new index (idx_project_statistics_composite_namespace_project_id) on the sum components and project_id, namespace_id (upload_size included):

  • With one new index (idx_project_statistics_namespace_size_sum_without_uploads) on the sum components and project_id, namespace_id (upload_size is not included):

To sum up: The new query integrated into the method illustrates enhanced performance when compared to the current query, even without the incorporation of any new indexes (Cold run). In terms of the Hot run, the new approach only shows a marginal increase of 2 ms. Additionally, the introduction of a single new index, targeting the sum components and project_id exclusively, has notably bolstered query performance during the cold run. Although the hot run experiences slight changes in performance (approximately 20 ms), they are relatively insignificant. Moreover, the application of a composite index that combines namespace_id and project_id significantly elevates performance during the cold run.

Challenges:

  • Validate with database if the new performance results are accepted or if they could be optimized.

========================= Suggestions: =========================

Consider the addition of the idx_project_statistics_composite_project_id and idx_project_statistics_composite_project_id_without_uploads indexes which is effective for indexing the total storage size on a cold and hot run

Edited by Mohamed Hamda

Merge request reports