Skip to content

Add GraphQl option to sort CI catalog resources by usage count

Leaminn Ma requested to merge ci-gql-catalog-resources-add-usage-count into master

What does this MR do and why?

This MR adds the ability to sort catalog resources by component usage count in GraphQl. It also exposes the column last_30_day_usage_count as a GraphQl field in alpha.

The value of this column is currently 0 for all catalog resources. It will be populated as part of #452545 (closed).

Database migration:

This MR also adds an index on last_30_day_usage_count to improve the query performance when sorting by this column. Since only published catalog resources need to be sorted, the index has the condition where state = 1.

There are currently less than 500 rows in catalog_resources, so a post deploy migration was chosen since the index is non-critical to the feature at this time.

Resolves #452620 (closed)

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

How to set up and validate locally

  1. You can seed a few catalog resources in the shell with:
bundle exec rake "gitlab:seed:ci_catalog_resources[<YOUR-GROUP-PATH>, 3, true]"
  1. Then in Rails console, update the resources' last_30_day_usage_count values.
projects = Project.last(3) # These should be the 3 projects created in Step 1

projects.each do |project|
  project.catalog_resource.update!(last_30_day_usage_count: rand(1..100))
end
  1. Go to http://gdk.test:3000/-/graphql-explorer and test the following query:
query getSortedCatalogResources {
  ciCatalogResources(sort: USAGE_COUNT_DESC) {
    nodes {
      id
      name
      last30DayUsageCount
    }
  }
}
  1. Confirm that the output is ordered as expected with both USAGE_COUNT_DESC and USAGE_COUNT_ASC.
Screenshots

USAGE_COUNT_DESC

Screenshot_2024-05-03_at_10.44.00_PM

USAGE_COUNT_ASC

Screenshot_2024-05-03_at_10.44.30_PM

Migration

Up

main: == [advisory_lock_connection] object_id: 124500, pg_backend_pid: 38999
main: == 20240504042340 AddIndexCatalogResourcesOnUsageCount: migrating =============
main: -- transaction_open?(nil)
main:    -> 0.0000s
main: -- view_exists?(:postgres_partitions)
main:    -> 0.0170s
main: -- index_exists?(:catalog_resources, :last_30_day_usage_count, {:where=>"state = 1", :name=>"index_catalog_resources_on_last_30_day_usage_count", :algorithm=>:concurrently})
main:    -> 0.0035s
main: -- execute("SET statement_timeout TO 0")
main:    -> 0.0003s
main: -- add_index(:catalog_resources, :last_30_day_usage_count, {:where=>"state = 1", :name=>"index_catalog_resources_on_last_30_day_usage_count", :algorithm=>:concurrently})
main:    -> 0.0068s
main: -- execute("RESET statement_timeout")
main:    -> 0.0004s
main: == 20240504042340 AddIndexCatalogResourcesOnUsageCount: migrated (0.0548s) ====

main: == [advisory_lock_connection] object_id: 124500, pg_backend_pid: 38999
ci: == [advisory_lock_connection] object_id: 124740, pg_backend_pid: 39003
ci: == 20240504042340 AddIndexCatalogResourcesOnUsageCount: migrating =============
ci: -- transaction_open?(nil)
ci:    -> 0.0000s
ci: -- view_exists?(:postgres_partitions)
ci:    -> 0.0006s
ci: -- index_exists?(:catalog_resources, :last_30_day_usage_count, {:where=>"state = 1", :name=>"index_catalog_resources_on_last_30_day_usage_count", :algorithm=>:concurrently})
ci:    -> 0.0036s
ci: -- execute("SET statement_timeout TO 0")
ci:    -> 0.0003s
ci: -- add_index(:catalog_resources, :last_30_day_usage_count, {:where=>"state = 1", :name=>"index_catalog_resources_on_last_30_day_usage_count", :algorithm=>:concurrently})
ci:    -> 0.0070s
ci: -- execute("RESET statement_timeout")
ci:    -> 0.0003s
ci: == 20240504042340 AddIndexCatalogResourcesOnUsageCount: migrated (0.0301s) ====

ci: == [advisory_lock_connection] object_id: 124740, pg_backend_pid: 39003

Down

main: == [advisory_lock_connection] object_id: 124140, pg_backend_pid: 48027
main: == 20240504042340 AddIndexCatalogResourcesOnUsageCount: reverting =============
main: -- transaction_open?(nil)
main:    -> 0.0000s
main: -- view_exists?(:postgres_partitions)
main:    -> 0.0128s
main: -- indexes(:catalog_resources)
main:    -> 0.0030s
main: -- execute("SET statement_timeout TO 0")
main:    -> 0.0001s
main: -- remove_index(:catalog_resources, {:algorithm=>:concurrently, :name=>"index_catalog_resources_on_last_30_day_usage_count"})
main:    -> 0.0020s
main: -- execute("RESET statement_timeout")
main:    -> 0.0002s
main: == 20240504042340 AddIndexCatalogResourcesOnUsageCount: reverted (0.0290s) ====

main: == [advisory_lock_connection] object_id: 124140, pg_backend_pid: 48027
ci: == [advisory_lock_connection] object_id: 124140, pg_backend_pid: 48439
ci: == 20240504042340 AddIndexCatalogResourcesOnUsageCount: reverting =============
ci: -- transaction_open?(nil)
ci:    -> 0.0000s
ci: -- view_exists?(:postgres_partitions)
ci:    -> 0.0158s
ci: -- indexes(:catalog_resources)
ci:    -> 0.0046s
ci: -- execute("SET statement_timeout TO 0")
ci:    -> 0.0002s
ci: -- remove_index(:catalog_resources, {:algorithm=>:concurrently, :name=>"index_catalog_resources_on_last_30_day_usage_count"})
ci:    -> 0.0093s
ci: -- execute("RESET statement_timeout")
ci:    -> 0.0005s
ci: == 20240504042340 AddIndexCatalogResourcesOnUsageCount: reverted (0.0508s) ====

ci: == [advisory_lock_connection] object_id: 124140, pg_backend_pid: 48439

Query plans

Query

This is the query from Ci::Catalog::Listing that is executed when you call the ciCatalogResources endpoint with USAGE_COUNT_ASC sorting:

Ci::Catalog::Resource.published.includes(:project).public_or_visible_to_user(current_user).order_by_last_30_day_usage_count_asc
SELECT "catalog_resources".*
FROM "catalog_resources"
WHERE "catalog_resources"."state" = 1
AND (
  EXISTS (
    SELECT 1 FROM "project_authorizations"
    WHERE "project_authorizations"."user_id" = 12867582
    AND (project_authorizations.project_id = catalog_resources.project_id)
  )
  OR catalog_resources.visibility_level IN (0,10,20)
)
ORDER BY "catalog_resources"."last_30_day_usage_count" ASC

Before adding the index

Query plan link: https://console.postgres.ai/gitlab/gitlab-production-main/sessions/28071/commands/87390

  • Sorting happens in memory with quicksort. This isn't a problem now because there are so few rows but it will be a bigger cost when the dataset grows.

After adding the index

Query plan link: https://console.postgres.ai/gitlab/gitlab-production-main/sessions/28071/commands/87394

  • In this query plan, the index is named index_test4.
  • It does not use quicksort and only needs to scan the new index.

Related to #452620 (closed)

Edited by Leaminn Ma

Merge request reports