Skip to content

Add indexes for namespace_traversal_ids

What does this MR do and why?

Add basic namespace_traversal_ids for issues table

This adds the two primary indexes we need for the namespace_traversal_ids:

  1. An index on the namespace_traversal_ids which will be used for sub group lookups and sorting will mostly happen in memory.

  2. A separate on the root namespace since root namespaces contain a larger dataset. So we need a special index and not sort in memory if possible.

    For now only focusing on created_at since the query optimizer seems to prefer using the index on updated_at and filter on traversal_ids.

Changelog: added

References

Screenshots or screen recordings

Query Before After Notes
gitlab-org open work items, order by updated_at https://console.postgres.ai/gitlab/gitlab-production-main/sessions/44132/commands/135043 https://console.postgres.ai/gitlab/gitlab-production-main/sessions/44448/commands/136283 We have no specialized index for updated_at but it can use the updated_at index from issues and filters on namespace_level_work_items
gitlab-org all work items, ordered by updated_at https://console.postgres.ai/gitlab/gitlab-production-main/sessions/44238/commands/135665 https://console.postgres.ai/gitlab/gitlab-production-main/sessions/44448/commands/136287 Same as above
gitlab-org opened worki tems, ordered by created_at https://console.postgres.ai/gitlab/gitlab-production-main/sessions/44238/commands/135676 https://console.postgres.ai/gitlab/gitlab-production-main/sessions/44448/commands/136288 Main performance improvement for the root namespace index
gitlab-com/support open work items, ordered by updated_at https://console.postgres.ai/gitlab/gitlab-production-main/sessions/44238/commands/135695 https://console.postgres.ai/gitlab/gitlab-production-main/sessions/44448/commands/136290 gitlab-org/support is a large sub group and the index doesn't get used
gitlab-com/support open work items, ordered by created_at https://console.postgres.ai/gitlab/gitlab-production-main/sessions/44238/commands/135689 https://console.postgres.ai/gitlab/gitlab-production-main/sessions/44448/commands/136292
gitlab-com/support ordered by title https://console.postgres.ai/gitlab/gitlab-production-main/sessions/44238/commands/135697 https://console.postgres.ai/gitlab/gitlab-production-main/sessions/44448/commands/136293 Same as ordering by created_at, the memory sorting is efficient enough even for larger groups

In addition to the queries above, I have an automated script which runs a few more queries with different filters in comparison of the old and new query times and buffer usage: https://gitlab.com/gitlab-org/plan-stage/product-planning/issue-database-research-benchmarking/-/blob/main/report.md?ref_type=heads

How to set up and validate locally

Running through https://docs.gitlab.com/development/database/adding_database_indexes/#test-database-index-changes-locally

bundle exec rails gitlab:db:execute_async_index_operations:all

\d issues
..
 namespace_traversal_ids               | bigint[]                    |           |          | '{}'::bigint[]
Indexes:
    "issues_pkey" PRIMARY KEY, btree (id)
[...]
    "idx_issues_root_namespace_created_at" btree ((namespace_traversal_ids[1]), created_at)
    "idx_issues_state_id_namespace_traversal_ids" btree (state_id, namespace_traversal_ids)
...

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Nicolas Dular

Merge request reports

Loading