Many indexes are unused according to PostgreSQL and may be removed

One can run the following query to get an overview of all indexes and some usage statistics (see https://www.postgresql.org/docs/9.2/static/monitoring-stats.html section "Table 27-6. pg_stat_all_indexes View"):

SELECT relname as table_name, indexrelname as index_name, idx_scan, idx_tup_read, idx_tup_fetch, pg_size_pretty(pg_relation_size(indexrelname::regclass))
FROM pg_stat_all_indexes
WHERE schemaname = 'public'
AND "idx_scan" = 0
ORDER BY pg_relation_size(indexrelname::regclass) desc;

This outputs the contents of log.txt

Some of these indexes are quite weird. For example, the index index_services_on_created_at_and_id is never used and indexes services on (created_at, id). Since id is already indexed (it being a primary key) the combination is completely useless. At most this should be a regular single column index, if it's needed at all.

Indexes To Verify

Some of these may be EE specific, this list also excludes primary key indexes:

  • ci_taggings_idx: the entire ci_taggings table appears to be unused, so in theory the whole table can be removed
  • index_audit_events_on_author_id: used by AuditEvent.user, but since this runs a separate query this would not use the index
  • index_audit_events_on_type: the column in question does not appear to be used for queries, can be removed
  • index_ci_builds_on_erased_by_id: column not used for querying, can be removed
  • index_ci_builds_on_project_id_and_commit_id: columns are already indexed individually, can be removed
  • index_ci_builds_on_type: column does not appear to be used, can be removed
  • index_ci_commits_on_project_id: not used in any queries, can be removed
  • index_ci_commits_on_project_id_and_committed_at: not used, can be removed
  • index_ci_commits_on_project_id_and_committed_at_and_id: not used, can be removed
  • index_ci_commits_on_project_id_and_sha: not used, can be removed
  • index_ci_commits_on_sha: not used, can be removed
  • index_ci_events_on_created_at: table not in use, can be removed
  • index_ci_events_on_is_admin: table not in use, can be removed
  • index_ci_events_on_project_id: table not in use, can be removed
  • index_ci_jobs_on_deleted_at: unused, can be removed
  • index_ci_jobs_on_project_id: unused, can be removed
  • index_ci_projects_on_gitlab_id: unused, can be removed
  • index_ci_projects_on_shared_runners_enabled: only used in a query that instead uses a composite index, can be removed
  • index_ci_services_on_project_id: table appears to be unused, can be removed
  • index_ci_sessions_on_session_id: table appears to be unused, can be removed
  • index_ci_sessions_on_updated_at: table appears to be unused, can be removed
  • index_ci_taggings_on_taggable_id_and_taggable_type_and_context: table unused, can be removed
  • index_ci_tags_on_name: table unused, can be removed
  • index_ci_triggers_on_deleted_at: unused, can be removed
  • index_identities_on_created_at_and_id: unused, can be removed
  • index_issues_on_title: never used as title searches use %QUERY% which never uses an index, can be removed
  • index_keys_on_created_at_and_id: can be removed
  • index_members_on_created_at_and_id: can be removed
  • index_members_on_type: not used for filtering, can be removed
  • index_merge_requests_on_deleted_at: can be removed
  • index_milestones_on_created_at_and_id: can be removed
  • index_namespaces_on_visibility_level: no filtering happens using namespaces.visibility_level, can be removed
  • index_projects_on_builds_enabled: can be removed
  • index_projects_on_builds_enabled_and_shared_runners_enabled: unused due to other columns/indexes already filtering enough data
  • index_services_on_category: never used as the query (services.where(category: :ci)) already filters enough data using the project_id index/column
  • index_services_on_created_at_and_id: why does this even exist, can be removed
  • index_services_on_default: not used due to project_id already filtering enough data
  • index_snippets_on_created_at: seriously...can be removed
  • index_snippets_on_created_at_and_id: I guess if one index isn't enough we'll create two? can be removed
  • index_todos_on_state: unused, most likely due to the filter for user_id already filtering enough rows
  • index_web_hooks_on_created_at_and_id: can be removed

EE specific:

  • index_geo_nodes_on_geo_node_key_id: not used in queries, can be removed
  • index_geo_nodes_on_primary: used in lib/gitlab/geo.rb in EE, may be used given enough rows in geo_nodes. Let's keep this one around
  • index_namespaces_on_last_ldap_sync_at: column removed in an 8.11 migration