
Move gitlab:elastic:projects_not_indexed to finder

Terri Chu requested to merge 384039-move-index-status-rake-tasks-to-service into master

What does this MR do and why?

Related to #384039 (closed)

  • Move projects_not_indexed code from rake task to a finder (to allow it to be called in other places)
  • Update specs for rake task
  • Add new specs for finder
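To illustrate why a finder is easier to reuse than rake-task code, here is a minimal in-memory sketch. The names `SketchProject` and `SketchProjectsNotIndexedFinder` are hypothetical stand-ins, not the classes added in this MR, and plain arrays replace the ActiveRecord models.

```ruby
# Hypothetical stand-in for the Project model (assumption: not the real class).
SketchProject = Struct.new(:id, :full_path)

# Hypothetical finder: callers pass in data and get results back,
# so a rake task, a service, or a spec can all reuse the same logic.
class SketchProjectsNotIndexedFinder
  def initialize(projects:, indexed_project_ids:)
    @projects = projects
    @indexed_project_ids = indexed_project_ids
  end

  # Returns the projects that have no index status, ordered by ID.
  def execute
    @projects
      .reject { |project| @indexed_project_ids.include?(project.id) }
      .sort_by(&:id)
  end
end

projects = [
  SketchProject.new(14, 'jtleek/Datasharing'),
  SketchProject.new(6, 'jashkenas/Underscore'),
  SketchProject.new(11, 'jlevy/the-art-of-command-line')
]

not_indexed = SketchProjectsNotIndexedFinder
  .new(projects: projects, indexed_project_ids: [11])
  .execute

# A rake task would only be responsible for printing the result.
not_indexed.each do |project|
  puts "Project '#{project.full_path}' (ID: #{project.id}) isn't indexed."
end
```

Keeping the output formatting outside the finder is the design point: the query logic returns data, and each caller decides how to present it.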

Screenshots or screen recordings

N/A

How to set up and validate locally

  1. Set up GDK for Elasticsearch and index everything
  2. Run the rake task and confirm that everything is indexed:
    bundle exec rake gitlab:elastic:projects_not_indexed
    All projects are currently indexed
  3. Delete the index_statuses rows for a few projects from a Rails console: Project.all.sample(5).each {|x| x.index_status.delete }
  4. Run the rake task again and verify that the 5 projects are listed:
    bundle exec rake gitlab:elastic:projects_not_indexed
    Project 'jashkenas/Underscore' (ID: 6) isn't indexed.
    Project 'pamula/Flight' (ID: 17) isn't indexed.
    Project 'jtleek/Datasharing' (ID: 14) isn't indexed.
    Project 'jlevy/the-art-of-command-line' (ID: 11) isn't indexed.
    Project 'earleen/Flight' (ID: 23) isn't indexed.
    5 out of 5 non-indexed projects shown.

Database

Note that this service cannot be run on GitLab.com: the query takes far too long and times out.

The timings from database-lab are provided using Project.all vs. ::Gitlab::CurrentSettings.elasticsearch_limited_projects

postgres.ai timings - no index limiting enabled

Project.all.not_indexed_in_elasticsearch.each_batch do |batch|

https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/21224/commands/69204

SELECT
    "projects"."id"
FROM
    "projects"
    LEFT JOIN index_statuses ON projects.id = index_statuses.project_id
WHERE
    "index_statuses"."project_id" IS NULL
ORDER BY
    "projects"."id" ASC
LIMIT 1

https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/21224/commands/69205

SELECT
    "projects"."id"
FROM
    "projects"
    LEFT JOIN index_statuses ON projects.id = index_statuses.project_id
WHERE
    "index_statuses"."project_id" IS NULL
    AND "projects"."id" >= 1
ORDER BY
    "projects"."id" ASC
LIMIT 1 OFFSET 1000

https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/21224/commands/69206

SELECT
    "projects"."id",
    "projects"."name",
    "projects"."path",
    "projects"."description",
    "projects"."created_at",
    "projects"."updated_at",
    "projects"."creator_id",
    "projects"."namespace_id",
    "projects"."last_activity_at",
    "projects"."import_url",
    "projects"."visibility_level",
    "projects"."archived",
    "projects"."avatar",
    "projects"."merge_requests_template",
    "projects"."star_count",
    "projects"."merge_requests_rebase_enabled",
    "projects"."import_type",
    "projects"."import_source",
    "projects"."approvals_before_merge",
    "projects"."reset_approvals_on_push",
    "projects"."merge_requests_ff_only_enabled",
    "projects"."issues_template",
    "projects"."mirror",
    "projects"."mirror_last_update_at",
    "projects"."mirror_last_successful_update_at",
    "projects"."mirror_user_id",
    "projects"."shared_runners_enabled",
    "projects"."runners_token",
    "projects"."build_allow_git_fetch",
    "projects"."build_timeout",
    "projects"."mirror_trigger_builds",
    "projects"."pending_delete",
    "projects"."public_builds",
    "projects"."last_repository_check_failed",
    "projects"."last_repository_check_at",
    "projects"."only_allow_merge_if_pipeline_succeeds",
    "projects"."has_external_issue_tracker",
    "projects"."repository_storage",
    "projects"."repository_read_only",
    "projects"."request_access_enabled",
    "projects"."has_external_wiki",
    "projects"."ci_config_path",
    "projects"."lfs_enabled",
    "projects"."description_html",
    "projects"."only_allow_merge_if_all_discussions_are_resolved",
    "projects"."repository_size_limit",
    "projects"."printing_merge_request_link_enabled",
    "projects"."auto_cancel_pending_pipelines",
    "projects"."service_desk_enabled",
    "projects"."cached_markdown_version",
    "projects"."delete_error",
    "projects"."last_repository_updated_at",
    "projects"."disable_overriding_approvers_per_merge_request",
    "projects"."storage_version",
    "projects"."resolve_outdated_diff_discussions",
    "projects"."remote_mirror_available_overridden",
    "projects"."only_mirror_protected_branches",
    "projects"."pull_mirror_available_overridden",
    "projects"."jobs_cache_index",
    "projects"."external_authorization_classification_label",
    "projects"."mirror_overwrites_diverged_branches",
    "projects"."pages_https_only",
    "projects"."external_webhook_token",
    "projects"."packages_enabled",
    "projects"."merge_requests_author_approval",
    "projects"."pool_repository_id",
    "projects"."runners_token_encrypted",
    "projects"."bfg_object_map",
    "projects"."detected_repository_languages",
    "projects"."merge_requests_disable_committers_approval",
    "projects"."require_password_to_approve",
    "projects"."max_pages_size",
    "projects"."max_artifacts_size",
    "projects"."pull_mirror_branch_prefix",
    "projects"."remove_source_branch_after_merge",
    "projects"."marked_for_deletion_at",
    "projects"."marked_for_deletion_by_user_id",
    "projects"."autoclose_referenced_issues",
    "projects"."suggestion_commit_message",
    "projects"."project_namespace_id",
    "projects"."hidden"
FROM
    "projects"
    LEFT JOIN index_statuses ON projects.id = index_statuses.project_id
WHERE
    "index_statuses"."project_id" IS NULL
    AND "projects"."id" >= 1
    AND "projects"."id" < 1000
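
The three queries above follow an ID-range batching pattern: find the first matching ID, find the ID that sits one batch further along, then load the rows in between. The sketch below imitates that pattern over a plain array. The method name `each_id_batch` is hypothetical, and this is a simplified illustration under those assumptions, not the `each_batch` implementation itself.

```ruby
# Simplified, in-memory imitation of ID-range batching (assumption: this only
# mirrors the shape of the queries above, it does not touch a database).
def each_id_batch(ids, of: 1000)
  sorted = ids.sort

  # Query 1: ORDER BY id ASC LIMIT 1 (the first matching ID).
  start = sorted.first

  while start
    remaining = sorted.select { |id| id >= start }

    # Query 2: id >= start ... LIMIT 1 OFFSET <batch size>
    # (the first ID of the next batch, or nil when fewer rows remain).
    stop = remaining[of]

    # Query 3: id >= start AND id < stop (the rows of the current batch).
    batch = stop ? remaining.take_while { |id| id < stop } : remaining

    yield batch
    start = stop
  end
end

batches = []
each_id_batch([1, 2, 3, 5, 8, 13, 21], of: 3) { |batch| batches << batch }
```

Each iteration costs one boundary lookup plus one range read, so the total time grows with the number of non-indexed projects that have to be scanned.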

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Terri Chu
