Skip to content

Include empty/no repo in projects_not_indexed rake task

What does this MR do and why?

Related to #374536 (closed)

This MR adds fixes for elastic rake tasks: projects_not_indexed and index_projects_status

  • Projects with empty or missing repositories should be included in counts
  • Respect index limiting setting to determine project list to use
  • Add missing specs
  • Fix rubocop todos for both elastic.rake and elastic_rake_spec.rb

Database

Before

Query: Project.all.count

Note: this query still exists for self managed instances that do not use index limiting

select count(*) from projects

Query: IndexStatus.count

 SELECT COUNT(*) FROM "index_statuses"

After/New queries

Query: IndexStatus.for_project(Project.all).count

 SELECT COUNT(*) FROM "index_statuses" WHERE "index_statuses"."project_id" IN (SELECT "projects"."id" FROM "projects")

Query: ::Gitlab::CurrentSettings.elasticsearch_limited_projects.count

SELECT
    COUNT(*)
FROM ((
        SELECT
            "projects".*
        FROM
            "projects"
        WHERE
            "projects"."namespace_id" IN ( WITH "descendants_base_cte" AS MATERIALIZED (
                    SELECT
                        "namespaces"."id",
                        "namespaces"."traversal_ids"
                    FROM
                        "namespaces"
                    WHERE
                        "namespaces"."id" IN (
                            SELECT
                                "elasticsearch_indexed_namespaces"."namespace_id"
                            FROM
                                "elasticsearch_indexed_namespaces"
)
),
                        "superset" AS (
                            SELECT
                                d1.traversal_ids
                            FROM
                                descendants_base_cte d1
                            WHERE
                                NOT EXISTS (
                                    SELECT
                                        1
                                    FROM
                                        descendants_base_cte d2
                                    WHERE
                                        d2.id = ANY (d1.traversal_ids)
                                        AND d2.id <> d1.id))
                                SELECT DISTINCT
                                    "id"
                                FROM
                                    "superset",
                                    "namespaces"
                                WHERE
                                    next_traversal_ids_sibling ("superset"."traversal_ids") > "namespaces"."traversal_ids"
                                    AND "superset"."traversal_ids" <= "namespaces"."traversal_ids"))
                        UNION (
                            SELECT
                                "projects".*
                            FROM
                                "projects"
                            WHERE
                                "projects"."id" IN (
                                    SELECT
                                        "elasticsearch_indexed_projects"."project_id"
                                    FROM
                                        "elasticsearch_indexed_projects"))) projects

Query: IndexStatus.for_project(::Gitlab::CurrentSettings.elasticsearch_limited_projects).count

SELECT
    COUNT(*)
FROM
    "index_statuses"
WHERE
    "index_statuses"."project_id" IN (
        SELECT
            "projects"."id"
        FROM ((
                SELECT
                    "projects".*
                FROM
                    "projects"
                WHERE
                    "projects"."namespace_id" IN ( WITH "descendants_base_cte" AS MATERIALIZED (
                            SELECT
                                "namespaces"."id",
                                "namespaces"."traversal_ids"
                            FROM
                                "namespaces"
                            WHERE
                                "namespaces"."id" IN (
                                    SELECT
                                        "elasticsearch_indexed_namespaces"."namespace_id"
                                    FROM
                                        "elasticsearch_indexed_namespaces"
)
),
                                "superset" AS (
                                    SELECT
                                        d1.traversal_ids
                                    FROM
                                        descendants_base_cte d1
                                    WHERE
                                        NOT EXISTS (
                                            SELECT
                                                1
                                            FROM
                                                descendants_base_cte d2
                                            WHERE
                                                d2.id = ANY (d1.traversal_ids)
                                                AND d2.id <> d1.id))
                                        SELECT DISTINCT
                                            "id"
                                        FROM
                                            "superset",
                                            "namespaces"
                                        WHERE
                                            next_traversal_ids_sibling ("superset"."traversal_ids") > "namespaces"."traversal_ids"
                                            AND "superset"."traversal_ids" <= "namespaces"."traversal_ids"))
                                UNION (
                                    SELECT
                                        "projects".*
                                    FROM
                                        "projects"
                                    WHERE
                                        "projects"."id" IN (
                                            SELECT
                                                "elasticsearch_indexed_projects"."project_id"
                                            FROM
                                                "elasticsearch_indexed_projects"))) projects)

Screenshots or screen recordings

N/A

How to set up and validate locally

  1. setup gdk for elasticsearch and index everything
  2. run the rake tasks and verify everything is indexed:
    bundle exec rake gitlab:elastic:index_projects_status
    Indexing is 100.00% complete (36/36 projects)
    
    bundle exec rake gitlab:elastic:projects_not_indexed
    All projects are currently indexed
  3. enable index limiting in the admin UI for 1 project: http://gdk.test:3000/admin/application_settings/advanced_search
  4. run the rake tasks and verify everything is indexed and only 1 project is checked:
    bundle exec rake gitlab:elastic:index_projects_status
    Indexing is 100.00% complete (1/1 projects)
    
    bundle exec rake gitlab:elastic:projects_not_indexed
    All projects are currently indexed
  5. delete index_status record for that project: ::Gitlab::CurrentSettings.elasticsearch_limited_projects.first.index_status.delete
  6. run the rake tasks and verify nothing is indexed and only 1 project is checked
    bundle exec rake gitlab:elastic:index_projects_status
    Indexing is 0.00% complete (0/1 projects)
    
    bundle exec rake gitlab:elastic:projects_not_indexed
    Project 'gitlab-org/gitlab-shell' (ID: 3) isn't indexed.
    1 out of 1 non-indexed projects shown.

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Terri Chu

Merge request reports