
Projects created from template fail to import issues under large groups

Summary

Projects created from a template in large groups fail to import all issues from the template.

Steps to reproduce

  1. Find a group with a large number of projects/groups/issues. For example, the GitLab Learn Labs group or the GitLab Support Team group
  2. From the group, select New Project > Create from template
  3. Select the Sample GitLab Project template
    1. This template contains 29 issues and 29 merge requests
  4. Results
  • The project takes a long time to create, and when it finishes the number of issues is lower than the expected 29, sometimes 0
  • Sidekiq logs show an error and the query is canceled

Example Project

https://gitlab.com/gitlab-com/support/iris-tries-to-break-things - 22 of the 29 issues were imported under the GitLab Support Team group

https://gitlab.com/gitlab-learn-labs/environments/session-2848d0eb/iupbenen/awesome/weeeeee - 0 of the 29 issues were imported under the GitLab Learn Labs group

What is the current bug behavior?

Issues are not fully imported into the new project created from the template

What is the expected correct behavior?

All issues are imported into new projects created from a template

Relevant logs and/or screenshots

sample-project-defaults.png

test-project-in-learn-labs.png

test-project-in-gitlab-support.png

log.json

PG::QueryCanceled: ERROR: canceling statement due to statement timeout

The error seems to have started appearing on GitLab.com 4 months ago, per Sentry

https://new-sentry.gitlab.net/organizations/gitlab/issues/887038/?project=3&query=is%3Aunresolved+RepositoryImportWorker&referrer=issue-stream&statsPeriod=14d&stream_index=2

Looks like this is affecting larger customers too

https://log.gprd.gitlab.net/app/r/s/Oghvm

Output of checks

This bug happens on GitLab.com

Possible fixes

This looks like the same cause as issue Project imported by file export/direct transfer... (#458367 - closed) • James Nutt • 18.0

Per @jfarmiloe

This is the query that is timing out:

/*application:sidekiq,correlation_id:01J90ADCGTKBDSKXAVC8H5YDS0,jid:795ac6e39a2e1833ed18e536,endpoint_id:RepositoryImportWorker,db_config_name:main*/
SELECT "issues"."id", "issues"."title", "issues"."author_id", "issues"."project_id", "issues"."created_at", "issues"."updated_at", "issues"."description", "issues"."milestone_id", "issues"."iid", "issues"."updated_by_id", "issues"."weight", "issues"."confidential", "issues"."moved_to_id", "issues"."due_date", "issues"."lock_version", "issues"."title_html", "issues"."description_html", "issues"."time_estimate", "issues"."relative_position", "issues"."service_desk_reply_to", "issues"."cached_markdown_version", "issues"."last_edited_at", "issues"."last_edited_by_id", "issues"."discussion_locked", "issues"."closed_at", "issues"."closed_by_id", "issues"."state_id", "issues"."duplicated_to_id", "issues"."promoted_to_epic_id", "issues"."health_status", "issues"."external_key", "issues"."sprint_id", "issues"."blocking_issues_count", "issues"."upvotes_count", "issues"."work_item_type_id", "issues"."namespace_id", "issues"."start_date", "issues"."imported_from", "issues"."correct_work_item_type_id"
FROM "issues"
WHERE "issues"."project_id" IN (
  SELECT "projects"."id" FROM "projects"
  WHERE "projects"."namespace_id" IN (
    SELECT namespaces.traversal_ids[array_length(namespaces.traversal_ids, $1)] AS id
    FROM "namespaces"
    WHERE "namespaces"."type" = $2 AND (traversal_ids @> ($3))
  )
)
ORDER BY "issues"."id" ASC
LIMIT $4

The @> operator is "contains". traversal_ids is an array containing the group hierarchy of a namespace. The log also has json-extra.source = process_relation_item!. The process_relation_item! method is defined in lib/gitlab/import_export/group/relation_tree_restorer.rb, which is the import side of things:
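As a rough illustration of the containment semantics (plain Ruby, not GitLab code; the IDs and names are made up): traversal_ids stores a namespace's ancestor chain, root first, so `traversal_ids @> ARRAY[group_id]` matches every namespace anywhere under that group.

```ruby
# Hypothetical namespaces, each with its traversal_ids array (root group first).
root_group_id = 42

namespaces = {
  'top-group'   => [42],
  'subgroup'    => [42, 100],
  'project-ns'  => [42, 100, 7001],
  'other-group' => [99]
}

# Rough analogue of Postgres's array containment check:
# traversal_ids @> ARRAY[42] is true whenever 42 appears in the array.
descendants = namespaces.select { |_name, ids| ids.include?(root_group_id) }

descendants.each_key { |name| puts name }
# Matches top-group, subgroup, and project-ns, but not other-group.
```

For a large group this subquery matches every namespace in the hierarchy, which is presumably why the surrounding query gets expensive enough to hit the statement timeout.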

    def process_relation_item!(relation_key, relation_definition, relation_index, data_hash)
      relation_object = build_relation(relation_key, relation_definition, relation_index, data_hash)

      return unless relation_object
      return if relation_invalid_for_importable?(relation_object)
      return if skip_on_duplicate_iid? && previously_imported?(relation_object, relation_key)

      relation_object.assign_attributes(importable_class_sym => @importable)

      save_relation_object(relation_object, relation_key, relation_definition, relation_index)
    rescue StandardError => e
      import_failure_service.log_import_failure(
        source: 'process_relation_item!',
        relation_key: relation_key,
        relation_index: relation_index,
        exception: e,
        external_identifiers: external_identifiers(data_hash))
    end

So the query is being run somewhere in that code path and timing out.
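This would also explain the partial imports (22 of 29, or 0 of 29): each relation item is processed independently, and the rescue logs the failure and moves on, so a timeout on one item drops that issue while others may still succeed. A minimal sketch of that rescue-and-continue pattern in plain Ruby (the error class and item numbers are stand-ins, not GitLab code):

```ruby
# Stand-in for PG::QueryCanceled
class QueryCanceled < StandardError; end

imported = []
failures = []

issues = (1..5).to_a

issues.each do |iid|
  # Simulate the expensive lookup timing out for some items
  raise QueryCanceled, 'canceling statement due to statement timeout' if iid.even?

  imported << iid
rescue StandardError => e
  # Mirrors log_import_failure: record the failure and continue with the next item
  failures << { index: iid, error: e.message }
end

# Some issues import, the rest are silently missing from the new project.
```

Under this reading, the fix is less about the error handling and more about the query itself, consistent with the linked issue #458367.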