Geo: gitmodulesUrl: disallowed submodule url error causes repository sync failures

GitLab Geo repository synchronization fails with the error "Error syncing repository: 13:creating repository: cloning repository: exit status 128" when repositories contain invalid submodule URLs in their .gitmodules files.

More details in this comment: https://gitlab.com/gitlab-com/gl-infra/gitlab-dedicated/team/-/issues/8576#note_2565818604

For 1: This new git fsck behavior comes from a change in upstream Git that added this check. It is therefore not a Gitaly issue specifically. See gitaly#5641. This will probably impact other Geo customers. If my understanding is correct, the current workarounds are:

  1. Ignore the "gitModulesUrl" git fsck check error as mentioned in #462567 (comment 2086468377)
  2. Fix the invalid URL with the git-filter-repo tool as mentioned in #462567 (comment 2534852005).
  3. Manually copy the project repositories as mentioned in https://gitlab.com/gitlab-com/request-for-help/-/issues/2151#note_2273834720.

Related Issue

This issue was encountered during a GitLab Dedicated migration: https://gitlab.com/gitlab-com/gl-infra/gitlab-dedicated/team/-/issues/8576#note_2568797060

Workaround

1. Back up projects

2. Remove blobs

Note: Make it clear to the developers who work on these projects that they must delete their current local copy and clone the fixed repository again after the steps above. Otherwise, they can reintroduce the offending blobs.

Important limitation: If any of these repositories are part of a fork network, the blob removal method may not work (blobs contained in object pools cannot be removed this way).

3. Fix .gitmodules invalid URLs if required

  • Check the state of .gitmodules files in each affected repository
  • If the .gitmodules still contains invalid URLs like https://example.gitlab.com:foo/bar.git instead of https://example.gitlab.com/foo/bar.git, the customer needs to:
    • Fix the URLs in the .gitmodules file
    • Push a commit with valid URLs