Unable to clean removed sub-submodules when using GIT_STRATEGY: fetch

Summary

In shells/abstract.go (https://gitlab.com/gitlab-org/gitlab-runner/-/blob/4d48abb0891720bd1350394654b1c5062f6c51d2/shells/abstract.go#L442-444), the submodule update sequence is effectively:

...
git submodule sync --recursive
git submodule foreach --recursive git clean -ffxd
git submodule foreach --recursive git reset --hard
git submodule update --init --recursive
...

What this does is:

  1. Update the URLs of all submodules, recursively
  2. Recursively remove all untracked (and ignored) files and directories in the submodules
  3. Recursively restore any modified or deleted tracked files in the submodules
  4. Recursively check out, in every submodule, the commit recorded by its superproject

The problem is that step 2 happens before step 4. Suppose a submodule itself has a submodule, and that sub-submodule has been removed in the new commit. Step 2 causes no problem: the submodule is still on the earlier commit, so the sub-submodule is tracked and kept. In step 4 the submodule moves to the new commit, and the directory that used to be the sub-submodule becomes an untracked directory that is left behind (Git prints "unable to rmdir ...: Directory not empty"). Had the clean run after the update, this directory would have been removed.
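A minimal sketch of the reordered sequence, written as a POSIX shell function. The name update_submodules_and_clean is hypothetical; the runner generates these commands from Go code in shells/abstract.go, and this is not that code. It keeps the four original steps and runs the clean again after the update, so the second pass sees the commit each submodule has just checked out.

```shell
#!/bin/sh
# Hypothetical helper, not gitlab-runner code.
update_submodules_and_clean() {
  # 1. copy submodule URLs from .gitmodules into the local config
  # 2. remove untracked and ignored files (submodules still on old commits)
  # 3. restore tracked files
  # 4. check out the commits recorded by each superproject
  # 5. clean again: removed sub-submodules are now untracked directories
  git submodule sync --recursive &&
    git submodule foreach --recursive git clean -ffxd &&
    git submodule foreach --recursive git reset --hard &&
    git submodule update --init --recursive &&
    git submodule foreach --recursive git clean -ffxd
}
```

Run it from the top of the superproject after checking out the new commit, for example: git checkout d476cb4bdf877d764392a7f013bfc75dbfb1584f && update_submodules_and_clean. The first clean still discards stray files before the update; the second one removes whatever the update itself leaves behind.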

Note: gitlab-runner!2883 (merged) fixes the issue by adding a second git clean -ffxd after step 4 of the original sequence, following comment 403269219 by @pedropombeiro on gitlab-runner!2351 (closed).
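The double -f in git clean -ffxd matters here. The left-behind directory still contains the sub-submodule's .git file, so git clean treats it as a nested repository, and git clean only deletes untracked nested repositories when -f is given twice. The throwaway demo assumes a POSIX shell and git on PATH. It uses git init to create the nested repository, which gives a .git directory rather than the .git file a submodule checkout has. As we understand it, git clean handles both forms the same way.

```shell
#!/bin/sh
# Throwaway demo: git clean keeps an untracked nested repository with a
# single -f and deletes it with -ff.
set -eu
demo=$(mktemp -d)
git init -q "$demo/outer"
cd "$demo/outer"

# Untracked nested repository, standing in for the leftover sub-submodule
git init -q leftover
echo data > leftover/file.txt

git clean -fxd    # single -f: the nested repository is left alone
[ -d leftover ] && single_f_kept=yes || single_f_kept=no

git clean -ffxd   # double -f: the nested repository is deleted
[ -d leftover ] && double_f_kept=yes || double_f_kept=no

echo "kept after -fxd: $single_f_kept; kept after -ffxd: $double_f_kept"
cd / && rm -rf "$demo"
```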

Steps to reproduce

git clone https://gitlab.com/dkozlov/merge_request-2883

cd merge_request-2883

git checkout 7f9b73ce33942c117478212c98bd6c1e8a021d1a

git status

HEAD detached at 7f9b73c
nothing to commit, working tree clean

git submodule update --init --recursive

Cloning into '/home/user/git/test/merge_request-2883/merge_request-2883-submodule-a'...
warning: redirecting to https://gitlab.com/dkozlov/merge_request-2883-submodule-a.git/
Submodule path 'merge_request-2883-submodule-a': checked out 'aafce850fc665d23e7c2351edf3651df1dfd396e'
Submodule 'merge_request-2883-sub-submodule-b-1' (https://gitlab.com/dkozlov/merge_request-2883-sub-submodule-b-1) registered for path 'merge_request-2883-submodule-a/merge_request-2883-sub-submodule-b-1'
Submodule 'merge_request-2883-sub-submodule-b-2' (https://gitlab.com/dkozlov/merge_request-2883-sub-submodule-b-2) registered for path 'merge_request-2883-submodule-a/merge_request-2883-sub-submodule-b-2'
Cloning into '/home/user/git/test/merge_request-2883/merge_request-2883-submodule-a/merge_request-2883-sub-submodule-b-1'...
warning: redirecting to https://gitlab.com/dkozlov/merge_request-2883-sub-submodule-b-1.git/
Cloning into '/home/user/git/test/merge_request-2883/merge_request-2883-submodule-a/merge_request-2883-sub-submodule-b-2'...
warning: redirecting to https://gitlab.com/dkozlov/merge_request-2883-sub-submodule-b-2.git/
Submodule path 'merge_request-2883-submodule-a/merge_request-2883-sub-submodule-b-1': checked out '49fd0517d8bc4649aa51c6222390b5f9cd560964'
Submodule path 'merge_request-2883-submodule-a/merge_request-2883-sub-submodule-b-2': checked out 'b78c98693b313f10f26821a0f5d28284c3f0474b'

ls merge_request-2883-submodule-a/

merge_request-2883-sub-submodule-b-1  merge_request-2883-sub-submodule-b-2  README.md

git checkout d476cb4bdf877d764392a7f013bfc75dbfb1584f

M	merge_request-2883-submodule-a
Previous HEAD position was 7f9b73c add submodule-a
HEAD is now at d476cb4 update merge_request-2883-submodule-a

git submodule sync --recursive

Synchronizing submodule url for 'merge_request-2883-submodule-a'
Synchronizing submodule url for 'merge_request-2883-submodule-a/merge_request-2883-sub-submodule-b-1'
Synchronizing submodule url for 'merge_request-2883-submodule-a/merge_request-2883-sub-submodule-b-2'

git submodule foreach --recursive git clean -ffxd

Entering 'merge_request-2883-submodule-a'
Entering 'merge_request-2883-submodule-a/merge_request-2883-sub-submodule-b-1'
Entering 'merge_request-2883-submodule-a/merge_request-2883-sub-submodule-b-2'

git submodule foreach --recursive git reset --hard

Entering 'merge_request-2883-submodule-a'
HEAD is now at aafce85 add submodules b1 and b2
Entering 'merge_request-2883-submodule-a/merge_request-2883-sub-submodule-b-1'
HEAD is now at 49fd051 Initial commit
Entering 'merge_request-2883-submodule-a/merge_request-2883-sub-submodule-b-2'
HEAD is now at b78c986 Initial commit

git submodule update --init --recursive

**warning: unable to rmdir 'merge_request-2883-sub-submodule-b-2': Directory not empty**
Submodule path 'merge_request-2883-submodule-a': checked out '7eb4be081612cf7476dd44bf905a4348e050b669'

cd merge_request-2883-submodule-a

git status

HEAD detached at 7eb4be0
Untracked files:
  (use "git add <file>..." to include in what will be committed)
	**merge_request-2883-sub-submodule-b-2**/

Example Project

https://gitlab.com/dkozlov/merge_request-2883
https://gitlab.com/dkozlov/merge_request-2883-submodule-a
https://gitlab.com/dkozlov/merge_request-2883-sub-submodule-b-1
https://gitlab.com/dkozlov/merge_request-2883-sub-submodule-b-2

What is the current bug behavior?

The removed sub-submodule "merge_request-2883-sub-submodule-b-2" is left behind as an untracked directory when using GIT_STRATEGY: fetch.

What is the expected correct behavior?

The removed sub-submodule "merge_request-2883-sub-submodule-b-2" should be cleaned up (its directory deleted) when using GIT_STRATEGY: fetch.

Relevant logs and/or screenshots

// How to reproduce https://gitlab.com/dkozlov/merge_request-2883:

// Create 4 repositories, e.g.
merge_request-2883
merge_request-2883-submodule-a
merge_request-2883-sub-submodule-b-1
merge_request-2883-sub-submodule-b-2

Perform the following actions:

git clone https://gitlab.com/dkozlov/merge_request-2883-submodule-a

cd merge_request-2883-submodule-a

git submodule add https://gitlab.com/dkozlov/merge_request-2883-sub-submodule-b-1

git submodule add https://gitlab.com/dkozlov/merge_request-2883-sub-submodule-b-2

git commit -m "add submodules b1 and b2"

git push

cd ..

git clone https://gitlab.com/dkozlov/merge_request-2883

cd merge_request-2883

git submodule add https://gitlab.com/dkozlov/merge_request-2883-submodule-a

git commit -m "add submodule-a"

git push

// Update the git submodules recursively

git submodule update --init --recursive

Submodule 'merge_request-2883-sub-submodule-b-1' (https://gitlab.com/dkozlov/merge_request-2883-sub-submodule-b-1) registered for path 'merge_request-2883-submodule-a/merge_request-2883-sub-submodule-b-1'
Submodule 'merge_request-2883-sub-submodule-b-2' (https://gitlab.com/dkozlov/merge_request-2883-sub-submodule-b-2) registered for path 'merge_request-2883-submodule-a/merge_request-2883-sub-submodule-b-2'
Cloning into '/home/user/git/merge_request-2883/merge_request-2883-submodule-a/merge_request-2883-sub-submodule-b-1'...
warning: redirecting to https://gitlab.com/dkozlov/merge_request-2883-sub-submodule-b-1.git/
Cloning into '/home/user/git/merge_request-2883/merge_request-2883-submodule-a/merge_request-2883-sub-submodule-b-2'...
warning: redirecting to https://gitlab.com/dkozlov/merge_request-2883-sub-submodule-b-2.git/
Submodule path 'merge_request-2883-submodule-a/merge_request-2883-sub-submodule-b-1': checked out '49fd0517d8bc4649aa51c6222390b5f9cd560964'
Submodule path 'merge_request-2883-submodule-a/merge_request-2883-sub-submodule-b-2': checked out 'b78c98693b313f10f26821a0f5d28284c3f0474b'

cd ../merge_request-2883-submodule-a

// Remove submodule "merge_request-2883-sub-submodule-b-2" using the following guide https://stackoverflow.com/questions/1260748/how-do-i-remove-a-submodule

git submodule deinit -f -- merge_request-2883-sub-submodule-b-2

Cleared directory 'merge_request-2883-sub-submodule-b-2'
Submodule 'merge_request-2883-sub-submodule-b-2' (https://gitlab.com/dkozlov/merge_request-2883-sub-submodule-b-2) unregistered for path 'merge_request-2883-sub-submodule-b-2'

rm -rf .git/modules/merge_request-2883-sub-submodule-b-2

git rm -f merge_request-2883-sub-submodule-b-2

rm 'merge_request-2883-sub-submodule-b-2'

git status

On branch master
Your branch is up to date with 'origin/master'.

Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
	modified:   .gitmodules
	deleted:    merge_request-2883-sub-submodule-b-2

git commit -m "remove b-2"

git push

cd ../merge_request-2883

git log

commit 7f9b73ce33942c117478212c98bd6c1e8a021d1a (HEAD -> master, origin/master, origin/HEAD)
Author: Dmitry Kozlov <dmitry.f.kozlov@gmail.com>
Date:   Sun May 16 04:42:54 2021 +0300

    add submodule-a

commit 16110481e4a4b13ef4aaf4600ee1cb075029c4bb
Author: DmtiryK <dmitry.f.kozlov@gmail.com>
Date:   Sun May 16 01:08:55 2021 +0000

    Initial commit

ls merge_request-2883-submodule-a/

merge_request-2883-sub-submodule-b-1  merge_request-2883-sub-submodule-b-2  README.md

git submodule update --rebase --remote

warning: redirecting to https://gitlab.com/dkozlov/merge_request-2883-submodule-a.git/
remote: Enumerating objects: 5, done.
remote: Counting objects: 100% (5/5), done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 3 (delta 1), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (3/3), 347 bytes | 347.00 KiB/s, done.
From https://gitlab.com/dkozlov/merge_request-2883-submodule-a
   aafce85..7eb4be0  master     -> origin/master
First, rewinding head to replay your work on top of it...
warning: unable to rmdir 'merge_request-2883-sub-submodule-b-2': Directory not empty
Fast-forwarded master to 7eb4be081612cf7476dd44bf905a4348e050b669.
Submodule path 'merge_request-2883-submodule-a': rebased into '7eb4be081612cf7476dd44bf905a4348e050b669'

git commit -m "update merge_request-2883-submodule-a"

git push
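The steps above depend on the gitlab.com repositories. The following is a self-contained sketch of the same scenario, using local repositories in a temporary directory. The names top, a, b1 and b2 are hypothetical stand-ins for merge_request-2883, merge_request-2883-submodule-a, merge_request-2883-sub-submodule-b-1 and merge_request-2883-sub-submodule-b-2. The sketch assumes a POSIX shell and git on PATH. It points HOME at the temporary directory to keep the global config isolated. It also sets protocol.file.allow=always, because Git 2.38.1 and later refuse local-path submodule clones by default. It replays the runner's four commands, checks whether a/b2 is left behind, and then runs the extra clean.

```shell
#!/bin/sh
# Offline sketch: reproduce the leftover sub-submodule with local repos.
# Names top, a, b1, b2 are hypothetical stand-ins for the gitlab.com repos.
set -eu
work=$(mktemp -d)

# Isolated git config in a throwaway HOME
export HOME="$work" GIT_CONFIG_NOSYSTEM=1
git config --global user.name "Repro"
git config --global user.email "repro@example.com"
git config --global init.defaultBranch master
git config --global advice.detachedHead false
# Git 2.38.1+ blocks local-path submodule clones unless this is set
git config --global protocol.file.allow always

# new_repo DIR: create a repository with one commit containing README.md
new_repo() {
  git init -q "$1"
  echo "$1" > "$1/README.md"
  git -C "$1" add README.md
  git -C "$1" commit -q -m "Initial commit"
}

cd "$work"
new_repo b1
new_repo b2
new_repo a
new_repo top

# a tracks b1 and b2
cd "$work/a"
git submodule add "$work/b1" b1
git submodule add "$work/b2" b2
git commit -q -m "add submodules b1 and b2"

# top at commit C1 tracks a (which still has b1 and b2)
cd "$work/top"
git submodule add "$work/a" a
git commit -q -m "add submodule-a"
c1=$(git rev-parse HEAD)

# Remove b2 from a, then record that new commit of a in top (commit C2)
cd "$work/a"
git rm -q b2
git commit -q -m "remove b-2"
cd "$work/top"
git -C a fetch -q origin
git -C a checkout -q "$(git -C "$work/a" rev-parse HEAD)"
git add a
git commit -q -m "update submodule-a"
c2=$(git rev-parse HEAD)

# Runner-style checkout: start at C1 with all submodules populated
git clone -q "$work/top" "$work/runner"
cd "$work/runner"
git checkout -q "$c1"
git submodule update --init --recursive

# Move to C2 and replay the runner's four commands
git checkout -q "$c2"
git submodule sync --recursive
git submodule foreach --recursive git clean -ffxd
git submodule foreach --recursive git reset --hard
git submodule update --init --recursive  # expected to warn: unable to rmdir 'b2'
[ -d a/b2 ] && leftover_before_fix=yes || leftover_before_fix=no

# Extra clean after the update, as added in gitlab-runner!2883
git submodule foreach --recursive git clean -ffxd
[ -d a/b2 ] && leftover_after_fix=yes || leftover_after_fix=no

echo "b2 left behind: after update=$leftover_before_fix, after extra clean=$leftover_after_fix"
cd / && rm -rf "$work"
```

If the scenario reproduces, it should report that b2 is still present after the update and gone after the extra clean.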

Possible fixes

gitlab-runner!2883 (merged)

Edited by DmtiryK