Skip to content

Index Update Properly Supports Submodules

What is this MR?

This MR changes the way that gitaly handles index updates for submodules in a given repository. As a result, this allows the multi_action command on repositories (which is used for commits POST requests on the gitlab Rest API) to update submodules SHA commit reference.

This is aimed to (at least, in part) help address gitlab#25940 (allowing submodule changes to be done alongside other actions in a single commit).

What is the previous (current) behavior?

When making a POST request to the gitlab commits API, the submodule will be updated (but the commit seems to be completely unrelated).
Example Rest Call:

curl --request POST --header "PRIVATE-TOKEN: XXXXXX" --data "{ \"branch\": \"master\", \"commit_message\": \"some commit message\", \"actions\": [ { \"action\": \"update\", \"file_path\": \"submoduleX\", \"content\": \"01464e1616e3fdd5c60c0cc5516c1d1454cc4185\" } ] }" --header "Content-Type: application/json" https://gitlab.com/api/v4/projects/123/repository/commits

The result commit (using the diff API) may look something like:

[
  {
    "old_path": "submoduleX",
    "new_path": "submoduleX",
    "a_mode": "160000",
    "b_mode": "160000",
    "new_file": false,
    "renamed_file": false,
    "deleted_file": false,
    "diff": "@@ -1 +1 @@\n-Subproject commit 70c881d4a26984ddce795f6f71817c9cf4480e79\n+Subproject commit 88e6101730f4cf454196252e1e1e2e77fd30da4c\n"
  }
]

Note that, even in this fictional (but accurate 😉) example, that neither of the SHA hashes are at all (seemingly) related to the SHA in the original request.
The current behavior will also allow for any text to be placed as the content. E.g. \"content\": \"not a sha-1 hash\". This too will result in the submodule being updated with (seemingly) unrelated SHA-1 hashes. More detail below.

What is the behavior with this MR?

With the changes in this MR, when the above example request is sent, the submodule will be updated correctly. In the fictional example, the commit would instead have "diff": "@@ -1 +1 @@\n-Subproject commit 70c881d4a26984ddce795f6f71817c9cf4480e79\n+Subproject commit 01464e1616e3fdd5c60c0cc5516c1d1454cc4185\n".
Further, when sending non-SHA hash content (e.g. \"content\": \"not a sha-1 hash\") for a submodule, an error is raised instead of making an incorrect update.

Other considered/related features

This MR focuses mostly on the update action, but there are a few other actions that are somewhat related when it comes to submodules.

Addressed in this MR

This MR also adds in submodule support for the move action. If a request wishes to move and modify a submodule, the hash is parsed the same as the update action.

Not addressed in this MR

The create action still only support creation of files -- no creation of submodules. Such support was not added as creation of submodules require a few extra fields of data (repo of submodule and branch of submodule) in addition to the SHA. It would probably make more sense to create a new action for creating submodules, but that was left out of this MR as it was not the main focus.

Current Behavior - Deep Dive

When performing an action, the index will always create a blob file for the given content. It will then set the ObjId for the entry (at the given file path) to the newly written blob ID. This is incorrect for a number of issues.

  1. If the content is hash and if the path is a submodule, that hash should be used directly.
  2. The resulting hash is seemingly unrelated to the hash in the content -- when it is in fact the hash of the hash (via git's object file hashing). This is confusing and hard to understand/diagnose what is happening, since there is some sort of change occurring -- but it seems random (until repeated testing).
  3. The OID set to the submodule does not point to any such commit, it points to the orphaned blob file. This is not useful, resulting in a broken submodule reference.
  4. Gitaly will (unnecessarily) write an orphaned blob file, but it is only "referenced" by the submodule. This does not count, resulting in the blob being garbage collected at some future time.
Edited by 🤖 GitLab Bot 🤖

Merge request reports