CommitCollection#enrich! to be called automatically when commit data that is missing is requested
Problem to solve
The commit data for MergeRequest#commits
is saved in PostgreSQL rather than in gitaly for the performance gain of doing so.
Specifically; the data source of MergeRequest#commits
is MergeRequestDiffCommit
.
Currently, the data saved in MergeRequestDiffCommit
is intentionally absent of parent_ids
and there is a project to remove more commit properties from this model in order to optimize storage space.
However there are times where the absent commit properties will be needed. An example of this is https://gitlab.com/gitlab-org/gitlab-ce/issues/58805 where parent_ids
on merge request commits were needed in order to filter out merge commits.
Over time, merge request commit data is expected to the least-commonly-needed properties removed. However there will likely be times when this data will be needed in certain situations.
In this conversation thread https://gitlab.com/gitlab-org/gitlab-ce/issues/58805#note_149567136 an idea was discussed to fetch the missing commit data from gitaly. An idea was floated that this could happen "automatically", i.e., to have a Commit
know it is incomplete and to "enrich" itself when that property is requested.
It was decided in https://gitlab.com/gitlab-org/gitlab-ce/issues/58805#note_150342230 to leave the "automatic" enrichment to later, however work was merged in https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/26144 to have non-gitaly commits become "enriched" and become full gitaly commits when #enrich!
was called on the CommitCollection
. #enrich!
was then added to CommitCollection#without_merge_commits
, so that parts of the application that expected to filter merge commits would be guaranteed to work without bugs.
Proposal
This issue represents the task of making this enrichment "automatic".
When a Commit
which is not a #gitaly_commit?
is asked for a property that we do not store in MergeRequestDiffCommit
, the Commit
would call #enrich!
on the whole CommitCollection
that it belongs to (this means we batch request gitaly data for all commits within the collection, and avoid n+1 issues).
This would mean that future developers would not need to know to manually call #enrich!
themselves in order to avoid bugs.
Note that we should consider whether we can monitor or measure where automatic enrichment takes place within the app in order to track whether we have struck the balance between optimizating storage in MergeRequestDiffCommit
in https://gitlab.com/gitlab-org/gitlab-ce/issues/34744 vs increasing calls to gitaly.