checks: Fix combinatorial explosion in `#commits_for()`
What does this MR do?
The function #commits_for()
has been introduced via 156ce433
(checks: Implement infrastructure for bulk diff checks, 2021-07-29) in
order to allow bulk-loading of commits. What this function does is given
a set of new commits and a specific object ID, it will do a graph walk
of these new commits starting from this ID in order to extract only
those commits which have been newly introduced via this object ID.
As it turns out though, the implementation has a bug which causes combinatorial explosion: if a commit is reachable via multiple commits, then it will be walked and returned multiple times. This can happen if there are merge commits, where we'll now repeatedly walk all common ancestors of both commits. In case these again contain merge commits, the common ancestors again get walked multiple times. The result is that commits get walked exponentially many times. For the following graph with criss-cross merges, this causes us to return 768 commits instead of the expected 18:
o---o---o---o---o---o---o---o
/ \ / \ / \ / \ / \ / \ / \ / \
o X X X X X X X o
\ / \ / \ / \ / \ / \ / \ / \ /
o---o---o---o---o---o---o---o
Fix the bug by removing seen commit IDs from the hash tracking commits by their object ID. Like this, we'll not re-walk any commits which we have already walked before.
Part of #336992 (closed)
Does this MR meet the acceptance criteria?
Conformity
-
I have included changelog trailers, or none are needed. (Does this MR need a changelog?) -
I have self-reviewed this MR per code review guidelines. -
This MR does not harm performance, or I have asked a reviewer to help assess the performance impact. (Merge request performance guidelines) -
I have followed the style guides. -
This change is backwards compatible across updates, or this does not apply.
Availability and Testing
-
I have added/updated tests following the Testing Guide, or it's not needed. (Consider all test levels. See the Test Planning Process.)