It's common to write commits that are found to be bad long after the merge:
- Flaky tests caused by random inputs, multi-threading, or distributed execution.
- Changes that cause performance or convergence regressions.
- etc.
We need a tool to automatically bisect to the culprit commit.
The tool should:
- binary search over the commits
- for each selected commit, evaluate whether it is good or bad
- identify the first commit that introduced the problem
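The three steps above can be sketched as follows. This is a minimal illustration, not a finished tool: `find_first_bad` and `flaky_is_bad` are hypothetical names. It assumes the commits are ordered oldest to newest, that the newest commit is known to be bad, and that every commit after the culprit is also bad. For flaky tests a single passing run proves little, so the wrapper treats a commit as bad if any of several runs fails.

```python
from typing import Callable, Sequence


def find_first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad commit in `commits` (ordered oldest to newest).

    Assumption: commits[-1] is bad, and every commit after the culprit is bad.
    """
    if not commits:
        raise ValueError("commits must not be empty")
    lo, hi = 0, len(commits) - 1  # invariant: the culprit lies in commits[lo..hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid        # culprit is at mid or earlier
        else:
            lo = mid + 1    # culprit is after mid
    return commits[lo]


def flaky_is_bad(run_once: Callable[[str], bool], repeats: int) -> Callable[[str], bool]:
    """Wrap a single-run check (True means the run passed) for flaky tests.

    The commit is reported bad if any of `repeats` runs fails.
    """
    def is_bad(commit: str) -> bool:
        return any(not run_once(commit) for _ in range(repeats))
    return is_bad
```

With N candidate commits the search evaluates about log2(N) of them, which matters when one evaluation means building the commit and running a long test.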
Notes:
It's common to merge a PR that contains several commits. Only the merge commit is verified by the CI before the merge, so the intermediate commits inside a PR may not even build. Hence, we only bisect the commits that were merged into the mainline (e.g. develop) branch.
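One way to collect only the mainline commits is `git rev-list --first-parent`, which follows just the first parent of each merge and therefore skips the commits inside a merged PR. The sketch below assumes every PR lands as a merge (or squash) commit whose first parent is the previous mainline commit. The helper names and the `good`/`bad` arguments are hypothetical.

```python
import subprocess
from typing import List


def mainline_rev_list_cmd(good: str, bad: str) -> List[str]:
    """Build the git command listing mainline commits after `good` up to `bad`."""
    return [
        "git", "rev-list",
        "--first-parent",   # follow only the mainline side of each merge
        "--reverse",        # print oldest first, the order a bisect expects
        f"{good}..{bad}",
    ]


def parse_rev_list(output: str) -> List[str]:
    """Split `git rev-list` output into commit hashes, ignoring blank lines."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def list_mainline_commits(repo_dir: str, good: str, bad: str) -> List[str]:
    """Run the command inside `repo_dir` and return the hashes, oldest first."""
    result = subprocess.run(
        mainline_rev_list_cmd(good, bad),
        cwd=repo_dir, check=True, capture_output=True, text=True,
    )
    return parse_rev_list(result.stdout)
```

The returned list excludes `good` and includes `bad`, so it can be passed directly to a binary search that expects the last element to be bad.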
Advanced Feature:
The CI should continuously evaluate recent commits and record the results. This history lets developers quickly find out when a problem started.
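As a rough illustration of how such a history could be used, the sketch below scans recorded results to find the earliest recorded failure and the last passing commit before it. The `Evaluation` record and the `suspect_range` function are hypothetical. It assumes the history is ordered oldest to newest and that not every commit was necessarily evaluated, so the culprit lies somewhere after the returned good commit, up to and including the returned bad one.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Evaluation:
    commit: str   # commit hash evaluated by the CI
    passed: bool  # True if the evaluation succeeded


def suspect_range(history: List[Evaluation]) -> Optional[Tuple[Optional[str], str]]:
    """Return (last good commit, first bad commit) for the earliest recorded failure.

    Returns None if no failure is recorded. The first element is None when the
    oldest recorded evaluation already failed.
    """
    last_good: Optional[str] = None
    for ev in history:
        if not ev.passed:
            return (last_good, ev.commit)
        last_good = ev.commit
    return None
```

If the two returned commits are adjacent on the mainline, the culprit is already known. Otherwise only the commits between them need to be bisected.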