Improve protected branches
<!-- triage-serverless v3 PLEASE DO NOT REMOVE THIS SECTION -->

*This page may contain information related to upcoming products, features and functionality. It is important to note that the information presented is for informational purposes only, so please do not rely on the information for purchasing or planning purposes. Just like with all projects, the items mentioned on the page are subject to change or delay, and the development, release, and timing of any products, features, or functionality remain at the sole discretion of GitLab Inc.*

<!-- triage-serverless v3 PLEASE DO NOT REMOVE THIS SECTION -->

Sensitive information is sometimes accidentally pushed to Git repositories. Although there are various ways to safeguard against this, none of them succeeds every time.

Sensitive information is a broad category. It could include trade secrets, such as lab results from testing an experimental drug, or the personal information of a real person that was used to reproduce and fix a bug. Unlike passwords, this information can't be rotated and needs to be permanently removed from the repository.

There are tools for removing information from a Git repository, like `git filter-branch` and [BFG Repo-Cleaner](https://rtyley.github.io/bfg-repo-cleaner/), but on their own they are incomplete. GitLab uses `refs` to make sure that the commits in a merge request, and diffs that have been commented on, are not removed, so that future developers can read discussions about previous code changes.

### Vision

GitLab needs to provide a way to help users remove sensitive information from repositories.

### Scenarios

1. **Sensitive information pushed to feature branch, and is `HEAD` of branch** :see_no_evil:

   Simply pushing a commit can generate a significant amount of data that may contain the sensitive information, including refs, diffs, job traces, job artifacts, environments, containers, etc.

   If the commit is the `HEAD` of the feature branch, the cleanup is simpler because there shouldn't be any descendant commits, and we can search the database for entities that reference this commit.

1. **Sensitive information pushed to feature branch, but not `HEAD`** (might be rebased, very incomplete fix) :fire: :fire_engine:

   At this point, there will be various refs that point to the commit, there could be descendant commits, and we need to scan the repository to determine all the impacted commits. Using `commitGraph` should make this fast, but we need to consider large repositories. Once we find all the impacted commits and refs, we need to remove them, along with all the impacted data in the database and elsewhere.

1. **Sensitive information pushed to feature branch and merged** :boom: :fire: :8ball:

   The worst situation. There will likely be a very large amount of contaminated data, requiring a very large number of commits to be removed and rewritten. Additionally, large amounts of related data would need to be removed. Very, very bad.

### Links / references
epic
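
As a rough illustration of the scan described in the second and third scenarios, the sketch below finds every commit that descends from a contaminated commit, and then every ref whose tip is one of those commits. It is not GitLab's implementation: the function names `impacted_commits` and `impacted_refs`, the in-memory `parents` and `refs` dictionaries, and the sample commit IDs are all hypothetical, and a real repository would read this data from Git (for example via the commit graph) rather than from Python dictionaries.

```python
from collections import defaultdict, deque


def impacted_commits(parents, bad_commit):
    """Return bad_commit plus every commit that descends from it.

    `parents` maps a commit ID to the list of its parent commit IDs
    (hypothetical in-memory stand-in for the repository history).
    """
    # Invert the parent links so we can walk from a commit to its children.
    children = defaultdict(set)
    for commit, commit_parents in parents.items():
        for parent in commit_parents:
            children[parent].add(commit)

    # Breadth-first walk over child links, starting at the bad commit.
    impacted = {bad_commit}
    queue = deque([bad_commit])
    while queue:
        current = queue.popleft()
        for child in children[current]:
            if child not in impacted:
                impacted.add(child)
                queue.append(child)
    return impacted


def impacted_refs(refs, impacted):
    """Return the names of refs whose tip is an impacted commit.

    A ref can reach the bad commit exactly when its tip is the bad
    commit itself or one of its descendants.
    """
    return sorted(name for name, tip in refs.items() if tip in impacted)


# Hypothetical history: a <- b <- c <- d, a <- e, and merge m of d and e.
parents = {
    "a": [],
    "b": ["a"],
    "c": ["b"],  # assume "c" is the commit with sensitive data
    "d": ["c"],
    "e": ["a"],
    "m": ["d", "e"],
}
# Hypothetical refs, mapping ref name to tip commit.
refs = {
    "refs/heads/feature": "d",
    "refs/heads/main": "m",
    "refs/heads/other": "e",
    "refs/merge-requests/1/head": "c",
}

contaminated = impacted_commits(parents, "c")
print(sorted(contaminated))
print(impacted_refs(refs, contaminated))
```

In the first scenario the walk returns only the bad commit itself, which is why that case is the simplest. Once the commit is merged, as in the third scenario, the merge commit and everything after it on the target branch end up in the impacted set, which is why the cleanup grows so quickly.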