One-time cleanup of stale MRs and branches
Let's clean up stale branches that are not associated with an open and recently active MR. These branches take up storage for Review App previews and increase the time and disk space required to clone the repo.
Pulled from a Slack thread (https://gitlab.slack.com/archives/C9X79MNJ3/p1570941006015600): "every one of those branches has a review app, which takes around 1GB of storage"
Additionally, if the stale Review App resources account for a lot of spend, we could schedule a pipeline which cleans up stale review apps, similar to the one in the GitLab repo (https://gitlab.com/gitlab-org/gitlab/-/jobs/319843116).
Notes
- Definition of a stale MR: an MR in state: opened (not merged or closed) with no activity (updated_at) in the last 30 days
- Currently ~520 MRs were updated more recently than 30 days (90 days would be ~840)
- Deletions are preceded by a 7-day warning
- Closing the MR should stop any review app that might be active
- Consider putting a link to a snapshot of the repo and a diff of the branch before deleting it
- Definition of a stale branch: the GitLab default definition of a stale branch is 3 months
- Send notifications before closing/deleting branches/MRs.
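The stale-MR definition in the notes above can be sketched as a small predicate. This is only an illustration, not the script that was actually run: it assumes each MR is a dict with the state and updated_at fields as returned by the GitLab REST API, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Threshold taken from the notes: no activity in the last 30 days.
STALE_AFTER = timedelta(days=30)


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as '2020-03-01T12:00:00.000Z'."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def is_stale_mr(mr: dict, now: datetime) -> bool:
    """Return True for an opened MR whose updated_at is older than 30 days."""
    if mr["state"] != "opened":
        return False  # merged and closed MRs are never considered stale
    return now - parse_timestamp(mr["updated_at"]) > STALE_AFTER
```

Passing now explicitly (rather than reading the clock inside the function) keeps the check reproducible, which matters when the same list is evaluated again after the warning period.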
Plan
Step 1 (COMPLETE): Make a pre-cleanup backup archive of the repo
- Store a complete archive of the repo (including all branches) in a separate public and permanent project: gitlab-com/www-gitlab-com-archive-20200410-pre-cleanup
- Make a final commit to master which deletes everything in the repo and replaces it with a single README.md file explaining the purpose of the archive, and directing people where to ask questions (since archiving also prevents any issues from being created).
- It was also suggested to use https://archive.softwareheritage.org/, but an archive project should be sufficient.
- Include a link to it in all subsequent communications (comments, emails, etc.) letting people know that if they want to get back a deleted branch, they can clone the archived project, and get their branch back from it. (optional: include instructions on setting the remote and re-pushing it)
- Deal with all the responses to the issues. Especially make sure no external contributions are prematurely closed.
Step 2 (COMPLETE): Close stale MR Branches and close MRs with external/missing branches
MR: !45261 (merged)
MR: !46685 (merged)
Closing the MRs and deleting their associated branches will be done as a separate first step. This will make the process simpler, because 1) we will have a smaller set of branch deletions to deal with, and 2) people will not have to let us know twice if they want to keep something (i.e. they won't have to tell us not to close the MR AND not to delete the associated branch).
Process:
- Run a script to:
- Label all stale MRs with a "stale" label:
- title: stale - to be closed
- description: "Used to identify stale Merge Requests which will soon be automatically closed"
- color: Pure Red
- Make a comment on the MR saying the MR will be closed and the associated branch deleted within 7 days unless the "stale" label is removed.
- Send a Slack notification to the #handbook, #website, and #whats-happening-at-gitlab channels informing people of the plan, with directions to remove the label if they don't want the MR closed and branch deleted.
- Wait 7 days
- Run a script to do the following for all open MRs which still have the "stale" label:
- Delete the branches which are in the www-gitlab-com repo. This will cause their MRs to automatically be closed.
- Directly close any which have branches on forks (non www-gitlab-com repo). These are external contributions, and their branches cannot (and should not) be deleted.
- Directly close any for which the associated branch does not exist anymore (this is possible).
NOTE: If someone tries to re-push a deleted branch with the same name, it will NOT automatically re-associate with the MR. It puts the MR in a buggy state: It will show the restored branch as non-clickable (normal for deleted), UNTIL you edit the MR (even a no-op edit). Then, it will show the link to the branch, BUT the restored branch doesn't know it's associated with the MR - it will have a "create new MR" link instead of linking to the original one.
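The three cases in the Step 2 process can be sketched as a single decision function. This is a hedged illustration rather than the actual script: it assumes MR dicts carry the labels, source_project_id and source_branch fields from the GitLab REST API, and cleanup_action, project_id and existing_branches are hypothetical names.

```python
from typing import Optional, Set

STALE_LABEL = "stale - to be closed"  # label title from the process above


def cleanup_action(mr: dict, project_id: int, existing_branches: Set[str]) -> Optional[str]:
    """Decide what to do with one open MR after the 7-day wait.

    Returns None when the label was removed (the MR is kept), otherwise one of
    'close_mr_fork', 'close_mr_missing_branch' or 'delete_branch'.
    """
    if STALE_LABEL not in mr["labels"]:
        return None  # someone removed the label: leave the MR alone
    if mr["source_project_id"] != project_id:
        return "close_mr_fork"  # branch lives on a fork; never delete it
    if mr["source_branch"] not in existing_branches:
        return "close_mr_missing_branch"  # nothing to delete, close directly
    return "delete_branch"  # deleting the branch closes the MR automatically
```

Checking the fork case before the missing-branch case means an external contribution is never looked up in the list of www-gitlab-com branches, so a name collision cannot lead to deleting the wrong branch.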
Step 3 (COMPLETE): Delete merged branches
MR: !47320 (merged)
Process:
- Run a script to find and delete all merged branches
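As a rough sketch of the selection step, assuming the branch list comes from the GitLab branches API (whose entries include name, merged, protected and default fields). Skipping protected and default branches is an added safety assumption, not something stated in this issue, and the function name is hypothetical.

```python
from typing import Iterable, List


def merged_branches_to_delete(branches: Iterable[dict]) -> List[str]:
    """Return the names of merged branches that are safe to delete."""
    return [
        branch["name"]
        for branch in branches
        if branch["merged"]
        and not branch["protected"]  # assumption: never touch protected branches
        and not branch["default"]    # assumption: never touch the default branch
    ]
```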
Step 4 (COMPLETE): Delete remaining stale branches
MR: !48078 (merged)
At this point, all stale branches should have commits. But they may still not be deletable, if they are associated with an open MR. So, we need to identify only stale branches which are 1) associated with closed or merged MRs, OR 2) not associated with any MR.
Process:
Part 1:
- Run a script to identify all branches which are:
- Older than 90 days, AND
- Associated with a closed or merged MR OR not associated with any MR
- For each of these branches, obtain the:
- branch name
- last commit author name
- last commit author email
- last commit author date
- last commit committer name
- last commit committer email
- last commit committer date
- last commit message
- URL to branch
- URL to MR (if branch is associated with an MR) - NOTE: this could contain incorrect links if a deleted MR branch was subsequently re-created with the same name. There's at least one instance of this.
- Create a spreadsheet with all of the branch information, and a column for ACTION containing "DELETE", sorted by descending commit date
- Send a mass email~~/slack~~ telling people to look at the spreadsheet, search for their name or email, and change the ACTION to "KEEP" for any branches they want to keep.
- (Optional) email all authors and committers with a *@gitlab.com email address.
- Run the script again with the flag to do the actual deletion, and the exclude-list of any branches people wanted to keep.
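Part 1 can be sketched roughly as follows. This is an illustration under assumptions, not the script that was used: branch dicts are assumed to follow the GitLab branches API shape (a name plus a commit object with author and committer fields), age is measured from the last commit's committed_date, mr_states_by_branch is a hypothetical mapping from branch name to the states of its MRs, and the URL columns are omitted for brevity.

```python
import csv
import io
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(days=90)
COMMIT_FIELDS = ["author_name", "author_email", "authored_date",
                 "committer_name", "committer_email", "committed_date", "message"]
COLUMNS = ["branch"] + COMMIT_FIELDS + ["ACTION"]


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def is_deletion_candidate(branch: dict, mr_states: list, now: datetime) -> bool:
    """Older than 90 days AND every associated MR (if any) is closed or merged."""
    age = now - parse_timestamp(branch["commit"]["committed_date"])
    if age <= STALE_AFTER:
        return False
    # all() over an empty list is True, which covers "not associated with any MR".
    return all(state in ("closed", "merged") for state in mr_states)


def build_rows(branches: list, mr_states_by_branch: dict, now: datetime) -> list:
    """Build spreadsheet rows, newest commit first, each marked DELETE."""
    rows = []
    for branch in branches:
        states = mr_states_by_branch.get(branch["name"], [])
        if not is_deletion_candidate(branch, states, now):
            continue
        row = {"branch": branch["name"], "ACTION": "DELETE"}
        row.update({field: branch["commit"][field] for field in COMMIT_FIELDS})
        rows.append(row)
    rows.sort(key=lambda r: parse_timestamp(r["committed_date"]), reverse=True)
    return rows


def to_csv(rows: list) -> str:
    """Serialize the rows as CSV text suitable for importing into a spreadsheet."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

A branch with both an open MR and a closed MR is not selected, since the open MR would still block deletion.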
Part 2:
- Wait 7 days (provide a final reminder 3 days before)
- Export a list of all branch names which have ACTION=KEEP
- Run the script to delete the branches excluding the ones with ACTION=KEEP
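The export-and-exclude step of Part 2 might look like the sketch below. It assumes the spreadsheet is exported as CSV with an ACTION column and a branch column; the branch column name and both function names are hypothetical.

```python
import csv
from typing import Iterable, List, Set, TextIO


def branches_to_keep(exported_csv: TextIO) -> Set[str]:
    """Collect branch names whose ACTION was changed to KEEP."""
    return {
        row["branch"]
        for row in csv.DictReader(exported_csv)
        if row["ACTION"].strip().upper() == "KEEP"
    }


def branches_to_delete(candidates: Iterable[str], keep: Set[str]) -> List[str]:
    """Filter the original candidate list against the exclude-list."""
    return [name for name in candidates if name not in keep]
```

Deriving the deletion list from the original candidates minus the KEEP set, rather than from the rows still marked DELETE, means a row that someone accidentally removed from the spreadsheet is still deleted only if it was a candidate in the first place.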
Clean up Review Apps
TODO: Come up with a plan - see comments below for suggestions.
History
- 2020-04-01: Sent initial slack notification: https://gitlab.slack.com/archives/C0259241C/p1585735985385900
- 2020-04-11: Ran MR cleanup script, sent second Slack notification in #handbook, crossposted in #website and #whats-happening-at-gitlab
- 2020-04-20: Ran MR close / branch deletion script.
- 2020-04-21: Ran script to delete merged branches (81 deleted)
Dev Notes
- The Quality group might have some existing tools to automate some of this - asked on slack.
- The GraphQL API doesn't appear to provide a way to retrieve multiple merge requests unless you provide a list of iid. See slack thread, gitlab-org/gitlab#34527 (closed), and gitlab-org/gitlab#213032 (closed) for details
- Use curl to check current ratelimit quotas (the -i on grep makes the match case-insensitive, since header capitalization can vary):
curl -si "https://gitlab.com/api/v4/version?private_token=$PRIVATE_TOKEN" | grep -i ratelimit
(see https://docs.gitlab.com/ee/user/gitlab_com/index.html#haproxy-api-throttle)
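If a cleanup script needs the same information programmatically, the header filtering can be sketched as below. This is a minimal illustration that works on raw header text (for example, the output of curl -si); the function name is hypothetical and the values in the usage example are made up.

```python
def ratelimit_headers(raw_headers: str) -> dict:
    """Extract rate-limit headers from raw HTTP response header text.

    Header names are matched case-insensitively and returned lowercased.
    """
    result = {}
    for line in raw_headers.splitlines():
        name, separator, value = line.partition(":")
        if separator and "ratelimit" in name.lower():
            result[name.strip().lower()] = value.strip()
    return result
```

For example, with illustrative header text:

```python
sample = "HTTP/2 200\ncontent-type: application/json\nratelimit-limit: 600\nRateLimit-Remaining: 599\n"
print(ratelimit_headers(sample))
```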