Create a changelog feature in GitLab and use it for GitLab Releases
As part of [the Q4 FY21
OKR](https://gitlab.com/gitlab-com/gl-infra/mstaff/-/issues/19), we will be
focusing on implementing changelog generation in GitLab itself. This epic
tracks the work required for this OKR, and provides an overview of what we have
today and where we want to move to.
Related epic: https://gitlab.com/groups/gitlab-org/-/epics/4983
<details>
<summary>Table of contents</summary>
[[_TOC_]]
</details>
* :white_check_mark: Completed
* DRI: @yorickpeterse
* [Issue board](https://gitlab.com/gitlab-com/gl-infra/delivery/-/boards/2189805)
## The problem
Many projects need to record the changes made between releases in more detail
than a release announcement blog post. The approach taken varies by project,
but tends to be either completely manual or semi-automated.
As a release manager, generating a changelog involves multiple steps. In
GitLab's case this is automated using custom tooling that others can't use. If
done manually, release managers may not have the information necessary to
produce an accurate changelog. Even if they do, the process is time-consuming
and prone to error.
As a developer, adding changelogs involves a few steps that can be overlooked
easily. Different projects using different workflows also makes it more
difficult to contribute changes, especially for beginners.
As a user, the different approaches lead to different output formats for
changelogs. This requires that the user familiarises themselves with these
different formats. Some formats may not include all the information the user
wants. Users may wish to use the changelogs to:
* See if there are any new features worth trying out
* Find out if that one bug they have been dealing with for 6 months has been
fixed
* See how many performance improvements have been made that perhaps aren't
highlighted in the changelog
* Use the changelog to build a release post themselves, perhaps for some social
media platform
In this epic we propose building a solution into GitLab. This solution makes it
easier to:
1. Contribute changelog entries as a developer
1. Generate changelogs as a release manager
1. Consume changelogs as a user
By building this into GitLab, all users of GitLab can benefit from this
functionality. In addition, the Delivery team no longer needs to maintain its
own custom changelog generation code.
## Proposal
We will move to a setup where changelog entries are generated based on commit
titles. The output will remain a changelog file. Commits are excluded by
default, and can be included by adding a tag to the message. Commits can
further be classified as a feature, bug, etc; again by adding a tag to the
commit.
To make all this easier, we'll also extend GitLab to support the following:
* Editing of commit messages from a merge request
* Adding these tags straight from the UI
We'll need help from Gitaly to add an API for updating the messages of
existing commits. We'll also need help from frontend and UX to build the UI for
editing commit messages when viewing merge requests.
We likely also need to extend some of our review tooling and process, in order
to make this process as pain-free as we possibly can. For example, we could run
[Vale](https://github.com/errata-ai/vale) or
[gitlint](https://github.com/jorisroovers/gitlint) against commit messages to
help developers with the writing process.
## Technical details
To enrich changelog output, such as by marking a change as EE specific or
marking it as a "feature", we will use [Git
trailers](https://git-scm.com/docs/git-interpret-trailers). Git trailers are
built into Git, making it easier to interact with and manage this data as a
developer. Since these tags are usually placed at the end of a commit body, they
don't reduce the number of characters that fit in the subject line (assuming
the author follows the 50 character rule).
### Opt-in for changelogs
Commits are not included by default, as not every commit warrants a changelog
entry. To include a commit in the changelog, one would add the `Changelog: true`
tag to the commit message. This can be added manually, or using `git
interpret-trailers` (`git commit` doesn't support this at the moment per
https://gitlab.com/gitlab-org/git/-/issues/52).
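
The opt-in check could be sketched as follows. This is a minimal illustration and not the planned implementation: the function names `parse_trailers` and `include_in_changelog` are hypothetical, and the parsing is deliberately simpler than Git's own trailer rules (it assumes the trailers form the last paragraph of the message and that every line in that paragraph is a `Key: value` pair).

```python
import re

# A trailer line looks like "Key: value"; keys consist of letters, digits and hyphens.
TRAILER_RE = re.compile(r"^([A-Za-z0-9-]+):\s*(.+)$")


def parse_trailers(message: str) -> dict:
    """Return the trailers in the last paragraph of a commit message.

    Keys are lowercased. Simplification (an assumption of this sketch): the
    last paragraph only counts as a trailer block if every line in it is a
    trailer.
    """
    paragraphs = [p for p in message.strip().split("\n\n") if p.strip()]
    # A message consisting of only a subject line has no trailer block.
    if len(paragraphs) < 2:
        return {}
    trailers = {}
    for line in paragraphs[-1].splitlines():
        match = TRAILER_RE.match(line.strip())
        if match is None:
            return {}
        trailers[match.group(1).lower()] = match.group(2).strip()
    return trailers


def include_in_changelog(message: str) -> bool:
    """Commits are excluded unless they carry `Changelog: true`."""
    return parse_trailers(message).get("changelog", "").lower() == "true"
```

With this sketch, a commit whose message ends in a `Changelog: true` trailer is included, while a commit with only a subject line, or with a body but no trailers, is skipped.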
### Commit metadata
To categorise commits as features, bugs, etc, one would add a `Type:` tag. The
exact values possible and the names to use in changelog files can be specified
in a configuration file.
As an example, this is what a feature commit may look like:
```gitcommit
Expose creation/update times for issue links

The issue links API now exposes the fields created_at and updated_at for
each issue link. This allows clients to determine when an issue link is
created or updated.

See https://gitlab.com/gitlab-org/gitlab/-/issues/283948 and
https://gitlab.com/gitlab-com/gl-infra/delivery/-/issues/1250 for more
information.

Changelog: true
Type: added
```
To mark a change as an EE-only change, one would add the `EE: true` tag.
Linking to merge requests would be done by adding the `Merge-request: X` tag,
with X being either the full URL of a merge request, its ID, or the short
reference (`gitlab-org/gitlab!48051` for example). Upon generating the changelog,
GitLab will resolve this to a full URL. If the tag is left out but the MR can
be derived from the commit, GitLab may include the merge request (this depends
on how expensive it is to get this information).
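
Resolving the three accepted forms of `Merge-request:` to a full URL might look roughly like the sketch below. The function name `resolve_merge_request` and its parameters are hypothetical, and the sketch assumes that a bare number refers to a merge request in the project the commit belongs to.

```python
import re

# Short references look like "group/project!123"; the path may be nested.
SHORT_REF_RE = re.compile(r"([\w.-]+(?:/[\w.-]+)+)!(\d+)")


def resolve_merge_request(value: str, project_path: str,
                          host: str = "https://gitlab.com") -> str:
    """Turn a `Merge-request:` trailer value into a full merge request URL.

    `project_path` is the project the commit lives in; it is only used when
    the value is a bare number (an assumption of this sketch).
    """
    value = value.strip()
    # Form 1: already a full URL, so use it as-is.
    if value.startswith(("http://", "https://")):
        return value
    # Form 2: a bare ID, resolved relative to the current project.
    if value.isdigit():
        return f"{host}/{project_path}/-/merge_requests/{value}"
    # Form 3: a short reference such as gitlab-org/gitlab!48051.
    match = SHORT_REF_RE.fullmatch(value)
    if match:
        return f"{host}/{match.group(1)}/-/merge_requests/{match.group(2)}"
    raise ValueError(f"unrecognised merge request reference: {value!r}")
```

Raising on unrecognised input is one possible choice; a real implementation could instead skip the link and still emit the changelog entry.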
This metadata could also be used to automatically add labels to merge requests,
but this won't be part of our first iteration.
### API interface
The generation process is done in a synchronous API call. We won't be deferring
anything to Sidekiq for this initial iteration.
### Configuration
Configuring this process is done using a YAML configuration file, located in the
repository at `.gitlab/changelog.yml`. This file can specify data such as the
following:
* What `Type:` values can be used, and their human-readable names (used as
sections in the changelog)
* Where to store the generated changelog file (`CHANGELOG.md` by default)
* Whatever we need to mark changes as EE-only (not sure about this yet, this
will require some additional thinking)
In the future we may support additional options, such as changing the output
format.
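
To illustrate how such a configuration could drive the output, here is a small sketch that groups entries under the human-readable names of their `Type:` values. The dictionary stands in for a parsed `.gitlab/changelog.yml`; the key names `output` and `types`, the type values other than `added`, the version string, and the decision to drop entries with an unknown type are all assumptions made for illustration, since the file format has not been decided. EE-only handling is left out because it is still an open question.

```python
from collections import defaultdict

# Hypothetical parsed form of `.gitlab/changelog.yml` (key names are assumptions).
CONFIG = {
    "output": "CHANGELOG.md",
    # Maps `Type:` values to the section titles used in the changelog.
    "types": {"added": "Added", "fixed": "Fixed", "performance": "Performance"},
}


def render_release(version, entries, config=CONFIG):
    """Render one release section from (type, title) pairs.

    Sections appear in the order the types are listed in the configuration.
    Entries with a type that is not configured are skipped (an assumption).
    """
    grouped = defaultdict(list)
    for entry_type, title in entries:
        if entry_type in config["types"]:
            grouped[entry_type].append(title)

    lines = [f"## {version}", ""]
    for entry_type, heading in config["types"].items():
        if grouped[entry_type]:
            lines.append(f"### {heading}")
            lines.append("")
            lines.extend(f"- {title}" for title in grouped[entry_type])
            lines.append("")
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    print(render_release("1.2.0", [
        ("fixed", "Fix pagination of issue links"),
        ("added", "Expose creation/update times for issue links"),
    ]))
```

The rendered text would then be written to the path given by `output`, which is `CHANGELOG.md` unless configured otherwise.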
## Success criteria
1. All GitLab projects released using Release Tools use the changelogs feature,
provided they need to produce a changelog.
1. The changelog feature is adopted by the community
The first one is obvious, as we want to use the feature as part of our release
process. The second one is a little more difficult to measure, and determining
the success here will largely depend on community feedback.
## Future plans
In the future we may add additional input sources, such as merge request titles
or changelog entry files. What inputs exactly we add will depend on feedback we
get from users of the changelog feature.
In addition, we may support different output formats, such as the GNU changelog
format.
## Other approaches
We considered other approaches, such as using changelog files or merge request
titles as input. These approaches either introduce considerable problems (e.g.
not supporting our security releases workflow), or don't end up improving the
developer experience enough to justify the effort (such as using changelog entry
files as input).
### Merge request titles as input
An alternative to the above proposal is to use merge request titles as input,
instead of commit titles.
When we want to generate a changelog, we provide the ref of the last release
tag and the ref we will tag for the new release. GitLab takes this range of
commits, then determines which merge requests those commits originated from.
Optionally, it reduces the list of merge requests to those deployed to a
certain environment (production in our case).
Using these merge requests, we use their titles to add changelog entries. Within
the changelog, entries are grouped based on the presence of certain merge
request labels.
Using this approach, we don't need to change the way we write commit messages;
instead we need to change how we write merge request titles.
This approach introduces a few challenges:
1. Security releases take place on a private mirror, thus the above process is
only aware of security merge requests. If we ever want to include regular
merge requests (which is rare, but has happened), they won't make it into the
changelog.
1. Each merge request maps to a single changelog entry. If an MR introduces
multiple commits that warrant their own changelog entries, the MR author has
no choice but to create one merge request for every such commit. This can
complicate both the developer and reviewer workflow.
1. We may not always be able to determine the source merge request of a
commit. This can happen if the commit SHA changes (e.g. after cherry-picking
it), or if it's rebased or rewritten in some other way.
1. Not all projects use merge requests (as often as we do), meaning they likely
wouldn't be able to use this.
In addition to these challenges, using merge request titles wouldn't improve the
quality of our commit messages. This means that they don't become more useful
when debugging something.
Because of this, we believe that starting with commit titles as input allows us
to achieve the best results. Support for merge requests is something we could
add later if desired, building on the foundation necessary for using commit
titles as changelog input.