Support ActivityPub for GitLab
# GitLab/ActivityPub Design Documents by @oelmekki
The goal of these documents is to provide an implementation path for adding
fediverse capabilities to GitLab.
This page describes the concepts from a high-level point of view, while
subpages discuss the implementation in more technical depth (that is, how
to implement this in the actual Rails codebase of GitLab).
* [What](#what)
  * [The fediverse](#the-fediverse)
  * [ActivityPub](#activitypub)
* [Why](#why)
* [How](#how)
## What
Feel free to jump to [the Why section](#why) if you already know what
ActivityPub and the fediverse are.
As part of the push for [decentralization of the
web](https://en.wikipedia.org/wiki/Decentralized_web), several projects
have tried different protocols, each with its own ideals (for example
[Secure Scuttlebutt](https://en.wikipedia.org/wiki/Secure_Scuttlebutt),
or SSB for short, [Dat](https://en.wikipedia.org/wiki/Dat_%28software%29),
[IPFS](https://en.wikipedia.org/wiki/InterPlanetary_File_System) and
[Solid](https://en.wikipedia.org/wiki/Solid_%28web_decentralization_project%29)).
But one has recently gained traction:
[ActivityPub](https://en.wikipedia.org/wiki/ActivityPub), best known
through the [fediverse](https://en.wikipedia.org/wiki/Fediverse) built on
top of it by applications like
[Mastodon](https://en.wikipedia.org/wiki/Mastodon_%28social_network%29)
(which could be described as a sort of decentralized Twitter) or
[Lemmy](https://en.wikipedia.org/wiki/Lemmy_%28software%29) (which could be
described as a sort of decentralized Reddit).
We think that ActivityPub has several advantages that make it attractive
to implementers and could explain its current success:
* **It's built on top of HTTP**. You don't need to install new software or
tinker with TCP/UDP to implement ActivityPub: if you have a web server
or an application that provides an HTTP API (like a Rails application),
you already have everything you need.
* **It's built on top of JSON**. All communications are basically JSON
objects, which web developers are already used to. This makes adoption
really easy.
* **It's a W3C standard and already has multiple implementations**. Being
piloted by the W3C is a strong sign of stability and quality work. Through
their past work on HTML, CSS and other web standards, they have amply
demonstrated that we can build on top of their work without fearing it
will become deprecated or irrelevant after a few years.
### The fediverse
The core idea behind Mastodon and Lemmy is called the fediverse. Rather
than full decentralization, these applications rely on federation: there
are still servers and clients (so it's not peer-to-peer like SSB, Dat and
IPFS), but instead of central servers controlled by a single entity (like
Facebook or Reddit), there is a galaxy of servers talking to each other.
Users sign up on one of these servers (called **instances**), and can then
interact with users either on this instance or on other ones.
From the user's perspective, they access a global network, not only their
own instance. They see the articles posted on other instances, and they
can comment on them, upvote them, and so on. Behind the scenes, their
instance knows where the user they reply to is hosted, and contacts that
other instance to let it know there is a message for them, somewhat like
SMTP does for email. Similarly, when a user subscribes to any sort of
feed, their instance informs the instance hosting the feed of this
subscription, and that target instance then pushes messages back whenever
new activities are created (a push model, rather than the constant polling
model of RSS). Of course, this is only the happy path: moderation,
validation and fault tolerance happen all along the way.
### ActivityPub
Behind the fediverse is the ActivityPub protocol. It's a simple HTTP API
that attempts to be as general a social network implementation as
possible, while remaining extensible.
The basic idea is that an `actor` sends and receives `activities`, which
are structured JSON messages with well-defined properties, but extensible
to cover any need. An actor is defined by four endpoints, which are
requested with the
`application/ld+json; profile="https://www.w3.org/ns/activitystreams"`
Accept header (`POST` requests send this media type as their Content-Type
instead):
* `GET /inbox`: used by the actor to find new activities intended for them.
* `POST /inbox`: used by instances to push new activities intended for the
actor.
* `GET /outbox`: used by anyone to read the activities created by the
actor.
* `POST /outbox`: used by the actor to publish new activities.
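As a sketch of what those requests look like, the Python snippet below only builds them with the standard library's `urllib.request.Request` and sends nothing. Following the ActivityPub spec, the media type goes in the `Accept` header for reads and in `Content-Type` for posts. The helper names `read_outbox_request` and `post_inbox_request` and all URLs are hypothetical, and real deliveries to an inbox usually also have to be signed (Mastodon, for instance, requires HTTP Signatures), which this sketch leaves out.

```python
import json
import urllib.request

# Media type the ActivityPub spec uses for ActivityStreams documents.
AS_MEDIA_TYPE = 'application/ld+json; profile="https://www.w3.org/ns/activitystreams"'


def read_outbox_request(outbox_url):
    """Build (but do not send) a GET request reading an actor's outbox."""
    return urllib.request.Request(
        outbox_url, headers={"Accept": AS_MEDIA_TYPE}, method="GET"
    )


def post_inbox_request(inbox_url, activity):
    """Build (but do not send) a POST request pushing an activity to an inbox."""
    return urllib.request.Request(
        inbox_url,
        data=json.dumps(activity).encode("utf-8"),
        headers={"Content-Type": AS_MEDIA_TYPE},
        method="POST",
    )


# Hypothetical activity: a GitLab user announcing a release to a remote inbox.
activity = {
    "@context": "https://www.w3.org/ns/activitystreams",
    "type": "Create",
    "actor": "https://gitlab.example.org/bob",
    "object": {"type": "Note", "content": "Released v1.0"},
}
get_req = read_outbox_request("https://gitlab.example.org/bob/outbox")
post_req = post_inbox_request("https://social.example/alice/inbox", activity)
```

Sending either request is then a matter of passing it to `urllib.request.urlopen`.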
Of these, Mastodon and Lemmy currently only use `POST /inbox` and
`GET /outbox`, which is the minimum needed to implement federation:
instances push new activities for the actor to its inbox, and reading the
outbox gives access to the actor's feed.
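That minimal pair of endpoints can be sketched as a toy in-memory actor in Python. The class name `MinimalActor`, its method names and the example URL are hypothetical. The outbox is returned as an ActivityStreams `OrderedCollection` with the newest item first, which is our reading of the spec's ordering rule for ordered collections; the pagination real outboxes typically use is omitted.

```python
AS_CONTEXT = "https://www.w3.org/ns/activitystreams"


class MinimalActor:
    """Toy actor (hypothetical) exposing only POST /inbox and GET /outbox."""

    def __init__(self, actor_id):
        self.id = actor_id
        self.inbox = []   # activities pushed to us by other instances
        self.outbox = []  # activities this actor authored, oldest first

    def handle_post_inbox(self, activity):
        """POST /inbox: accept an activity pushed by a remote instance."""
        if "type" not in activity or "actor" not in activity:
            raise ValueError("not an activity")
        self.inbox.append(activity)

    def handle_get_outbox(self):
        """GET /outbox: the actor's feed as an OrderedCollection."""
        items = list(reversed(self.outbox))  # newest first
        return {
            "@context": AS_CONTEXT,
            "id": f"{self.id}/outbox",
            "type": "OrderedCollection",
            "totalItems": len(items),
            "orderedItems": items,
        }


project = MinimalActor("https://gitlab.example.org/bob/project")
for version in ("v0.9", "v1.0"):
    project.outbox.append({"type": "Create", "actor": project.id,
                           "object": {"type": "Note", "content": version}})
feed = project.handle_get_outbox()
```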
Additionally, Mastodon and Lemmy implement a `GET /` endpoint (with the
Accept header mentioned above) which responds with general information
about the actor, like its name and the URLs of its inbox and outbox. This
is not required by the standard, but it makes discovery easier.
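A client using that endpoint can read the inbox and outbox URLs straight out of the returned actor document, as in the Python sketch below. The function name `discover_endpoints` and the document's URLs are hypothetical; `inbox` and `outbox` are the ActivityPub property names.

```python
import json


def discover_endpoints(actor_document):
    """Return (inbox_url, outbox_url) from an actor document's JSON text."""
    actor = json.loads(actor_document)
    missing = [key for key in ("inbox", "outbox") if key not in actor]
    if missing:
        raise ValueError(f"actor document is missing: {', '.join(missing)}")
    return actor["inbox"], actor["outbox"]


# Hypothetical body returned by GET https://gitlab.example.org/bob
document = json.dumps({
    "@context": "https://www.w3.org/ns/activitystreams",
    "id": "https://gitlab.example.org/bob",
    "type": "Person",
    "name": "Bob",
    "inbox": "https://gitlab.example.org/bob/inbox",
    "outbox": "https://gitlab.example.org/bob/outbox",
})
inbox_url, outbox_url = discover_endpoints(document)
```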
It's worth mentioning that, while it's the main use case, an actor does not
necessarily map to a person. Anything can be an actor: a topic, a
subreddit, a group, an event, and so on. For GitLab, this is easy to
picture: anything that has activities (in the sense GitLab gives to
"activity") can be an ActivityPub actor. This includes projects, groups,
releases, and so on. In these more abstract examples, an actor can be
thought of as an actionable feed.
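To illustrate GitLab entities as actors, here is a hypothetical Python helper that builds an actor document for a user, a group or a project. The URL scheme (`<base>/<path>/inbox`), the default base URL and the choice of actor type per entity are assumptions of this sketch, not decisions taken in this document; the types themselves (`Person`, `Group`, `Application`) are standard ActivityStreams actor types.

```python
AS_CONTEXT = "https://www.w3.org/ns/activitystreams"

# Hypothetical mapping from GitLab entity kinds to ActivityStreams actor types.
ACTOR_TYPES = {"user": "Person", "group": "Group", "project": "Application"}


def actor_document(kind, path, name, base_url="https://gitlab.example.org"):
    """Build the actor document a GitLab entity could serve at its actor id."""
    if kind not in ACTOR_TYPES:
        raise ValueError(f"unsupported entity kind: {kind!r}")
    actor_id = f"{base_url}/{path}"
    return {
        "@context": AS_CONTEXT,
        "id": actor_id,
        "type": ACTOR_TYPES[kind],
        "name": name,
        # Hypothetical URL scheme: endpoints nested under the actor id.
        "inbox": f"{actor_id}/inbox",
        "outbox": f"{actor_id}/outbox",
    }


project_actor = actor_document("project", "bob/project", "Bob's project")
group_actor = actor_document("group", "my-group", "My group")
```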
And that's it. Sounds too simple to be true? That's because it is. :)
ActivityPub by itself does not cover everything that is needed to implement
the fediverse. Most notably, the following are left for implementers to
figure out:
* finding a way to deal with spam, currently covered by authorizing or
blocking other instances (colloquially referred to as "defederating")
* discovering new instances
* performing network-wide searches
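For the spam point, instance-level blocking and allowlisting can be sketched as a simple check on the host of an incoming activity's `actor`. The function name `accept_incoming`, the rule that blocking a domain also blocks its subdomains, and the example hosts are assumptions. The sketch also trusts the `actor` field as given and assumes it is a URL string; a real server would first authenticate the sender.

```python
from urllib.parse import urlparse


def accept_incoming(activity, blocked=frozenset(), allowed=None):
    """Decide whether to accept an activity based on its actor's instance."""
    actor = activity.get("actor")
    host = urlparse(actor).hostname if isinstance(actor, str) else None
    if not host:
        return False  # no usable actor URL: reject
    # Blocking ("defederating"): the domain itself and its subdomains.
    if any(host == domain or host.endswith("." + domain) for domain in blocked):
        return False
    # Allowlist mode: when a set of allowed instances is given, require it.
    return allowed is None or host in allowed
```

Passing `allowed` switches the instance from "federate with everyone except" to "federate only with", the two policies mentioned above.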
## Why
Why would a social media protocol be useful for GitLab?
There have already been several very popular discussions about this (see
issues [#21582](https://gitlab.com/gitlab-org/gitlab/-/issues/21582) and
[#14116](https://gitlab.com/gitlab-org/gitlab/-/issues/14116), and epic
[&260](https://gitlab.com/groups/gitlab-org/-/epics/260)). The gist of it
is that what people really want is one global "GitLab network", so they
can interact with various projects without having to register on each of
their hosts.
The ideal workflow for this would be:
* Alice registers on her favorite GitLab instance, like
`gitlab.example.org`.
* She looks for a project on a given topic, and sees Bob's project popping
up, even though Bob is on `gitlab.com`.
* She clicks the "fork" button, and `gitlab.com/Bob/project.git` is
forked to `gitlab.example.org/Alice/project.git`.
* She makes her edits and opens a merge request, which appears in Bob's
project on `gitlab.com`.
* Alice and Bob discuss the merge request, each from their own GitLab
instance.
* Bob can send additional commits, which are picked up by Alice's instance.
* When Bob accepts the merge request, his instance picks up the code from
Alice's instance.
In this process, ActivityPub would help with:
* letting Bob know a fork happened
* sending the MR to Bob
* allowing Alice and Bob to discuss the MR
* letting Alice know the code has been merged
It will _not_ help with (please open an issue if I'm wrong):
* implementing a network-wide search
* implementing cross-instance forks (which don't need it anyway, Git is
pretty good at that :) )
These will need specific implementations.
One may wonder: why use ActivityPub here rather than "just" implementing
cross-instance merge requests in a custom way?
There are two reasons for that:
* **Building on top of a standard makes it easy to reach beyond GitLab**.
While the workflow presented above only mentions GitLab, building on top
of a W3C standard means it will be easy for other forges to follow GitLab
there, and to build a massive fediverse of code sharing.
* **This is an opportunity to make GitLab more social**. To prepare the
architecture for the workflow above, smaller steps can be taken, allowing
people to subscribe to activity feeds from their fediverse social
network. Basically, anything that currently has an RSS feed could become
an ActivityPub feed. This would mean that people on Mastodon could follow
their favorite developer, project or topic from GitLab and see the news
in their Mastodon feed, hopefully raising engagement with GitLab.
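As a rough sketch of turning an existing feed into an ActivityPub feed, the Python function below maps each entry of an Atom feed (to our knowledge the format GitLab's current feeds use) to a `Create` activity wrapping a `Note`. The function name `atom_to_activities`, the sample feed, the URLs and the choice of a `Note` whose `content` is a link to the entry are all illustrative assumptions.

```python
import html
import xml.etree.ElementTree as ET

ATOM = {"atom": "http://www.w3.org/2005/Atom"}
AS_CONTEXT = "https://www.w3.org/ns/activitystreams"


def atom_to_activities(atom_xml, actor_id):
    """Map each Atom <entry> to a Create activity wrapping a Note."""
    root = ET.fromstring(atom_xml)
    activities = []
    for entry in root.findall("atom:entry", ATOM):
        title = entry.findtext("atom:title", default="", namespaces=ATOM)
        link_el = entry.find("atom:link", ATOM)  # first link; rel is ignored
        link = link_el.get("href", "") if link_el is not None else ""
        # Prefer <published>, fall back to <updated> (mandatory in Atom).
        date = (entry.findtext("atom:published", namespaces=ATOM)
                or entry.findtext("atom:updated", namespaces=ATOM))
        note = {
            "type": "Note",
            "attributedTo": actor_id,
            "content": f'<a href="{html.escape(link)}">{html.escape(title)}</a>',
            "url": link,
        }
        if date:
            note["published"] = date
        activities.append({"@context": AS_CONTEXT, "type": "Create",
                           "actor": actor_id, "object": note})
    return activities


SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>bob/project releases</title>
  <entry>
    <title>v1.0 released</title>
    <link href="https://gitlab.example.org/bob/project/-/releases/v1.0"/>
    <updated>2024-01-15T10:00:00Z</updated>
  </entry>
</feed>"""

activities = atom_to_activities(SAMPLE_FEED, "https://gitlab.example.org/bob/project")
```

Each resulting activity could then be exposed in the actor's outbox and pushed to followers' inboxes, so a Mastodon user following the project would see the release in their timeline.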
## How
The idea of this implementation path is not to take the fastest route to
the feature with the most added value (cross-instance merge requests), but
to take the smallest useful step at each iteration, making sure each step
immediately brings something actually useful.
1. [implement ActivityPub for social following](./social-following.md).
   After this, the fediverse can follow activities on GitLab instances.
   1. ActivityPub to subscribe to project releases
   1. ActivityPub to subscribe to project creation in topics
   1. ActivityPub to subscribe to project activities
   1. ActivityPub to subscribe to group activities
   1. ActivityPub to subscribe to user activities
1. **implement cross-instance search**. After this, it's possible to
   discover projects on other instances.
1. **implement cross-instance fork**. After this, it's possible to fork a
   project from another instance.
1. **implement ActivityPub for cross-instance discussions**. After this,
   it's possible to discuss issues and merge requests from another instance.
   1. in issues
   1. in merge requests
1. **implement ActivityPub to submit cross-instance merge requests**. After
   this, it's possible to submit merge requests to other instances.