Last edited by Mayel de Borniol Oct 30, 2018

generic activitypub library

Alex's notes about creating a generic ActivityPub library in Elixir

  • Alex's notes about creating a generic ActivityPub library in Elixir
    • About
    • Pleroma
      • Endpoints
      • Mastodon clone
      • Coercion
      • Database
        • Transactions
        • Relations
          • GDPR
          • Queries
      • Lack of Background Job system
      • Conclusions
    • Future implementation
      • Coercion
        • Problem
        • Solution
      • Work with Links
        • Problem
        • Solution
      • Field types
        • Problem
        • Solution
      • Pattern matching
        • Problem
        • Solution
        • Prototype
      • Extension [New Types]
        • Problem
        • Solution
        • Prototype
      • Extension [New Fields]
        • Problem
        • Solution
        • Prototype
      • Database Schema
        • Drawback 1: Collection index size
        • Variant 1
        • Variant 2
        • Variant 3
      • Split Persistence layer and Data Structs
        • Problem
        • Solution

About

This document gathers the ideas and opinions of the MoodleNet team about creating an Elixir library that implements ActivityPub and ActivityStreams 2.0 (referred to as AP from here on).

Pleroma

Pleroma is a Mastodon clone written in Elixir. It implements some parts of AP and could be useful for the project.

Endpoints

AP has approximately 12 endpoints, some of which are optional. Pleroma implements only 4 of them:

  • get followers of a user
  • get following of a user
  • get outbox
  • post inbox

So, it implements the minimum federated server side. It does not implement the client side.

Mastodon clone

The code was not designed to be generic. That is understandable, since it is a Mastodon clone. So it only implements the following AP objects:

  • Note
  • Article
  • Video

This also shows in some functions, which mix AP-related work with Mastodon-specific logic.

Another important fact is that only 20% of the code is AP related.

Coercion

They don't do any type of coercion or abstraction around the AP data. This means that when the server finds something it does not expect, it just crashes and returns a 500 error to the request.

It also affects other parts of the code. Because of the lack of abstractions and the difficulty of handling AP's flexibility, there are limitations such as allowing only one owner actor per object. In this case it does not crash, but a Pleroma server cannot faithfully replicate an activity produced by several actors: it takes the first actor and ignores the rest.

Another symptom is the published field. Within a single request it can be set several times in several places, each time just checking whether it is nil. This happens because the field is very important in Mastodon: you cannot publish and order toots without a published date. This does not conform to the AP protocol and, more importantly, it is a bad practice, a shortcut to fix bugs.

Database

A good thing about the Pleroma database schema is that it partially follows the AP vocabulary. We also intend to have Objects, Activities, etc. However, there is a "users" table with an "avatar" field instead of "actors" and "icon".

Transactions

They do not use transactions, yet they have very long database operations. Inconsistent states would not be surprising because of this.

Relations

It does not use relations and it saves everything in a JSON data field. This works perfectly to save AP extension fields, but it has important drawbacks as well.

GDPR

We have to comply with the GDPR. This means we have to give users the possibility to collect all their personal data, and also to delete it if they wish. If we save everything in a JSON data field, finding every place where a user is referenced is going to be impossible.

With relations, everything is easier. For example, to remove an actor we can just go to its row and convert it to a Tombstone. This automatically updates any previous or future reference to this actor.

To get all the information we will have to iterate over all the relations. It is hard, but much easier and faster than the Pleroma alternative.

Queries

Some operations are extremely slow with the current database schema.

"Followers" and "followings" are stored in an array field. This is really slow, and uniqueness has to be checked every time. The whole array has to be kept in memory, which is not a big deal at the beginning, but could be problematic when an actor follows or is followed by thousands of actors. The exact same problem is found with "likes".

Update operations have to be done by hand. So, when an object is repeated in different activities, each copy has to be updated one by one.

Lack of Background Job system

To make federation reliable, the software needs to retry connections to other servers several times. Currently, it only tries once, so if the first try fails, the message will never reach the federated server.

Conclusions

There are many things from Pleroma we can use for our goals:

  • Signatures
  • Database ideas
  • Webfinger
  • Salmon
  • Unit tests
  • Testing federation with it

However, because our main goal is to make a generic library and Pleroma is a Mastodon clone, we believe it will be easier to extract Pleroma code and ideas when necessary, rather than modifying it to turn it into what we want.

Future implementation

This section explains some of our current thoughts about how the library should be built to solve most of the AP problems.

Coercion

We don't mean the conversion that libraries like Jason or Poison do to transform a JSON string into an Elixir map. They've already done a good job in this.

We mean the following step:

  • From the Elixir map which represents an AP entity (the result of the JSON conversion).
  • To the data structure we choose to work with.

Problem

AP is not uniform or consistent. It is designed to be flexible, which is good for transmitting information but makes the code that works with it directly more difficult to write and to reason about. Most fields can be a link, an object, null, an empty array, or an array combining them. The following examples are all valid; we focus only on the object field:

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "summary": "Martin created an image",
  "type": "Create",
  "actor": "http://www.test.example/martin",
  "object": "http://example.org/foo.jpg"
}

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "summary": "Martin created an image",
  "type": "Create",
  "actor": "http://www.test.example/martin",
  "object": {
      "type": "Link",
      "href": "http://example.org/foo.jpg"
  }
}

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "summary": "Martin created an image",
  "type": "Create",
  "actor": "http://www.test.example/martin",
  "object": {
      "type": "Image",
      "id": "http://example.org/foo.jpg"
  }
}

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "summary": "Martin created an image",
  "type": "Create",
  "actor": "http://www.test.example/martin",
  "object": [
      {
        "type": "Image",
        "id": "http://example.org/foo.jpg"
      },{
        "type": "Link",
        "href": "http://example.org/foobar.jpg"
      },
      "http://example.org/bar.jpg"
   ]
}

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "Travel",
  "summary": "Martin went to work",
   "actor": "http://www.test.example/martin",
   "object": null,
   "target": {
    "type": "Place",
    "name": "Work"
  }
}

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "Travel",
  "summary": "Martin went to work",
   "actor": "http://www.test.example/martin",
   "target": {
    "type": "Place",
    "name": "Work"
  }
}

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "Travel",
  "summary": "Martin went to work",
   "actor": "http://www.test.example/martin",
   "object": [],
   "target": {
    "type": "Place",
    "name": "Work"
  }
}

Solution

To fix this problem, we propose the following:

  • All the known fields must be present
  • If the field is nil or not present, we set it to [] (empty array).
  • If there is a single value, we wrap it in an array.
  • If any value is just a string, we transform it into a full Link object.

So the previous examples would translate to (focus only on object field):

// From
{
  "@context": "https://www.w3.org/ns/activitystreams",
  "summary": "Martin created an image",
  "type": "Create",
  "actor": "http://www.test.example/martin",
  "object":  "http://example.org/foo.jpg"
}

// To
{
  "@context": "https://www.w3.org/ns/activitystreams",
  "summary": "Martin created an image",
  "type": "Create",
  "actor": "http://www.test.example/martin",
  "object": [{
      "type": "Link",
      "href": "http://example.org/foo.jpg"
  }]
}
// From
{
  "@context": "https://www.w3.org/ns/activitystreams",
  "summary": "Martin created an image",
  "type": "Create",
  "actor": "http://www.test.example/martin",
  "object": {
      "type": "Link",
      "href": "http://example.org/foo.jpg"
  }
}

// To
{
  "@context": "https://www.w3.org/ns/activitystreams",
  "summary": "Martin created an image",
  "type": "Create",
  "actor": "http://www.test.example/martin",
  "object": [{
      "type": "Link",
      "href": "http://example.org/foo.jpg"
  }]
}
// From
{
  "@context": "https://www.w3.org/ns/activitystreams",
  "summary": "Martin created an image",
  "type": "Create",
  "actor": "http://www.test.example/martin",
  "object": {
      "type": "Image",
      "id": "http://example.org/foo.jpg"
  }
}

// To
{
  "@context": "https://www.w3.org/ns/activitystreams",
  "summary": "Martin created an image",
  "type": "Create",
  "actor": "http://www.test.example/martin",
  "object": [{
      "type": "Image",
      "id": "http://example.org/foo.jpg"
  }]
}
// From
{
  "@context": "https://www.w3.org/ns/activitystreams",
  "summary": "Martin created an image",
  "type": "Create",
  "actor": "http://www.test.example/martin",
  "object": [
      {
        "type": "Image",
        "id": "http://example.org/foo.jpg"
      },{
        "type": "Link",
        "href": "http://example.org/foobar.jpg"
      },
      "http://example.org/bar.jpg"
   ]
}

// To
{
  "@context": "https://www.w3.org/ns/activitystreams",
  "summary": "Martin created an image",
  "type": "Create",
  "actor": "http://www.test.example/martin",
  "object": [
      {
        "type": "Image",
        "id": "http://example.org/foo.jpg"
      },{
        "type": "Link",
        "href": "http://example.org/foobar.jpg"
      },{
        "type": "Link",
        "href": "http://example.org/bar.jpg"
      }
   ]
}
// From
{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "Travel",
  "summary": "Martin went to work",
   "actor": "http://www.test.example/martin",
   "object": null,
   "target": {
    "type": "Place",
    "name": "Work"
  }
}

// To
{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "Travel",
  "summary": "Martin went to work",
   "actor": "http://www.test.example/martin",
   "object": [],
   "target": {
    "type": "Place",
    "name": "Work"
  }
}
// From
{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "Travel",
  "summary": "Martin went to work",
   "actor": "http://www.test.example/martin",
   "target": {
    "type": "Place",
    "name": "Work"
  }
}

// To
{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "Travel",
  "summary": "Martin went to work",
   "actor": "http://www.test.example/martin",
   "object": [],
   "target": {
    "type": "Place",
    "name": "Work"
  }
}
// From
{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "Travel",
  "summary": "Martin went to work",
   "actor": "http://www.test.example/martin",
   "object": [],
   "target": {
    "type": "Place",
    "name": "Work"
  }
}

// To
{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "Travel",
  "summary": "Martin went to work",
   "actor": "http://www.test.example/martin",
   "object": [],
   "target": {
    "type": "Place",
    "name": "Work"
  }
}

This way the coder always knows they are working with arrays, and each element of the array is always a JSON object of one of two types:

  • AP Object (or derived)
  • AP Link

This has been prototyped and it works!

Optionally, we can extend this behavior to the responses sent to clients, so that a client developed for our server can trust that we always send arrays.
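
As an illustration of the four rules above, here is a minimal sketch that works on the plain map produced by a JSON decoder. The module name CoercionSketch and the short list of known fields are assumptions made for this example; they are not taken from the prototype.

```elixir
defmodule CoercionSketch do
  # Hypothetical subset of the known fields; the real list would come
  # from the AP vocabulary.
  @known_fields ~w(actor object target)

  # Ensures every known field is present and holds a list of maps.
  def coerce(entity) when is_map(entity) do
    Enum.reduce(@known_fields, entity, fn field, acc ->
      Map.put(acc, field, coerce_value(Map.get(acc, field)))
    end)
  end

  # Missing or null field: empty list.
  defp coerce_value(nil), do: []
  # Already a list: coerce each element.
  defp coerce_value(values) when is_list(values), do: Enum.map(values, &coerce_item/1)
  # Single value: wrap it in a list.
  defp coerce_value(value), do: [coerce_item(value)]

  # A bare string becomes a full Link map.
  defp coerce_item(url) when is_binary(url), do: %{"type" => "Link", "href" => url}
  defp coerce_item(entity) when is_map(entity), do: entity
end
```

After this step, code that consumes the "object" field only has to handle a list whose elements are maps, never a bare string or nil.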

Work with Links

NOT 100% sure about this

Problem

The Link type is a complex beast. Most properties can be either an object or a link. A Link MUST have an href, which is the target resource the Link points to. This resource may or may not be an AP Object:

// AP Core Example 4 https://www.w3.org/TR/activitystreams-core/#example-1
{
  "@context": "https://www.w3.org/ns/activitystreams",
  "summary": "Martin created an image",
  "type": "Create",
  "actor": "http://www.test.example/martin",
  "object": "http://example.org/foo.jpg"
}

In the previous example, the actor property seems to be a Link which points to a Person (which is an AP Object). However, the object property can be a Link which points to a plain image served over HTTP, or it can be an Image object similar to:

// Example 6 https://www.w3.org/TR/activitystreams-core/#example-3
    ...,
    "object" : {
        "name": "My fluffy cat",
        "type": "Image",
        "id": "http://example.org/album/máiréad.jpg",
        "preview": {
          "type": "Link",
          "href": "http://example.org/album/máiréad.jpg",
          "mediaType": "image/jpeg"
        },
        "url": [
          {
            "type": "Link",
            "href": "http://example.org/album/máiréad.jpg",
            "mediaType": "image/jpeg"
          },
          {
            "type": "Link",
            "href": "http://example.org/album/máiréad.png",
            "mediaType": "image/png"
          }
        ]
      }
    ...

This distinction may be needed, not only for the generic implementation but also for the target application (MoodleNet). For example, when an image is liked, we have to know whether it is an AP object in order to decide whether to increment the like counter.

Solution

To solve this kind of problem I propose adding a private field __target_object to the Link structure. If the Link points to a simple URL, this field will be nil. If the Link points to an AP Object, this field will hold the object.

This way, if some logic needs to work only with objects, we could do something like this:

any_parsed_activity.object
|> ActivityPub.referenced_objects()
|> any_code_to_work_with_objects()

So referenced_objects just iterates over the list and:

  • If the element is an AP object, keeps it.
  • If the element is a Link which points to an object (__target_object is not nil), returns the object itself.
  • If the element is a Link which does not point to an object (__target_object is nil), filters it out.
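
The three cases above can be sketched as a single flat_map. The Link and Object structs below are simplified stand-ins defined only for this example; the real structs would carry all the AP fields.

```elixir
defmodule LinkSketch do
  # Simplified, hypothetical structs used only for this illustration.
  defmodule Link do
    defstruct [:href, :__target_object]
  end

  defmodule Object do
    defstruct [:id, :type]
  end

  # Keeps objects, unwraps Links that point to an object and drops
  # Links that point to a plain URL.
  def referenced_objects(items) when is_list(items) do
    Enum.flat_map(items, fn
      %Object{} = object -> [object]
      %Link{__target_object: nil} -> []
      %Link{__target_object: target} -> [target]
    end)
  end
end
```

Because the coercion step guarantees a list of Links and Objects, this function does not need a clause for bare strings or nil.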

Field types

Problem

If we know some fields MUST have a specific type, we should parse them, e.g.:

{
    ....
    "startTime": "2014-12-31T23:00:00-08:00",
    "endTime": "2015-01-01T06:00:00-08:00"
}

We know startTime and endTime are strings that represent a date and a time. Working with strings for this is very annoying.

Solution

Using the well-known Ecto library we can transform those values into the Elixir DateTime type.

This has been prototyped and it is working!
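
The prototype relies on Ecto for this cast. As a dependency-free illustration of the same idea, the sketch below uses DateTime.from_iso8601/1 from the standard library instead of Ecto. The module name and the list of datetime fields are assumptions for this example.

```elixir
defmodule FieldTypesSketch do
  # Hypothetical list of fields known to hold an ISO 8601 datetime.
  @datetime_fields ~w(startTime endTime published updated)

  # Replaces the string value of each known datetime field with a DateTime.
  def cast_datetimes(entity) when is_map(entity) do
    Enum.reduce(@datetime_fields, entity, fn field, acc ->
      if Map.has_key?(acc, field) do
        Map.update!(acc, field, &cast/1)
      else
        acc
      end
    end)
  end

  defp cast(value) when is_binary(value) do
    case DateTime.from_iso8601(value) do
      # The parsed DateTime is expressed in UTC; the original offset is dropped here.
      {:ok, datetime, _utc_offset} -> datetime
      # Values that cannot be parsed are left untouched in this sketch.
      {:error, _reason} -> value
    end
  end

  defp cast(value), do: value
end
```

With DateTime values in place, comparisons can use DateTime.compare/2 instead of string manipulation.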

Pattern matching

Elixir relies heavily on pattern matching, especially to define functions.

Problem

If we use plain maps, which are the Elixir equivalent of JSON objects, pattern matching is almost impossible. The reason is that nothing in the object is mandatory: even the "id" and "type" fields are optional. And the "type" field can be an array, which makes it even harder to pattern match:

def my_function(%{type: "Note"}) do
  do_stuff()
end

The above function does not match with the following maps:

%{
    type: ["Note"]
}
%{
    type: ["ExtendedType", "Note"]
}

Because the "type" could be in any position of the array, it is impossible to pattern match.

Other times we want to pattern match by type, like Object, Actor or Activity. AP defines the following activities:

  • Accept
  • Add
  • Announce
  • Arrive
  • Block
  • Create
  • Delete
  • Dislike
  • Flag
  • Follow
  • Ignore
  • Invite
  • Join
  • Leave
  • Like
  • Listen
  • Move
  • Offer
  • Question
  • Reject
  • Read
  • Remove
  • TentativeReject
  • TentativeAccept
  • Travel
  • Undo
  • Update
  • View

It would be very nice to have something like:

def my_function(s, w) when ActivityPub.is_actor(s) and ActivityPub.is_object(w)

This way we define a function which only works when the first argument is an Actor and the second is any AP Object.
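
One possible way to obtain such guards is defguard combined with is_struct/2 (available in Elixir 1.11 and later). The sketch below is only an assumption about how it could look: the Person and Note structs are minimal stand-ins, and a real library would have to enumerate every Actor and Object type.

```elixir
defmodule GuardSketch do
  # Minimal, hypothetical structs for the illustration.
  defmodule Person, do: defstruct([:id])
  defmodule Note, do: defstruct([:id])

  # An actor is, in this sketch, only a Person.
  defguard is_actor(term) when is_struct(term, Person)

  # Every actor is also an object.
  defguard is_object(term) when is_struct(term, Person) or is_struct(term, Note)

  def describe(subject, object) when is_actor(subject) and is_object(object) do
    :actor_and_object
  end

  def describe(_subject, _object), do: :no_match
end
```

Modules outside GuardSketch would need `import GuardSketch` or `require GuardSketch` before using the guards.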

Solution

We are already using the Ecto library to parse values. We can also create schemas for each type of Object. Most of the objects will have the same fields. To avoid too much code duplication we can use some macros, something like:

# This is the internal code of the ActivityPub library

defmodule ActivityPub.Note do
  use ActivityPub.Object
end

defmodule ActivityPub.Image do
  use ActivityPub.Object
end

...

defmodule ActivityPub.Create do
  use ActivityPub.Activity
end

defmodule ActivityPub.Delete do
  use ActivityPub.Activity
end

...

And now, we can define functions using those structs:

# This code can be used on both sides: in the generic library or in the target app

def parse_object(%Note{} = note) do
  do_note_stuff()
end

def parse_object(%Image{} = image) do
  do_image_stuff()
end

def parse_object(other_object) when ActivityPub.is_object(other_object) do
  do_generic_stuff()
end

def persist_activity(%Create{} = create) do
  do_create_stuff()
end

def persist_activity(%Delete{} = delete) do
  do_delete_stuff()
end

def persist_activity(activity) when ActivityPub.is_activity(activity) do
  do_generic_activity_stuff()
end

Prototype

This is kind of implemented in the prototype but it can be improved a lot.

Extension [New Types]

Problem

AP says that new types of object can be defined and the implementation should not stop parsing when this happens. This means if we receive:

{
    "type": "MoodleNetComment",
    ...
}

The app should work correctly.

Solution

To add a new type we should do two things. First, define the module for the new type; the code will be similar to that of the generic ActivityPub library:

defmodule MoodleNet.Comment do
  use ActivityPub.Object
  extra_field :comment_type, :string
end

This also defines a new field, comment_type, for the new type.

The second thing we have to do is connect the new type string "MoodleNetComment" with the module which handles it. This is done in the prototype using a function, but it is also possible to do it with a simple map:

%{
    "Note" => ActivityPub.Note,
    "MoodleNetComment" => MoodleNet.Comment,
    ...
}

Another consideration: the type field can be an array, so it is useful to define a priority for each type. With the following:

object = %{
  "type" => ["Note", "MoodleNetComment"]
}

priorities = %{
    "Note" => {ActivityPub.Note, 1},
    "MoodleNetComment" => {MoodleNet.Comment, 2},
    ...
}

MoodleNet.Comment will be used because it has the higher priority: 2.

Lastly, if none of the types is implemented, we will just use ActivityPub.Object.
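
The resolution rule can be sketched as follows. The module name TypeResolverSketch and the resolve/1 function are hypothetical; the registry reuses the priorities from the example above, and the module names in it are plain atoms that do not need to exist for the sketch to run.

```elixir
defmodule TypeResolverSketch do
  # Type string => {handler module, priority}, as in the example above.
  @registry %{
    "Note" => {ActivityPub.Note, 1},
    "MoodleNetComment" => {MoodleNet.Comment, 2}
  }

  # Used when none of the types is registered.
  @fallback ActivityPub.Object

  # Accepts a single type string or a list of them and returns the
  # registered module with the highest priority.
  def resolve(types) do
    types
    |> List.wrap()
    |> Enum.flat_map(fn type ->
      case Map.fetch(@registry, type) do
        {:ok, entry} -> [entry]
        :error -> []
      end
    end)
    |> Enum.max_by(fn {_module, priority} -> priority end, fn -> {@fallback, 0} end)
    |> elem(0)
  end
end
```

Unknown type strings are simply skipped, so an entity that mixes a known and an unknown type still resolves to the known module.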

Prototype

Extension [New Fields]

Any type defined by AP can have extra fields and the implementation should accept them.

Problem

Ecto schemas don't allow dynamic fields. So any extra fields we receive in the input will be ignored.

Solution

When we parse any input, all the fields we don't know in the current object type will be saved in a private field called: __extra_fields. This field will be known by Ecto and the type will be just a map. We can add as many fields as we need inside.

However, accessing any of those extra fields requires rather awkward code:

any_object.__extra_fields["extra_field"]

To make it easier for the developer to access these fields we can implement the Elixir Access behaviour. This allows us to access an extra field in the following ways:

# A regular field like `id`
any_object.id
any_object["id"]
any_object[:id]

# An extra field
any_object["extra_field"]
any_object[:extra_field]

We lose the direct access (with the dot) but we think it is good enough!
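
A minimal sketch of the read side, assuming a struct with only two regular fields: for the bracket syntax, Access calls fetch/2 on the struct's module, so defining that one function is enough for reads. A complete implementation would declare @behaviour Access and also define get_and_update/3 and pop/2. The module name and the field list are assumptions for this example.

```elixir
defmodule ExtraFieldsSketch do
  defstruct id: nil, type: [], __extra_fields: %{}

  # Hypothetical list of regular fields.
  @known_fields [:id, :type]

  # Atom keys: regular fields first, then the extra fields (stored with string keys).
  def fetch(%__MODULE__{} = object, key) when is_atom(key) do
    if key in @known_fields do
      Map.fetch(object, key)
    else
      Map.fetch(object.__extra_fields, Atom.to_string(key))
    end
  end

  # String keys: map them to a regular field when one matches.
  def fetch(%__MODULE__{} = object, key) when is_binary(key) do
    case Enum.find(@known_fields, &(Atom.to_string(&1) == key)) do
      nil -> Map.fetch(object.__extra_fields, key)
      field -> Map.fetch(object, field)
    end
  end
end
```

Looking up an unknown key returns nil, the same as for a plain map.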

Prototype

This is already done in the prototype.

Database Schema

https://www.dbdesigner.net/designer/schema/208495

The objects table stores the AP Objects. It has all the fields that the AP Object Vocabulary defines. It is important to remember that all AP entities, except Links, are Objects.

The __local_id is an implementation detail. It is an integer, which is useful because it lets us avoid the long AP IDs, which are strings: indexing and joining on integers is much cheaper than on strings. A __local_id represents one, and only one, AP ID, and vice versa.

The __local boolean indicates whether the object belongs to the current server.

There is an actors table with the fields defined in the AP Actor Vocabulary. If an AP Object is also an AP Actor, the Actor information is stored here. Both tables are related using __local_id. The __local_id in this table is the primary key, and also a foreign key referencing the objects table.

The same happens in the activities table. If an Object is also an Activity, a row should be created in this table. They are related using __local_id.

Similarly, the collections table is related to the objects table using __local_id. So if an AP Object is also a collection, a row in this table should be created sharing the same __local_id. The __ordered boolean field indicates whether it is an AP OrderedCollection or an AP Collection.

The collection_items table relates a Collection with the items it contains. The collection_id is a foreign key referencing a Collection. The object_id is a foreign key referencing an Object. If an Object belongs to a Collection, a row with both ids should be created here. This table also has an autoincrement id, which allows us to order the results when the Collection is ordered.

The follows table relates a following actor with a followed actor. It also has an autoincrement id, because the entries in this AP Collection MUST be ordered.

The likes table saves the like actions of the actors. It relates the liker (an Actor) with the liked object. It has an autoincrement id so that it can be ordered.

The shares table is the same but for sharing (retweets) objects.

The inbox and outbox are ordered collections too. They are defined by AP:

The outbox stream contains activities the user has published.
The inbox stream contains all activities received by the actor.

So these tables relate those activities with the actor that owns the inbox or outbox. They can be ordered by the id field.

The blocked table is used when a user does not want to receive information from another user. So before inserting any activity in an actor's inbox, this table should be consulted.

The rest of the tables are to handle the "many to many" relations. All (or almost all) the relations in AP are "many to many", ie:

An activity can have more than one object:

"Doug creates the note A and the note B"

Activity => creates Objects => [Note A, Note B]

An object can be related to many activities:

"Doug creates the Note A" "Doug updates the Note A" "Doug likes the Note A"

Activity => [creates, updates, likes] Object => Note A

The relations can be also Links. Those links can point to any URL, including to an AP Object ID. So the relations can be a Link or an Object, for this reason the relation tables have more fields than usual, ie:

activities_objects relates the Activities with the Objects. It has the activity_id and the object_id fields, but also it has all the Link fields:

  • href
  • rel
  • media_type
  • name
  • hreflang
  • height
  • width
  • preview

When the relation is just a Link, the Link fields are filled. When the relation is an Object, the object_id is filled. When the relation is a Link that points to an Object, all the fields are filled.
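
The three cases can be told apart by looking at which columns of a row are filled. The sketch below represents a row as a plain map; the module name and the kind/1 function are hypothetical and only illustrate the rule.

```elixir
defmodule RelationRowSketch do
  # Classifies a row of a relation table such as activities_objects.

  # Only the Link fields are filled.
  def kind(%{object_id: nil, href: href}) when is_binary(href), do: :link

  # Only the object reference is filled.
  def kind(%{object_id: id, href: nil}) when is_integer(id), do: :object

  # Both are filled: a Link that points to an Object.
  def kind(%{object_id: id, href: href}) when is_integer(id) and is_binary(href) do
    :link_to_object
  end
end
```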

For each relation that AP defines, a table is created with the previous fields:

  • activities_actors
  • activities_objects
  • activities_origins
  • activities_targets
  • activities_results
  • activities_instruments
  • objects_attachments
  • objects_attributed_tos
  • objects_generators
  • objects_icons
  • objects_images
  • objects_replies
  • objects_tags
  • objects_urls
  • objects_previews
  • relationships_objects
  • relationships_subjects

Drawback 1: Collection index size

Disclaimer: It is a future problem

This affects every collection table, namely:

  • follows
  • blocks
  • likes
  • shares
  • inbox_items
  • outbox_items
  • collection_items

We need three indexes to make the queries fully efficient. For example, the likes table can be used for two things:

  • Give me the likes done by an actor
  • Give me the likes that an object received

And of course, the results are needed ordered by creation time, i.e. by when the like happened. So to get the likes done by an actor in order we need the index [actor_id, id]. This will make our query efficient.

In the same way, to get the likes that an object received we need the index [object_id, id].

Lastly, we need another index: [actor_id, object_id] or [object_id, actor_id]. This index has to be unique and is used to ensure we don't insert the same like twice.

We can drop the primary key index on id, because we don't need it. Maybe it is a good idea to rename id to order and make it a simple autoincrement integer field. It will work exactly the same and it will save memory in the database server.

So we need 3 composite indexes for each table, which means 7 * 3 = 21 composite indexes. (Maybe we can save some of them if they don't need to be ordered, but I think it's better to make all of them ordered.)

The indexes need to fit in memory to be efficient and useful. How much memory will this be?

Taking 1M rows as an example, and knowing that each id is 8 bytes, each index will be:

1M * (8 + 8) = 16MB

Each table has 3 of them:

16MB * 3 = 48MB

So for 10M, 100M and 1B rows we will have:

10M => 480MB
100M => 4.8GB
1B => 48GB

If this becomes a real problem in the future, we can also try to make just 3 simple indexes, one for each column, and let PostgreSQL do the union and ordering between the partial queries. Maybe this works well enough. The size would then be just half:

1M * 8 * 3 = 24MB
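
The estimates above can be reproduced with a few lines of code. This is only a back-of-the-envelope sketch under the same assumption as the text (8 bytes per id, no per-entry overhead); the module and function names are made up for the example.

```elixir
defmodule IndexSizeSketch do
  # Size of one id column entry, as assumed in the text.
  @id_bytes 8

  # Bytes used by the composite (two-column) indexes of one table.
  def composite_indexes_bytes(rows, indexes \\ 3) do
    rows * 2 * @id_bytes * indexes
  end

  # Bytes used by single-column indexes, one per column.
  def simple_indexes_bytes(rows, columns \\ 3) do
    rows * @id_bytes * columns
  end
end
```

For 1M rows, composite_indexes_bytes/1 gives 48,000,000 bytes (48MB) and simple_indexes_bytes/1 gives 24,000,000 bytes (24MB), matching the figures above. A real PostgreSQL index is larger because of per-entry and page overhead.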

Variant 1

One possible variant is to store the Actor, Activity and Collection information inside a JSON column in the objects table. This will allow us to load the full object without any join. The bad part is that the relations would not be so specific, e.g.:

The collection_items table joins a collection with an object it owns. We know it is a collection because we are using the foreign key of the collections table. With this variant, we remove the collections table, so collection_items will be just a relation between objects. This implies the database cannot prevent something like "A Note owns a collection item Actor", which is clearly wrong. A regular note should not be a collection.

This is just an example, it also happens in:

  • inbox_items
  • outbox_items
  • likes
  • follows
  • blocks
  • shares
  • collection_items
  • activities_actors
  • activities_objects
  • activities_origins
  • activities_targets
  • activities_results

that is, wherever there is a specialized relation.

Variant 2

There are many relations in an object, and even more in an activity, around 14. Usually most of them will be empty, yet we will need to make a join or a query for each one to find out.

To avoid those queries we can have a counter for each relation. This means that if an activity has 3 actors, the new actor_counter field in activities will be 3. If the same activity does not have any origin or target, origin_counter and target_counter will be 0.

So if we load an activity and later want to load its relations, we only have to load the relations whose counter is higher than 0. This will save several queries in this process.
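
A minimal sketch of how the counters could drive the loading, assuming each relation has a field named after it with a `_counter` suffix. The module name, the relation list and the field names are hypothetical.

```elixir
defmodule CounterSketch do
  # Hypothetical subset of the relations of an activity.
  @relations [:actor, :object, :origin, :target, :result, :instrument]

  # Returns only the relations that have at least one row to load.
  def relations_to_load(activity) when is_map(activity) do
    Enum.filter(@relations, fn relation ->
      Map.get(activity, String.to_atom("#{relation}_counter"), 0) > 0
    end)
  end
end
```

For an activity with 3 actors, 1 object and no origin or target, only the actor and object relations are queried.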

The bad side of this design is:

  • We have to add a lot of triggers to update the counters automatically. This makes everything more complex, all the more so because the trigger code resides at the database level, so it is more difficult to change (and to know that the code actually exists).
  • Insertions and deletions will be slower because of the counter updates.

Variant 3

We can remove a lot of tables, that are in fact, just special collections:

  • inbox_items
  • outbox_items
  • blocks
  • follows
  • likes
  • shares

We can consider them just regular collections and use collection_items to store them. The good parts about this are:

  • the code will be more generic and less complex
  • the fields audience, to, bto, cc, bcc can be just relations to objects.

The bad part is that collection_items will be even bigger! In fact it could be so huge that, if I had to bet, this will be the first big problem in the database when we scale.

Split Persistence layer and Data Structs

Probably the most common pattern for working with databases is Object-Relational Mapping (ORM), where a table is represented by a class, and an object of this class represents a row of this table. Then you just work with those classes and instances. So if you have a users table you will have a User class. If the users table has a name field, we can use it like this:

user = User.new()
user.name = "new name"
user.save

In this example you are working directly with the persistence layer. We propose splitting these two concerns.

Problem

Our database schema is very complex in order to support all the flexibility of AP. Working directly with those tables could be very complex and not practical at all.

Solution

We have some Elixir modules to persist the AP entities and other ones to represent them. So when we receive the following JSON:

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "summary": "Martin added an article to his blog",
  "type": "Add",
  "published": "2015-02-10T15:04:55Z",
  "actor": {
   "type": "Person",
   "id": "http://www.test.example/martin",
   "name": "Martin Smith",
   "url": "http://example.org/martin",
   "image": {
     "type": "Link",
     "href": "http://example.org/martin/image.jpg",
     "mediaType": "image/jpeg"
   }
  },
  "object" : {
   "id": "http://www.test.example/blog/abc123/xyz",
   "type": "Article",
   "url": "http://example.org/blog/2011/02/entry",
   "name": "Why I love Activity Streams"
  },
  "target" : {
   "id": "http://example.org/blog/",
   "type": "OrderedCollection",
   "name": "Martin's Blog"
  }
}

And we parse it with our library:

activity = ActivityPub.parse(json)
%ActivityPub.Entity{
  "@context": %ActivityPub.Context{
    "@vocab": ["https://www.w3.org/ns/activitystreams"],
    "@language": "en",
  },
  summary: %ActivityPub.I18nString{
    und: "Martin added an article to his blog",
  },
  type: ["Add"],
  published: ~U[2015-02-10 15:04:55Z],
  actor: [%ActivityPub.Entity{
    type: ["Person"],
    id: "http://www.test.example/martin",
    name: %ActivityPub.I18nString{
      und: "Martin Smith",
    },
    url: %ActivityPub.Link{
      href: "http://example.org/martin",
      __target_object: %ActivityPub.Entity{
        type: ["Profile"],
        name: %ActivityPub.I18nString{und: "Martins bio"},
        summary: %ActivityPub.I18nString{und: "Martins Hacker"}
      },
      rel: nil,
      media_type: nil,
      # rest fields to nil or []
    },
    image: %ActivityPub.Link{
      type: ["Link"],
      href: "http://example.org/martin/image.jpg",
      media_type: "image/jpeg",
      # rest fields to nil or []
    }
  }],
  object: [%ActivityPub.Entity{
    id: "http://www.test.example/blog/abc123/xyz",
    type: ["Article"],
    url: %ActivityPub.Link{
      href: "http://example.org/blog/2011/02/entry",
    },
    name: %ActivityPub.I18nString{
      und: "Why I love Activity Streams"
    },
    # rest fields to nil or []
  }],
  target: [%ActivityPub.Entity{
    id: "http://example.org/blog/",
    type: ["OrderedCollection"],
    name: %ActivityPub.I18nString{
      und: "Martin's Blog"
    },
    # rest fields to nil or []
  }],
  # rest fields to nil or []
}

Notice how the library detects that the link to "http://example.org/martin" is in fact an AP object, making the coder's life easier. When needed, the activity is persisted in the database:

ActivityPub.persist(activity)
# or maybe a changeset
Ecto.Changeset.change(activity, name: "A real name")
|> ActivityPub.persist()

The developer does not interact with the persistence layer, and does not need to know whether the data is saved in PostgreSQL or how many tables we have.

With our current schema it will be saved like this:

objects

| local_id | type | ap_id | summary | name |
|---|---|---|---|---|
| 1 | ["Add"] | null | {"und": "Martin added an article to his blog"} | null |
| 2 | ["Person"] | "http://www.test.example/martin" | null | {"und": "Martin Smith"} |
| 3 | ["Article"] | "http://www.test.example/blog/abc123/xyz" | null | {"und": "Why I love Activity Streams"} |
| 4 | ["OrderedCollection"] | "http://example.org/blog/" | null | {"und": "Martin's Blog"} |
| 5 | ["Profile"] | "http://www.example.com/martin" | {"und": "Martins bio"} | {"und": "Martin Hacker"} |

activities

| local_id |
|---|
| 1 |

actors

| local_id | inbox | outbox |
|---|---|---|
| 2 | "http://www.test.example/martin/inbox" | "http://www.test.example/martin/outbox" |

collections

| local_id | total_items | ordered |
|---|---|---|
| 4 | 1 | true |

collection_items

| id | collection_id | object_id |
|---|---|---|
| 143 | 4 | 3 |

activities_objects

| activity_id | object_id | href | media_type |
|---|---|---|---|
| 1 | 3 | null | null |

activities_targets

| activity_id | object_id | href | media_type |
|---|---|---|---|
| 1 | 4 | null | null |

activities_actors

| activity_id | object_id | href | media_type |
|---|---|---|---|
| 1 | 2 | null | null |

objects_images

| subject_id | object_id | href | media_type |
|---|---|---|---|
| 2 | null | "http://example.org/martin/image.jpg" | "image/jpeg" |

objects_urls

| subject_id | object_id | href | media_type |
|---|---|---|---|
| 2 | 5 | null | null |
| 3 | null | "http://example.org/blog/2011/02/entry" | null |