Mewp's federation/auth/protocol/architecture proposal

I spent some time thinking about how to extend the current protocol. I believe I have found an elegant solution.

Status quo

Right now, if you want to connect to another instance, you have to go to its website and connect to it manually. The current plan is to have a common front that will connect to many backends.

There is no concrete proposal for auth yet, although we plan to use IndieAuth.

There is no proposal for protocol extensibility, or any other kind of extensibility. This is an attempt at one.

The proposed architecture

     Client
        ↓
┌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┐
╎ Home instance ╎
╎  ┌─────────┐  ╎
╎  │  Front  │  ╎
╎  └────┬────┘  ╎
╎       ▼       ╎
╎  ┌─────────┐  ╎
╎  │  Pusher │  ╎
╎  └────┬────┘  ╎
└╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┘
        ▼
   ┌─────────┐
   │ Backend │ 
   └─────────┘

My proposal is that the client connects only to the front, which talks only to its local pusher (aside from external services such as Jitsi). The pusher, in turn, takes care of talking to backends on the various instances.

An extensible protocol

The current protocol is basically:

message PusherToServerMessage {
    oneof message { /* ... */ }
}

message ServerToPusherMessage {
    oneof message { /* ... */ }
}

service Backend {
    rpc joinRoom(stream PusherToServerMessage) returns (stream ServerToPusherMessage);
}

I propose to replace this with:

import "google/protobuf/empty.proto";

// One entry of the name <=> id map announced by the backend.
message MessageId {
    int32 id = 1;
    string name = 2;
}

message Metadata {
    repeated string requiredExtensions = 2;
    repeated string optionalExtensions = 3;
}

message ClientHello {
    repeated string supportedExtensions = 1;
}

message ServerHello {
    repeated string extensions = 1;
    repeated MessageId message_ids = 2;
}

// Envelope for every message on the stream: the id selects the type.
message Any {
    int32 id = 1;
    bytes message = 2;
}

service Backend {
    rpc getMetadata(google.protobuf.Empty) returns (Metadata);
    rpc communicate(stream Any) returns (stream Any);
}

The reason for exchanging Any messages is that each backend may support a different set of message types. Because we do not want to assign message numbers centrally, we need another way to ensure that a pusher can talk to different backends. There are two ways of doing that in protobuf:

Use Google's Any type (google.protobuf.Any), which identifies messages by a type URL string instead of a number
Exchange a message name <=> message id map at the beginning, then use the ids

Both options require dynamic typing of messages, which is unfortunate but necessary. The second method is more efficient, though, since each message then carries a small integer instead of a full type name. Message names can still collide between independently developed backends, which is partly why I also propose a list of extensions.
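To put a number on the efficiency argument, here is a rough TypeScript calculation of how many bytes each option spends identifying the message type (the serialized payload itself costs the same either way). It assumes non-negative ids, the conventional type.googleapis.com/ prefix for google.protobuf.Any type URLs, and a hypothetical fully qualified message name net.fediventure.MapData; the helper names are illustrative.

```typescript
// Bytes protobuf needs to encode a non-negative integer as a varint
// (7 payload bits per byte). Negative int32 values would take 10 bytes.
function varintLength(value: number): number {
  let bytes = 1;
  while (value >= 0x80) {
    value = Math.floor(value / 0x80);
    bytes++;
  }
  return bytes;
}

// A field tag is (field_number << 3) | wire_type, itself encoded as a varint.
function tagLength(fieldNumber: number, wireType: number): number {
  return varintLength(fieldNumber * 8 + wireType);
}

// Option 2 (our Any): `int32 id = 1` is a varint field (wire type 0).
function idOverhead(id: number): number {
  return tagLength(1, 0) + varintLength(id);
}

// Option 1 (google.protobuf.Any): `string type_url = 1` is length-delimited
// (wire type 2), so it costs a tag, a length prefix and the UTF-8 bytes.
function typeUrlOverhead(typeUrl: string): number {
  const urlBytes = new TextEncoder().encode(typeUrl).length;
  return tagLength(1, 2) + varintLength(urlBytes) + urlBytes;
}

// Hypothetical message name, for illustration only.
const byId = idOverhead(2137);
const byUrl = typeUrlOverhead("type.googleapis.com/net.fediventure.MapData");
console.log(`id: ${byId} bytes, type URL: ${byUrl} bytes`);
```

With these assumptions the id costs 3 bytes per message and the type URL 45. The exact figures depend on the names chosen, but the id-based framing stays small and constant while the URL grows with the name.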

Extensions

From the point of view of the protocol, an extension is an opaque string. They serve a dual purpose:

Both sides know how to talk to each other, i.e. which deviations from the default behavior they should exhibit.
The client (pusher) knows that if the server advertises extension X and a message named Y, and X defines a message named Y, then it can understand that message.

The extension set is negotiated as follows:

1. In the ClientHello message, the pusher sends the names of all extensions it supports.
2. From these, the server picks the extensions it wants to use with the pusher. This list MUST be a subset of the extensions reported by the metadata endpoint.
2a. This list MAY be a superset of the pusher's list. In that case, the pusher should assume that it is missing a required extension and SHOULD NOT proceed further. The decision rests with the pusher because "cheating" the backend by sending it extra extension names is trivial and doesn't help anyone. This way, the backend at least learns that an incompatible pusher tried to reach it and can notify the administrator that something may be broken.
3. The backend sends a ServerHello message listing the extensions that will define the protocol.
4. The pusher reads the extension list and MUST use exactly the extensions reported by the backend.
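The negotiation steps above can be sketched as two pure functions, one per side. This assumes the backend's policy is "all required extensions plus whichever optional ones the pusher supports"; the proposal leaves that policy to the backend, and the function and type names are hypothetical.

```typescript
// Mirrors the Metadata message from the proposal.
interface Metadata {
  requiredExtensions: string[];
  optionalExtensions: string[];
}

// Backend side (step 2), under an assumed policy: every required extension is
// selected even if the pusher did not advertise it (step 2a); optional ones
// only if the pusher supports them. The result is a subset of the metadata.
function selectExtensions(metadata: Metadata, clientSupported: readonly string[]): string[] {
  const supported = new Set(clientSupported);
  const optional = metadata.optionalExtensions.filter((ext) => supported.has(ext));
  return [...metadata.requiredExtensions, ...optional];
}

type HandshakeResult =
  | { ok: true; extensions: string[] }
  | { ok: false; missing: string[] };

// Pusher side (steps 2a and 4): anything chosen by the backend that we did not
// advertise means we lack a required extension, so we stop; otherwise we use
// exactly the backend's list.
function acceptServerHello(
  clientSupported: readonly string[],
  serverExtensions: string[],
): HandshakeResult {
  const supported = new Set(clientSupported);
  const missing = serverExtensions.filter((ext) => !supported.has(ext));
  return missing.length > 0
    ? { ok: false, missing }
    : { ok: true, extensions: serverExtensions };
}
```

For instance, a pusher supporting only net.fediventure.serverSideMaps and frequentPing, talking to a backend that requires re.workadventu.serverSideMaps, gets that name back in missing and stops, which is the situation described in step 2a.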

Both the ClientHello and ServerHello messages MAY be extended by extensions. I don't think this will cause many issues: first, few extensions will need to do that, and second, a field number collision only matters if a pusher and a backend both implement the two colliding extensions, in which case they can reach out to the developers to create a new, non-colliding version.

That being said, we should probably make an opt-in extension registry anyway to avoid these situations more easily.

To serialize and deserialize the Hello messages as Any, there are two alternatives:

Assign them ids 1 and 2 respectively, and forbid extensions from mapping messages to these ids.
Ignore the id of the first message sent and received on each connection, which is always the Hello.

I am slightly in favor of the latter, but I don't have a strong opinion either way.
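If we go with the second alternative, the receiving side only needs to remember whether it has already seen the peer's Hello. The sketch below, with hypothetical type and method names and without real protobuf parsing, treats the first frame as the Hello whatever its id and looks up every later id in the announced mapping.

```typescript
// Mirrors the proposal's Any envelope: an id plus the serialized message.
interface AnyFrame {
  id: number;
  message: Uint8Array;
}

type Decoded =
  | { kind: "hello"; payload: Uint8Array }
  | { kind: "message"; name: string; payload: Uint8Array }
  | { kind: "unknown"; id: number };

// Hypothetical receiving side of one connection: the first frame is always
// the peer's Hello, regardless of the id it carries.
class FrameDecoder {
  private seenHello = false;
  private idToName = new Map<number, string>();

  // Installs the id <=> name entries announced in ServerHello.message_ids.
  setMapping(entries: Array<{ id: number; name: string }>): void {
    for (const { id, name } of entries) this.idToName.set(id, name);
  }

  decode(frame: AnyFrame): Decoded {
    if (!this.seenHello) {
      this.seenHello = true;
      return { kind: "hello", payload: frame.message };
    }
    const name = this.idToName.get(frame.id);
    return name === undefined
      ? { kind: "unknown", id: frame.id }
      : { kind: "message", name, payload: frame.message };
  }
}
```

Because the Hello is recognized by position rather than by id, every id, including 1 and 2, stays available to extensions.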

Some examples:

Example 1

The pusher advertises support for an extension net.fediventure.serverSideMaps (the naming convention is up for discussion; this proposal leaves it undefined). The backend responds, saying that it's going to use it (i.e. returns ["net.fediventure.serverSideMaps"] as the extension list). It defines a message mapping [ ("MapData", 2137) ]. Then it proceeds to send a MapData message (with the id 2137) to the pusher. The pusher translates it to a format understood by the client, then sends it there.

Example 2

The pusher advertises support for an extension net.fediventure.serverSideMaps. The backend responds with an empty extension list and an empty mapping. The pusher now has to decide what to do: if it can proceed without the extension, it SHOULD do so.

Example 3

The pusher advertises support for an extension net.fediventure.serverSideMaps. The backend responds, saying that it requires re.workadventu.serverSideMaps. The pusher SHOULD NOT try to communicate further with the backend, as some of the messages might not be understood correctly. If it decides to proceed anyway, it MUST treat messages not belonging to any supported extension as unknown ones, even if their names match.
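The last rule of Example 3 can be enforced when the pusher builds its decoding table: a name from the backend's mapping is trusted only if it is defined by an extension that was negotiated and that the pusher actually implements. The registry contents and function names below are hypothetical.

```typescript
// Hypothetical registry: the message names defined by each extension this
// pusher implements.
const KNOWN_EXTENSIONS = new Map<string, readonly string[]>([
  ["net.fediventure.serverSideMaps", ["MapData"]],
  ["frequentPing", []],
]);

// Names the pusher may decode: only those defined by negotiated extensions
// that are implemented here. Extensions we do not implement contribute nothing.
function understoodNames(negotiated: readonly string[]): Set<string> {
  const names = new Set<string>();
  for (const ext of negotiated) {
    for (const name of KNOWN_EXTENSIONS.get(ext) ?? []) names.add(name);
  }
  return names;
}

// Builds the id -> name table used for decoding. Ids whose names we cannot
// vouch for are left out, so their frames are treated as unknown even if the
// name happens to match a message of some other extension.
function usableMapping(
  negotiated: readonly string[],
  messageIds: ReadonlyArray<{ id: number; name: string }>,
): Map<number, string> {
  const allowed = understoodNames(negotiated);
  const table = new Map<number, string>();
  for (const { id, name } of messageIds) {
    if (allowed.has(name)) table.set(id, name);
  }
  return table;
}
```

In Example 3 the negotiated list contains only re.workadventu.serverSideMaps, which this pusher does not implement, so MapData ends up unmapped and any frame with id 2137 is treated as unknown.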

Example 4

The pusher advertises support for an extension frequentPing. The backend responds, saying that it's going to use frequentPing. It does not define any messages, and sends an empty mapping. The pusher now sends heartbeat messages more frequently (or at all; the point is that the extension changes behavior).
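Example 4 needs no message mapping at all; the extension only switches behavior. A minimal sketch, where both interval values and the function name are made up for illustration:

```typescript
// Hypothetical intervals; the proposal only says that behavior changes.
const DEFAULT_HEARTBEAT_MS = 30000;
const FREQUENT_HEARTBEAT_MS = 5000;

// An extension with no messages can still alter behavior: the heartbeat
// interval depends only on whether frequentPing was negotiated.
function heartbeatInterval(negotiated: readonly string[]): number {
  return negotiated.includes("frequentPing") ? FREQUENT_HEARTBEAT_MS : DEFAULT_HEARTBEAT_MS;
}
```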

Pros of that solution:

The frontend can have a static protocol definition, making it simpler
The frontend can evolve its protocol independently of the federation
The frontend has to connect only to one host, bypassing CORS issues
It allows for uncoordinated protocol evolution.
It allows easily supporting multiple pusher implementations.
This protocol, aside from changing the handshake and message parsing code, doesn't require any other changes to the codebase.

Cons of that solution:

The pusher is more complex (but it's a better place for the complexity imo)
Extensibility necessarily introduces more fragmentation.
This doesn't tackle frontend extensibility at all (but I'm not sure that's a realistically solvable problem).
This protocol is, technically, backwards incompatible with the current one (but it's easy to detect that a backend doesn't support getMetadata, and fall back)
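The fallback mentioned in the last point could look roughly like this. It assumes the pusher's gRPC client rejects with an error carrying a numeric code property (as ServiceError in @grpc/grpc-js does) and relies on gRPC's standard status code 12, UNIMPLEMENTED, which a server returns for methods it does not know. The getMetadata callback and the other names are placeholders.

```typescript
// gRPC status code UNIMPLEMENTED (12): the server does not know the method.
const GRPC_UNIMPLEMENTED = 12;

interface Metadata {
  requiredExtensions: string[];
  optionalExtensions: string[];
}

type ProtocolChoice =
  | { protocol: "extensible"; metadata: Metadata }
  | { protocol: "legacy" };

// Reads a `code` property from an unknown rejection value, if there is one.
function grpcCode(err: unknown): unknown {
  return typeof err === "object" && err !== null ? (err as { code?: unknown }).code : undefined;
}

// `getMetadata` is a placeholder for the pusher's actual gRPC client call.
async function detectProtocol(getMetadata: () => Promise<Metadata>): Promise<ProtocolChoice> {
  try {
    return { protocol: "extensible", metadata: await getMetadata() };
  } catch (err) {
    if (grpcCode(err) === GRPC_UNIMPLEMENTED) {
      // An old backend without getMetadata: use the current joinRoom protocol.
      return { protocol: "legacy" };
    }
    throw err; // any other failure is a real error, not an old backend
  }
}
```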

Message maps

Some extensions will define their own message types, which have to be assigned a number in Any. To avoid the need for a central number registry, the server sends its message mapping to the client (the pusher) instead.

The pusher simply needs to keep a map of id <=> message type for each backend, and use that to send and receive messages.
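A minimal sketch of that per-backend map, with hypothetical class and host names. The proposal does not say what should happen if a backend announces the same id or name twice; this sketch simply rejects such a mapping.

```typescript
// Hypothetical per-backend map: the server's MessageId list, indexed both ways.
class MessageMap {
  private readonly byId = new Map<number, string>();
  private readonly byName = new Map<string, number>();

  constructor(entries: ReadonlyArray<{ id: number; name: string }>) {
    for (const { id, name } of entries) {
      // Assumption: duplicate ids or names make the mapping invalid.
      if (this.byId.has(id) || this.byName.has(name)) {
        throw new Error(`duplicate mapping for ${name} / ${id}`);
      }
      this.byId.set(id, name);
      this.byName.set(name, id);
    }
  }

  // Outgoing: the pusher knows the message type name, the wire needs the id.
  idOf(name: string): number | undefined {
    return this.byName.get(name);
  }

  // Incoming: the wire carries the id, the pusher needs the type name.
  nameOf(id: number): string | undefined {
    return this.byId.get(id);
  }
}

// One map per backend (placeholder host names).
const maps = new Map<string, MessageMap>();
maps.set("a.example", new MessageMap([{ id: 2137, name: "MapData" }]));
maps.set("b.example", new MessageMap([{ id: 2137, name: "ErrorMessage" }]));
```

Keeping one map per backend is what lets two backends reuse the same id for unrelated messages.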

Authentication

One thing that deserves particular attention is authentication. This proposal doesn't define any client authentication mechanism; instead, it makes client authentication an implementation detail of each home instance.

In my proposal, the instances authenticate each other. If a client comes from instance A to instance B, B only checks that the connection from A's pusher really originates from A, and then trusts A about who its users are.

This approach is similar to what Matrix, diaspora, and other federated systems already do, and we should probably base our authentication mechanism on Matrix's. Authentication mechanisms will be implemented as extensions. A simple one, let's call it domainAuth, would extend the ClientHello message with a string originDomain field containing the pusher's domain. The server, seeing that the client supports this extension, would read this field, check whether the domain's A/AAAA records match the originating IP address, and if so, return a ServerHello stating, among other things, that it has chosen to use this extension.
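A rough sketch of the backend side of domainAuth using Node's dns.promises API. The function names are hypothetical, the comparison is a plain string match (it assumes IPv6 addresses arrive in the same compressed form that DNS lookups return), and any lookup failure counts as "no addresses", so verification fails closed.

```typescript
import { promises as dns } from "node:dns";

// IPv4 clients on a dual-stack socket often show up as IPv4-mapped IPv6
// addresses ("::ffff:203.0.113.7"); strip the prefix so they compare equal.
function normalizeIp(ip: string): string {
  const lower = ip.toLowerCase();
  return lower.startsWith("::ffff:") && lower.includes(".") ? lower.slice(7) : lower;
}

// Pure check: does the connecting address appear among the domain's records?
function addressMatches(remoteIp: string, records: readonly string[]): boolean {
  const remote = normalizeIp(remoteIp);
  return records.some((r) => normalizeIp(r) === remote);
}

// Hypothetical domainAuth check: resolve A and AAAA for the claimed
// originDomain and compare with the connection's remote address. Any DNS
// error (no record, NXDOMAIN, timeout) is treated as an empty list.
async function verifyDomainAuth(originDomain: string, remoteIp: string): Promise<boolean> {
  const safe = (p: Promise<string[]>) => p.catch((): string[] => []);
  const [a, aaaa] = await Promise.all([
    safe(dns.resolve4(originDomain)),
    safe(dns.resolve6(originDomain)),
  ]);
  return addressMatches(remoteIp, [...a, ...aaaa]);
}
```

The backend would call verifyDomainAuth with the originDomain field from ClientHello and the remote address of the incoming connection, and list domainAuth in its ServerHello only if it returns true.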

Since authentication mechanisms are extensions, the client might support several of them, and the server will pick one that satisfies it. The server SHOULD NOT require more than one valid authentication mechanism: e.g. if domainAuth fails but another mechanism succeeds, that should be enough.
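Picking among several advertised mechanisms can be sketched as a loop over the backend's preferred order that stops at the first success. The AuthMechanism interface, the shape of the parsed hello, and the function name are all hypothetical.

```typescript
// Hypothetical view of an authentication extension on the backend: its name
// and a check against the (extended) ClientHello, returning the authenticated
// origin domain or null on failure.
interface AuthMechanism {
  extension: string;
  verify(hello: Record<string, unknown>): Promise<string | null>;
}

// Tries the mechanisms the pusher advertised, in the backend's order of
// preference, and stops at the first success: one valid mechanism is enough.
async function authenticate(
  mechanisms: readonly AuthMechanism[],
  clientExtensions: readonly string[],
  hello: Record<string, unknown>,
): Promise<{ extension: string; origin: string } | null> {
  for (const m of mechanisms) {
    if (!clientExtensions.includes(m.extension)) continue;
    const origin = await m.verify(hello);
    if (origin !== null) return { extension: m.extension, origin };
  }
  return null; // nothing succeeded: the backend should refuse the connection
}
```

The returned extension name is the one the backend would then include in its ServerHello.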

Errors

Let there be an extension errorMessage defining the following message:

message ErrorMessage {
    string message = 1;
}

Upon receiving this message, the receiving side SHALL relay it to an appropriate place (the frontend in the case of a pusher, a log file in the case of a backend) and terminate the connection.
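The relay-then-terminate rule is small enough to sketch directly. ErrorSink and handleErrorMessage are hypothetical names; a pusher's sink would forward to the frontend, a backend's would write to its log.

```typescript
// Hypothetical hooks of one connection endpoint.
interface ErrorSink {
  report(message: string): void; // frontend for a pusher, log file for a backend
  close(): void; // terminate the connection
}

// Handles an ErrorMessage only if errorMessage was negotiated; otherwise the
// frame is just an unknown message and this returns false.
function handleErrorMessage(
  negotiated: readonly string[],
  message: string,
  sink: ErrorSink,
): boolean {
  if (!negotiated.includes("errorMessage")) return false;
  sink.report(message); // relay first...
  sink.close(); // ...then terminate
  return true;
}
```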

The metadata endpoint

This proposal defines a metadata endpoint for easy inspection of extensions that the backend claims to support. Although not strictly necessary for anything else, I think it will make a lot of things easier.
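As an example of what the endpoint makes easier, a pusher or an administrator's tool could compute a compatibility report before opening a communicate stream. The function name and report shape are hypothetical.

```typescript
// Mirrors the Metadata message from the proposal.
interface Metadata {
  requiredExtensions: string[];
  optionalExtensions: string[];
}

// Hypothetical pre-flight check based only on the metadata endpoint.
function compatibilityReport(metadata: Metadata, supported: readonly string[]) {
  const have = new Set(supported);
  return {
    // Non-empty means the handshake would end as in step 2a.
    missingRequired: metadata.requiredExtensions.filter((e) => !have.has(e)),
    // Optional extensions the backend could agree to use with us.
    usableOptional: metadata.optionalExtensions.filter((e) => have.has(e)),
  };
}
```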