#+TITLE: Spritely Golem: Secure, p2p distributable content for the fediverse

This is a demo for Golem, one of the [[][Spritely]] demos.
Each Spritely demo tries to demonstrate a key idea on how
to "level up" the fediverse.

The problems this demo is trying to address are:

 - Nodes go down, and their content tends to go with them.
   How can we have content that survives?
   Content which is distributable over a peer to peer network seems
   like it would help.
 - Except if an entire network is helping hold onto and distribute
   content, how do we keep private content private?
 - How to do this in a way that is compatible with the [[][ActivityPub]]
   protocol?

By encrypting the file and splitting it into chunks distributed
through the network and only sharing the decryption key with the
intended recipient, and by using a URI scheme that captures the
appropriate information, we can accomplish all the above.
Golem uses the [[][magenc]] extension of the [[][magnet URI scheme]] to
accomplish the above.

Why the name "Golem"?
In fantasy literature and folklore, a Golem is assembled from
inanimate parts, and only through the casting of magic words is
it brought to life.
Likewise, here encrypted chunks are distributed inanimately through
the network, and the magic words uttered are the decryption key,
known only to the intended recipients (and, well, anyone they choose
to pass them on to).

*NOTE:* This demo is not intended for production deployments.
The purpose of this demo is to explain its core ideas to federated
social web implementors.
As such, the demo takes many shortcuts for the sake of brevity.
It is intended to be simple enough to be read and understood in
a single evening.

* How to install Golem

First you'll need [[][Racket]].
You'll have the option to install the minimal or full distribution of
Racket.
Choose the full installation.

Next, do a git checkout of this git repository.
Then, from inside the checkout directory, do:

: raco pkg install

Okay, you're ready to go!

* Running Golem

We're going to need two separate Golem servers running to test
federating with each other.
To do this, open two separate terminals and navigate both of them
to the Golem checkout directory.
Now let's start up each server.

In the first terminal:

: racket golem.rkt --port 8000 --other-stores "http://localhost:8001/read-only-cas" Alice

In the second terminal:

: racket golem.rkt --port 8001 --other-stores "http://localhost:8000/read-only-cas" Bob

In the first terminal, you should see a message like:

: Your Web application is running at http://localhost:8000.
: Stop this program at any time to terminate the Web Server.

Same in the second, but with port 8001.

Test it out by opening [[http://localhost:8000/]] in your browser.
In the upper left hand side you should see "Alice's site".
Opening [[http://localhost:8001/]] should show the same, but with
"Bob's site".

What's with the =--other-stores= option?
If you'll notice, the two sites are pointing at each other's
read-only-cas endpoints.
This will be how they are able to find each other's content...
more on that later.

* Giving it a try

As said, this is a very very verrrrry pared down ActivityPub
implementation.
Each server that's being run is single-user, and we haven't even
bothered requiring that you authenticate to be able to post content!

Let's return to [[http://localhost:8000/]] or [[http://localhost:8001/]]
in our browser.
What you should see is a form from which we can submit content
and a summary of the posts we most recently sent (our "outbox")
as well as the most recent posts we've received (our "inbox").

Let's try making a post from the form on [[http://localhost:8000/]].
Currently, we should see "Hey look... nothing!" in both the outbox
and inbox sections of the page, because we've neither sent nor
received any content.

The *To:* field is who we want to send it to... well, this is
Alice's site, and Alice wants to talk to Bob, so let's put
http://localhost:8001/ in this field.[fn:wait-wheres-webfinger]
The box underneath it is the body of our post, so let's put
in a simple message, like "Hello, Bob!".
Now press the "Submit" button.

If everything went well, underneath "Most recent post in your outbox"
you should see the post "Hello, Bob!" (or whatever message it is that
you sent).
But did Bob get it?
Navigate over to [[http://localhost:8001/]] and refresh the page in your
browser.
Yup, the post should be there in the inbox... looks like Bob got it!

[fn:wait-wheres-webfinger] Some users of the conventional fediverse
may be thinking, "Wait a minute!  I thought addressing in ActivityPub
used email-like addresses of the form user@host ... what's going on?"
That style of addressing is called a [[][Webfinger]] based address, and
while it's possible to use in conjunction with ActivityPub, actual
ActivityPub addressing uses the [[][URIs]] of [[][actors]].
In this case, [[http://localhost:8001/]] /is/ Bob's actor URI.
Our server does an HTTP request asking for the activitystreams
representation of Bob at that address and gets back a JSON object
that points at Bob's inbox.
We can then use that to federate a message to Bob.

* What's going on?  

That's all great, but how is this different than any other ActivityPub
implementation?
How do we even know that things were sent securely?

If we look at the *Id:* field, we'll see something like the following:

: magnet:?xt=urn:sha256:Cvy4hoVEsY7n3T2wf4306IhhBS1CV03pNuLtMOR73xc
:        &ek=_T8EGDBegDdmMdqRG4Lyd8zFto0cmck4FoaRzsXcM08
:        &es=aes-ctr

This is the "id" of the object, which is the address from which we can
retrieve the object.
In contemporary ActivityPub servers, this is generally an http(s) URI.
The message that is delivered to an actor's inbox usually has this
"id" attached to it, so we know where it lives (and can verify its

But there is no requirement in ActivityPub that the id of an object
be an http(s) URI, only that it be a URI.
In http(s), content is generally "live"; when you request the object,
some specific server is responsible for handing it to you and is
the authority of what belongs there (which could always change).
However, if that server goes down (or perhaps if the domain pointing to
it expires or gets transferred) you might not be able to retrieve it
any more.
In other words, the http(s) scheme represents a kind of [[][URL]].

URLs have some advantages, but as it turns out, there are other kinds
of URI schemes out there.
One of these is called a [[][URN]], which is fairly well described by
its Wikipedia page:

  URNs were conceived as persistent, location-independent identifiers
  assigned within defined namespaces, typically by an authority
  responsible for the namespace, so that they are globally unique and
  persistent over long periods of time, even after the resource which
  they identify ceases to exist or becomes unavailable.

Alice could host a picture of a cat live at =https://catpics.example/pics/mycat.jpg=,
but that could always go down.
If the cat picture became very popular, Alice would be responsible
for paying for all that bandwidth herself.
But there's a category of URNs that are "content addressed"; in other
words, if the sha1 hash of the cat photo Alice wanted to share was
=dbe5b3e2aabde97aefdc5b605cacd0ce8210c203=, Alice could share the URN of
=urn:sha1:dbe5b3e2aabde97aefdc5b605cacd0ce8210c203= with Bob,
and Bob could ask his peers (which could include Alice) for a file that
matches that hash.
Once Bob finishes downloading the file from a peer, Bob can verify
that the hash of the content matches.
This is a totally valid type of URI, even though it's not what many
users of the web are used to.
And it turns out, we can use these as the identifiers for objects in
ActivityPub, and then they can live anywhere.
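
As a rough illustration of the content-addressing idea above, here is a
minimal Python sketch (Golem itself is written in Racket, and this is
not its code).
The names =content_urn= and =fetch_verified= are made up for this
example, and peers are modeled as plain dictionaries mapping URNs to
bytes, which is an assumption made purely to keep the sketch
self-contained.

```python
import hashlib


def content_urn(data: bytes) -> str:
    """Name content by its sha1 hash, in the urn:sha1:<hex> style used above."""
    return "urn:sha1:" + hashlib.sha1(data).hexdigest()


def fetch_verified(urn: str, peers: list) -> bytes:
    """Ask each peer in turn and accept only content whose hash matches the URN."""
    for peer in peers:
        data = peer.get(urn)  # a peer is just a dict in this sketch
        if data is not None and content_urn(data) == urn:
            return data
    raise LookupError("no peer returned content matching " + urn)


# Alice names her cat picture by its hash and shares only the URN.
cat_photo = b"pretend these bytes are a cat picture"
urn = content_urn(cat_photo)

# One peer lies about the content, another is honest.
lying_peer = {urn: b"definitely not a cat"}
honest_peer = {urn: cat_photo}

assert fetch_verified(urn, [lying_peer, honest_peer]) == cat_photo
```

Because the name is derived from the content, it doesn't matter which
peer hands Bob the bytes: a peer that returns the wrong thing is simply
skipped.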

But wait... that's not enough.
We want the network in general to be able to help distribute objects
to anyone who asks for them, and yet we also want to keep posts
private between their intended recipients.
We can encrypt the file with a symmetric key we share /only/ with
the intended recipients, break it apart into regularly sized chunks
so nobody can guess which file it is based on its filesize, and then
those encrypted chunks can be safely shared by the whole network...
but only the recipients of the key can unlock its content.
The [[][Magenc]] writeup explains how it does this by extending the
[[][magnet URI]] scheme, composing together both the content URI (or a
manifest chunk that points to the rest of the chunks) with the key
into a new magnet: URI.
(This idea isn't new; it's been done for quite a while by projects
like Tahoe-LAFS and Freenet.)
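
To make the encrypt-then-chunk idea a bit more concrete, here is a
small Python sketch.
It is only an illustration under several assumptions: the cipher is a
stand-in built from SHA-256 in a counter construction (it is /not/
AES-CTR, and was chosen only so the example needs nothing beyond the
standard library), the chunk size is arbitrary, the store is a plain
dictionary, and the list of chunk hashes is returned directly rather
than being packed into a manifest chunk.
None of the function names come from Golem or magenc.

```python
import hashlib
import secrets

CHUNK_SIZE = 1024  # hypothetical chunk size, kept small for the example


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Stand-in cipher (NOT AES-CTR): XOR data with SHA-256(key || counter) blocks."""
    out = bytearray()
    for counter, offset in enumerate(range(0, len(data), 32)):
        block = hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        out.extend(b ^ k for b, k in zip(data[offset:offset + 32], block))
    return bytes(out)


def publish(store: dict, content: bytes):
    """Encrypt content, split it into equal-sized chunks, and store each by hash."""
    key = secrets.token_bytes(32)
    # Prefix the real length, then zero-pad so every chunk has the same size.
    framed = len(content).to_bytes(8, "big") + content
    framed += b"\0" * (-len(framed) % CHUNK_SIZE)
    ciphertext = keystream_xor(key, framed)
    chunk_ids = []
    for offset in range(0, len(ciphertext), CHUNK_SIZE):
        chunk = ciphertext[offset:offset + CHUNK_SIZE]
        chunk_id = hashlib.sha256(chunk).hexdigest()
        store[chunk_id] = chunk
        chunk_ids.append(chunk_id)
    return key, chunk_ids


def retrieve(store: dict, key: bytes, chunk_ids: list) -> bytes:
    """Fetch chunks by hash, verify each one, then decrypt and strip the padding."""
    chunks = []
    for chunk_id in chunk_ids:
        chunk = store[chunk_id]
        if hashlib.sha256(chunk).hexdigest() != chunk_id:
            raise ValueError("chunk does not match its hash")
        chunks.append(chunk)
    framed = keystream_xor(key, b"".join(chunks))
    length = int.from_bytes(framed[:8], "big")
    return framed[8:8 + length]


# The store could live anywhere on the network; only key holders can read it.
store = {}
message = b"Hello, Bob!" * 200
key, chunk_ids = publish(store, message)
assert retrieve(store, key, chunk_ids) == message
```

Anyone holding the store sees only same-sized blobs named by their
hashes; the key travels separately, to the intended recipients only.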

If we look again at the *Id:* header, now this starts to make a lot
more sense:

: magnet:?xt=urn:sha256:Cvy4hoVEsY7n3T2wf4306IhhBS1CV03pNuLtMOR73xc
:        &ek=_T8EGDBegDdmMdqRG4Lyd8zFto0cmck4FoaRzsXcM08
:        &es=aes-ctr

 - *xt* stands for "eXactTopic".  It's where our initial encrypted
   chunk is!  (Which might be the only chunk if it's very small.)
   Anyone in the peer to peer network can pass this around and help
   share it, but not everyone in the peer to peer network knows
   what it is (this is also helpful for those who want to generally
   distribute content on the network... it can reduce your liability
   for passing around content you don't know about, since you don't
   know what it is).
 - *ek* stands for the "EncryptionKey".  Since it's symmetrically
   encrypted, it's also the decryption key!  We can use this to
   decrypt the chunk above (as well as any other chunks it points
   to).
 - *es* is the encryption suite.  Different encryption suites are
   possible so we need to know which one.  In this case, it's
   =aes-ctr=.
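
The three fields described in the list above can be pulled apart with
a few lines of Python.
This is a sketch, not Golem's own parser: the function names are
invented for the example, and it assumes the hash and key are
base64url-encoded without padding, which is what the example values
look like.

```python
import base64
from urllib.parse import parse_qs, urlsplit

EXAMPLE_ID = (
    "magnet:?xt=urn:sha256:Cvy4hoVEsY7n3T2wf4306IhhBS1CV03pNuLtMOR73xc"
    "&ek=_T8EGDBegDdmMdqRG4Lyd8zFto0cmck4FoaRzsXcM08"
    "&es=aes-ctr"
)


def b64url_decode(text: str) -> bytes:
    """Decode base64url text, restoring the padding that was left off."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def parse_magenc(uri: str) -> dict:
    """Split a magnet URI carrying xt, ek and es parameters into its parts."""
    parts = urlsplit(uri)
    if parts.scheme != "magnet":
        raise ValueError("not a magnet URI")
    params = {name: values[0] for name, values in parse_qs(parts.query).items()}
    # xt looks like urn:<hash name>:<encoded digest>
    _urn, hash_name, digest = params["xt"].split(":", 2)
    return {
        "hash_name": hash_name,               # which hash names the first chunk
        "digest": b64url_decode(digest),      # the hash of that chunk
        "key": b64url_decode(params["ek"]),   # symmetric (de)encryption key
        "suite": params["es"],                # encryption suite
    }


info = parse_magenc(EXAMPLE_ID)
print(info["hash_name"], info["suite"], len(info["digest"]), len(info["key"]))
```

Only the =xt= part needs to be shown to the wider network to fetch the
chunk; the =ek= part is what must stay between sender and recipients.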

This is all very good and well, but what does it look like during

* Verify it yourself!

* Problems with this demo

 - urn:sha256 isn't actually specified yet, but it probably should be
 - At the time of writing, the magenc extensions to the magnet scheme
   aren't used by anything in production yet.