#+TITLE: Spritely Golem: Secure, p2p distributable content for the fediverse

This is a demo for Golem, one of the [[https://gitlab.com/spritely/][Spritely]] demos.
Each Spritely demo tries to demonstrate a key idea on how
to "level up" the fediverse.

The problems this demo is trying to address are:

 - Nodes go down, and their content tends to go with them.
   How can we have content that survives?
   Content which is distributable over a peer to peer network seems
   like it would help.
 - Except if an entire network is helping hold onto and distribute
   content, how do we keep private content private?
 - How to do this in a way that is compatible with the [[https://www.w3.org/TR/activitypub/][ActivityPub]]
   specification?

By encrypting the file and splitting it into chunks distributed
through the network and only sharing the decryption key with the
intended recipient, and by using a URI scheme that captures the
appropriate information, we can accomplish all the above.
Golem uses the [[https://github.com/WebOfTrustInfo/rwot7/blob/master/topics-and-advance-readings/magenc.md][magenc]] extension of the [[https://en.wikipedia.org/wiki/Magnet_uri][magnet URI scheme]] to
do so.

Why the name "Golem"?
In folklore and fantasy literature (the name here borrows more from
the fantasy literature tradition, though the idea originates in
Jewish folklore), a Golem is assembled from inanimate parts, and only
through the casting of magic words is it brought to life.
Likewise, here encrypted chunks are distributed inanimately through
the network, and the magic words uttered are the decryption key,
known only to the intended recipients (and, well, anyone they choose
to pass them on to).

*NOTE:* This demo is not intended for production deployments.
The purpose of this demo is to explain its core ideas to federated
social web implementors.
As such, the demo takes many shortcuts for the sake of brevity.
It is intended to be simple enough to be read and understood in
a single evening.
(The [[file:./golem.rkt][corresponding demo code]] is also meant to be easy to follow, and
hopefully achieves that goal.)

* How to install Golem

First you'll need [[http://racket-lang.org/][Racket]].
You'll have the option to install the minimal or full distribution of
Racket.
Choose the full installation.

Next, do a git checkout of this repository.
Then, from inside the checkout directory, run:

: raco pkg install

Okay, you're ready to go!

* Running Golem

We're going to need two separate Golem servers running to test
federating with each other.
To do this, open two separate terminals and navigate both of them
to the Golem checkout directory.
Now let's start up each server.

In the first terminal:

: racket golem.rkt --port 8000 --other-stores "http://localhost:8001/read-only-cas" Alice

In the second terminal:

: racket golem.rkt --port 8001 --other-stores "http://localhost:8000/read-only-cas" Bob

In the first terminal, you should see a message like:

: Your Web application is running at http://localhost:8000.
: Stop this program at any time to terminate the Web Server.

Same in the second, but with port 8001.

Test it out by opening [[http://localhost:8000/]] in your browser.
In the upper left hand side you should see "Alice's site".
Opening [[http://localhost:8001/]] should say the same, but with
"Bob's site".

What's with the =--other-stores= option?
As you may have noticed, the two sites are pointing at each other's
=read-only-cas= endpoints.
This is how they are able to find each other's content...
more on that later.

* Giving it a try

As said, this is a very very verrrrry pared down ActivityPub
implementation.
Each server that's being run is single-user, and we haven't even
bothered requiring that you authenticate to be able to post content!

Let's return to [[http://localhost:8000/]] or [[http://localhost:8001/]]
in our browser.
What you should see is a form from which we can submit content
and a summary of the posts we most recently sent (our "outbox")
as well as the most recent posts we've received (our "inbox").

Let's try making a post from the form on [[http://localhost:8000/]].
Currently, we should see "Hey look... nothing!" in both the outbox
and inbox sections of the page, because we've neither sent nor
received any content.

The *To:* field is who we want to send it to... well, this is
Alice's site, and Alice wants to talk to Bob, so let's put
http://localhost:8001/ in this field.[fn:wait-wheres-webfinger]
The box underneath it is the body of our post, so let's put
in a simple message, like "Hello, Bob!".
Now press the "Submit" button.

Okay!
If everything went well, underneath "Most recent post in your outbox"
you should see the post "Hello, Bob!" (or whatever message you sent).
But did Bob get it?
Navigate over to [[http://localhost:8001/]] and refresh the page in your
browser.
Yup, the post should be there in the inbox... looks like Bob got it!

[fn:wait-wheres-webfinger] Some users of the conventional fediverse
may be thinking, "Wait a minute!  I thought addressing in ActivityPub
used email-like addresses of the form =user@domain=... what's going
on?"
That style of addressing is called a [[https://en.wikipedia.org/wiki/WebFinger][WebFinger]]-based address, and
while it's possible to use in conjunction with ActivityPub, actual
ActivityPub addressing uses the [[https://en.wikipedia.org/wiki/Uniform_Resource_Identifier][URIs]] of [[https://www.w3.org/TR/activitypub/#actors][actors]].
In this case, [[http://localhost:8001/]] /is/ Bob's actor URI.
Our server does an HTTP request asking for the activitystreams
representation of Bob at that address and gets back a JSON object
that points at Bob's inbox.
We can then use that to federate a message to Bob.

* What's going on?  

** URLs aren't the only URIs

That's all great, but how is this different than any other ActivityPub
instance?
How do we even know that things were sent securely?

If we look at the *Id:* field, we'll see something like the following:

: magnet:?xt=urn:sha256:Cvy4hoVEsY7n3T2wf4306IhhBS1CV03pNuLtMOR73xc
:        &ek=_T8EGDBegDdmMdqRG4Lyd8zFto0cmck4FoaRzsXcM08
:        &es=aes-ctr

This is the "id" of the object, which is the address from which we can
retrieve the object.
In contemporary ActivityPub servers, this is generally a URI using
the http(s) scheme.
The message that is delivered to an actor's inbox usually has this
"id" attached to it, so we know where it lives (and can verify its
contents).

But there is no requirement in ActivityPub that the id of an object
be an http(s) URI, only that it be a URI.
In http(s), content is generally "live"; when you request the object,
some specific server is responsible for handing it to you and is
the authority on what belongs there (which could always change).
However if that server goes down (or perhaps if the domain pointing to
it expires or gets transferred) you might not be able to retrieve it
any more.
In other words, the http(s) scheme represents a kind of [[https://en.wikipedia.org/wiki/URL][URL]].

URLs have some advantages, but as it turns out, there are other kinds
of URI schemes out there.
One of these is called a [[https://en.wikipedia.org/wiki/Uniform_Resource_Name][URN]], which is fairly well described by
its Wikipedia page:

#+BEGIN_QUOTE
  URNs were conceived as persistent, location-independent identifiers
  assigned within defined namespaces, typically by an authority
  responsible for the namespace, so that they are globally unique and
  persistent over long periods of time, even after the resource which
  they identify ceases to exist or becomes unavailable.
#+END_QUOTE

I really struggled with these different names when I heard them, so
here's my short handy guide:

 - *URIs:* The broadest category of universal identifiers, of which
   URLs and URNs are both subcategories.  Different schemes (e.g.
   =https:=, =urn:=, =ftp:=, =file:=, =ipfs:=) signal how we might
   retrieve and interact with the resources at those addresses.
 - *URLs:* Signify some sort of "living" resource that we can imagine
   "living" somewhere, and that location is responsible for their
   content.  =https:=, =file:=, and =ftp:= are all examples of URLs.
   A subcategory of URI.
 - *URNs:* Not tied to a specific location.  Hashes of content
   like =urn:sha1:= are good examples of this, and (usually) so are
   most =ipfs:= URIs.
   A subcategory of URIs (in contrast to URLs).

Don't worry too much about memorizing these names... the general idea
of some URIs "living in a specific place" and others being
"persistent and able to live in many places" is the key here.

An example may help.
Alice could host a picture of a cat live at =https://catpics.example/pics/mycat.jpg=,
but that could always go down.
If the cat picture became very popular, Alice would be responsible
for paying for all that bandwidth herself.
But there's a category of URNs that are "content addressed"; in other
words, if the sha1 hash of the cat photo Alice wanted to share was
=dbe5b3e2aabde97aefdc5b605cacd0ce8210c203=, Alice could share the URN of
=urn:sha1:dbe5b3e2aabde97aefdc5b605cacd0ce8210c203= with Bob,
and Bob could ask his peers (which could include Alice) for a file that
matches that hash.
Once Bob finishes downloading a file from a peer, Bob can verify
that the hash of the content matches.
This is a totally valid type of URI, even though it's not what many
users of the web are used to.
And it turns out, we can use these as the identifiers for objects in
ActivityPub, and then they can live anywhere.
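
As a rough sketch of the content-addressing idea above (in Python
rather than the demo's Racket; =content_urn= and =matches_urn= are
names invented for this illustration and are not part of Golem), Bob
can recompute the hash of whatever a peer hands him and compare it
against the URN he asked for:

#+BEGIN_SRC python
import hashlib


def content_urn(data: bytes) -> str:
    """Name a blob by the hex SHA-1 digest of its bytes."""
    return "urn:sha1:" + hashlib.sha1(data).hexdigest()


def matches_urn(data: bytes, urn: str) -> bool:
    """Check that bytes received from an untrusted peer match the URN we asked for."""
    return content_urn(data) == urn


# Stand-in bytes; a real cat photo would be read from disk.
cat_photo = b"pretend these are the bytes of mycat.jpg"
urn = content_urn(cat_photo)

# A peer returning the right bytes passes; a tampered copy does not.
assert matches_urn(cat_photo, urn)
assert not matches_urn(cat_photo + b"!", urn)
#+END_SRC

Because the check depends only on the bytes, it does not matter which
peer supplied them.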

But wait... that's not enough.
We want the network in general to be able to help distribute objects
to anyone who asks for them, and yet we also want to keep posts
private between their intended recipients.
We can encrypt the file with a symmetric key we share /only/ with
the intended recipients, break it apart into regularly sized chunks
so nobody can guess which file it is based on its filesize, and then
those encrypted chunks can be safely shared by the whole network...
but only the recipients of the key can unlock its content.
The [[https://github.com/WebOfTrustInfo/rwot7/blob/master/topics-and-advance-readings/magenc.md][Magenc]] writeup explains how it does this by extending the
[[https://en.wikipedia.org/wiki/Magnet_uri][magnet URI]] scheme, composing together both the content URI (or a
manifest chunk that points to the rest of the chunks) with the key
into a new =magnet:= URI.
(This idea isn't new; it's been done for quite a while by projects
like Tahoe-LAFS and Freenet.)
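
To illustrate just the "regularly sized chunks" part, here is a
minimal Python sketch.
It is not how magenc itself frames data: the chunk size, the 8-byte
length prefix, and the names =pad=, =unpad= and =split= are all
assumptions made for this example.
The encryption step is left out; since AES-CTR produces ciphertext of
the same length as its input, padding before encrypting yields
equally sized encrypted chunks.

#+BEGIN_SRC python
CHUNK_SIZE = 32768  # hypothetical chunk size, picked only for this sketch
LENGTH_PREFIX = 8   # bytes used to record the real payload length


def pad(plaintext: bytes, chunk_size: int = CHUNK_SIZE) -> bytes:
    """Prefix the true length, then zero-fill up to a multiple of chunk_size."""
    framed = len(plaintext).to_bytes(LENGTH_PREFIX, "big") + plaintext
    return framed + b"\x00" * (-len(framed) % chunk_size)


def unpad(padded: bytes) -> bytes:
    """Recover the original bytes using the recorded length."""
    length = int.from_bytes(padded[:LENGTH_PREFIX], "big")
    return padded[LENGTH_PREFIX:LENGTH_PREFIX + length]


def split(data: bytes, chunk_size: int = CHUNK_SIZE) -> list:
    """Cut already-padded (and, in a real system, encrypted) data into chunks."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


short_note = pad(b"Hello, Bob!")
longer_note = pad(b"x" * 20000)

# Both notes occupy exactly one full chunk, so the chunk size alone
# does not tell an observer which note is which.
assert len(split(short_note)) == len(split(longer_note)) == 1
assert unpad(short_note) == b"Hello, Bob!"
#+END_SRC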

If we look again at the *Id:* header, now this starts to make a lot
more sense:

: magnet:?xt=urn:sha256:Cvy4hoVEsY7n3T2wf4306IhhBS1CV03pNuLtMOR73xc
:        &ek=_T8EGDBegDdmMdqRG4Lyd8zFto0cmck4FoaRzsXcM08
:        &es=aes-ctr

 - *xt* stands for "eXactTopic".  It's where our initial encrypted
   chunk is!  (Which might be the only chunk if it's very small.)
   Anyone in the peer to peer network can pass this around and help
   share it, but not everyone in the peer to peer network knows
   what it is (this is also helpful for those who want to generally
   distribute content on the network... it can reduce your liability
   for passing around content you don't know about, since you don't
   know what it is).
 - *ek* stands for the "EncryptionKey".  Since it's symmetrically
   encrypted, it's also the decryption key!  We can use this to
   decrypt the chunk above (as well as any other chunks it points
   to).
 - *es* is the encryption suite.  Different encryption suites are
   possible so we need to know which one.  In this case, it's
   aes-ctr.
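
Pulling those three fields apart takes only a few lines.
Here is a minimal sketch using Python's standard =urllib.parse=
module; =parse_magenc= is a hypothetical helper name, and the sketch
assumes each field appears exactly once, as in the links this demo
produces:

#+BEGIN_SRC python
from urllib.parse import parse_qs, urlsplit


def parse_magenc(uri: str) -> dict:
    """Split a magenc-style magnet URI into its xt, ek and es fields."""
    parts = urlsplit(uri)
    if parts.scheme != "magnet":
        raise ValueError("not a magnet URI: " + uri)
    query = parse_qs(parts.query, strict_parsing=True)
    # A missing field raises KeyError; repeated fields keep only the first value.
    return {name: query[name][0] for name in ("xt", "ek", "es")}


fields = parse_magenc(
    "magnet:?xt=urn:sha256:Cvy4hoVEsY7n3T2wf4306IhhBS1CV03pNuLtMOR73xc"
    "&ek=_T8EGDBegDdmMdqRG4Lyd8zFto0cmck4FoaRzsXcM08"
    "&es=aes-ctr"
)
assert fields["es"] == "aes-ctr"
#+END_SRC

Only =xt= is needed to fetch or re-share the chunk; =ek= and =es= are
needed only by someone who wants to read it.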

Again, see the [[https://github.com/WebOfTrustInfo/rwot7/blob/master/topics-and-advance-readings/magenc.md][Magenc writeup]] for more detailed information on how
all this works.

** Federating and retrieving content

This is all very good and well, but what does it look like during
federation?
In other words, what did Alice actually send to Bob's inbox?
If you've read the [[https://www.w3.org/TR/activitypub/#Overview][ActivityPub overview section]] you'll recall
that federation works by looking up the inbox of a recipient
and doing an HTTP POST to that location.
The body of that POST was the following:

#+BEGIN_SRC javascript
{"@id": "magnet:?xt=urn:sha256:Cvy4hoVEsY7n3T2wf4306IhhBS1CV03pNuLtMOR73xc&ek=_T8EGDBegDdmMdqRG4Lyd8zFto0cmck4FoaRzsXcM08&es=aes-ctr"}
#+END_SRC

"... That's it?" I hear you saying.
"Where's the rest of the message?"

Well, the =@id= is the real official location of the message.
We need to fetch the object to verify that it matches that address
and to put it in our store, so why do so twice?
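
For the sending side, a sketch of building such a delivery might look
like the following.
This is Python's standard =urllib.request=, not the demo's Racket
code; the inbox URL =http://localhost:8001/inbox=, the
=application/activity+json= content type, and the function name
=build_inbox_post= are assumptions for illustration (in practice the
inbox URL comes from the recipient's actor object).

#+BEGIN_SRC python
import json
import urllib.request


def build_inbox_post(inbox_url: str, magenc_uri: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST whose body is only the object's id."""
    body = json.dumps({"@id": magenc_uri}).encode("utf-8")
    return urllib.request.Request(
        inbox_url,
        data=body,
        method="POST",
        # Assumed content type; the demo may send something different.
        headers={"Content-Type": "application/activity+json"},
    )


request = build_inbox_post(
    "http://localhost:8001/inbox",  # hypothetical inbox location
    "magnet:?xt=urn:sha256:Cvy4hoVEsY7n3T2wf4306IhhBS1CV03pNuLtMOR73xc"
    "&ek=_T8EGDBegDdmMdqRG4Lyd8zFto0cmck4FoaRzsXcM08&es=aes-ctr",
)
# urllib.request.urlopen(request) would perform the delivery.
#+END_SRC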

So anyway, our server must retrieve the object matching the hash
of the =xt= query parameter.
Where does it get it from?
Well, do you remember setting up the =--other-stores= option
back in the [[*Running Golem][Running Golem]] section?
There are all sorts of ways to configure searching for chunks of
content, but in this demo we're doing the simplest possible thing and
just asking a fixed number of "content addressed stores" if they
have the chunk.

In fact, we just set up Alice and Bob's servers to look at each other!
Each server is running a read-only content-addressed-store at the
=/read-only-cas= path.
For instance, if I want to ask Alice's server if it has the
=urn:sha256:Cvy4hoVEsY7n3T2wf4306IhhBS1CV03pNuLtMOR73xc= chunk,
I can just query it like:

[[http://localhost:8000/read-only-cas?xt=urn:sha256:Cvy4hoVEsY7n3T2wf4306IhhBS1CV03pNuLtMOR73xc]]

Replace the =xt= parameter value with a URN from your own server and you
should be able to download the chunk.
(However, this file will look like binary garbage until you use the =ek=
key to decrypt it.)
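
The "ask a fixed list of stores" lookup can be sketched as below.
Several things here are assumptions rather than facts about Golem:
that the URN is the unpadded URL-safe base64 of the SHA-256 digest of
the encrypted chunk (inferred from the example URNs), and the names
=sha256_urn=, =cas_url= and =find_chunk=.
The =fetch= argument is any function that maps a URL to bytes (or
=None=), so the sketch runs without a live server.

#+BEGIN_SRC python
import base64
import hashlib
from urllib.parse import urlencode


def sha256_urn(data: bytes) -> str:
    """Name a chunk by the unpadded URL-safe base64 of its SHA-256 digest."""
    digest = hashlib.sha256(data).digest()
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return "urn:sha256:" + encoded


def cas_url(store: str, xt: str) -> str:
    """Build the lookup URL for one store, leaving the URN's colons unescaped."""
    return store + "?" + urlencode({"xt": xt}, safe=":")


def find_chunk(xt, stores, fetch):
    """Ask each store in turn; accept only bytes whose hash matches xt."""
    for store in stores:
        data = fetch(cas_url(store, xt))
        if data is not None and sha256_urn(data) == xt:
            return data
    return None


ALICE = "http://localhost:8000/read-only-cas"
BOB = "http://localhost:8001/read-only-cas"

chunk = b"opaque encrypted bytes"
xt = sha256_urn(chunk)
# A dict stands in for the network: only Alice's store has the chunk.
fake_network = {cas_url(ALICE, xt): chunk}

assert find_chunk(xt, [BOB, ALICE], fake_network.get) == chunk
#+END_SRC

In real use, =fetch= would wrap an HTTP GET; the hash check means a
store that returns the wrong bytes is simply skipped.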

So this is exactly what happens in our demo... Alice writes a note to
Bob, and Alice's server encrypts it, splits it into chunks, and sends
an object with the magenc link to Bob's server.
Bob's server (or, this could also be done in a client) can then search
for those chunks (currently, by searching Alice's store; Alice is
indeed keeping those chunks around since she made them and wants
Bob to find them) and can use the key provided in the magenc link
to decrypt the content.

As it turns out, the Racket magenc demo ships with a nice command line
tool you can use to fetch the contents yourself.
Let's try pulling down the content from Alice's server using the
magenc link that shows up in our client.
You can try it like so (be sure to remove the for-reading-convenience
whitespace from the magnet URI provided in the web interface):

: raco magenc \
:      --get "magnet:?xt=urn:sha256:tahXSX6UJgbT4lygwlyAYEDhM4pq2s0PwC0Ofl_edY0&ek=KHHtWzsx0isYYVJkNCkLXgdv0FIIY2DXKc0hxoX_-9w&es=aes-ctr" \
:      http://localhost:8000/read-only-cas

(The last argument is the web address of the content-addressed-store
we want to read from.)

On my machine, the value I get back is (with a bit of pretty printing
applied):

#+BEGIN_SRC javascript
  {"type": "Create",
   "actor": "http://localhost:8000/",
   "to": ["http://localhost:8001/"],
   "object": "magnet:?xt=urn:sha256:dO0lxH3zV7S9-sP0f0hzWr0QAopkjB2NSG7pYTmt5bY&ek=msDNJDcFKuFRmIeHolBksU0iQILCoAvAACTHSCr5Iaw&es=aes-ctr"}
#+END_SRC

Oh that's interesting... so this is the =Create= activity alright,
made by our actor.
But the object itself is... yet another magenc link.
So let's fetch that too:

: raco magenc \
:      --get "magnet:?xt=urn:sha256:dO0lxH3zV7S9-sP0f0hzWr0QAopkjB2NSG7pYTmt5bY&ek=msDNJDcFKuFRmIeHolBksU0iQILCoAvAACTHSCr5Iaw&es=aes-ctr" \
:      http://localhost:8000/read-only-cas

The value I get back (again, pretty printed for readability) is:

#+BEGIN_SRC javascript
  {"type": "Note",
   "attributedTo": "http://localhost:8000/",
   "content": "Hello, Bob!"}
#+END_SRC

Yep, this seems right!
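
The two manual fetches above amount to following links until we reach
a document with no further magenc link in it.
A sketch of that loop in Python follows; =resolve= is a hypothetical
name, =fetch_and_decrypt= stands in for whatever does the job of
=raco magenc --get=, and the magnet URIs are shortened placeholders
rather than real links.

#+BEGIN_SRC python
import json


def resolve(uri, fetch_and_decrypt, max_depth=8):
    """Fetch a document; if its "object" is a magnet link, fetch that too."""
    if max_depth == 0:
        raise ValueError("too many nested links")
    doc = json.loads(fetch_and_decrypt(uri))
    obj = doc.get("object")
    if isinstance(obj, str) and obj.startswith("magnet:"):
        doc["object"] = resolve(obj, fetch_and_decrypt, max_depth - 1)
    return doc


# Placeholder links and an in-memory stand-in for fetching plus decrypting.
CREATE_LINK = "magnet:?xt=urn:sha256:CREATE&ek=KEY1&es=aes-ctr"
NOTE_LINK = "magnet:?xt=urn:sha256:NOTE&ek=KEY2&es=aes-ctr"
decrypted = {
    CREATE_LINK: json.dumps({
        "type": "Create",
        "actor": "http://localhost:8000/",
        "to": ["http://localhost:8001/"],
        "object": NOTE_LINK,
    }).encode("utf-8"),
    NOTE_LINK: json.dumps({
        "type": "Note",
        "attributedTo": "http://localhost:8000/",
        "content": "Hello, Bob!",
    }).encode("utf-8"),
}

activity = resolve(CREATE_LINK, decrypted.__getitem__)
assert activity["object"]["content"] == "Hello, Bob!"
#+END_SRC

The depth limit is there only so that a malicious chain of links
cannot make the loop run forever.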

Since Bob also gets and stores the content, we can also test retrieving
the content from his server and that should work too.

In a production system, servers might indeed expose such an endpoint
for retrieving content that originates on their servers, but in
general it would be good to have a more global store of chunks
available, such as a distributed hash table.
In fact, the popular IPFS system could be used very easily today.
Anywhere that the =urn:sha256:= URIs appeared in this demo,
an =ipfs:= address could appear instead, if we wired things up
to understand IPFS.
Several other fields in the magnet URI scheme are already defined
and in production today to point at sources where content may be
found.

* Some words on liveness and immutability

The astute observer will note that liveness has not entirely
disappeared from this demo.
The publish-subscribe mechanism of ActivityPub requires being able
to POST to an actor's inbox.
So we will indeed need something like http(s) for that purpose,
but everything else (including the actor profiles themselves)
can be stored in a persistent manner as described in this document.
Privacy can also be maintained and the system could be made more "peer
to peer" for the liveness end by using something like Tor .onion
addresses or I2P .i2p addresses.
However this still requires that a server be online and available.

The astute observer will also observe that content cannot be changed
in the above system.
This is true, though there are some ways around this; Freenet, IPFS,
and Tahoe-LAFS all have ways to reference "updateable" content, and
composing with or adopting the ideas of such systems could be done.
(For the moment, this is left as an exercise to the reader.)

Additionally, it is possible to have future Update documents "point
back" to the original document; however, doing this securely would
require introducing a certificate style capability system which could
be as simple as having a catch-all grant to certain keys to be able to
sign off on updates, or something more complicated such as [[https://w3c-ccg.github.io/ocap-ld/][ocap-ld]].
(ocap-ld is only one of several capability approaches that will
be explored as possibilities in future documents.)

* Caveats

 - At the time of writing, the magenc extensions to the magnet scheme
   aren't used by anything in production yet.
   Maybe things could or should change a little before that happens.
   If you're interested in using Magenc seriously, [[https://dustycloud.org/contact/][contact the author]].
 - =urn:sha256= isn't actually specified yet either, but it probably
   should be.
   For whatever reason (okay, the reason is that there was a base64
   encoding utility built into Racket) the author used base64 encoding
   of the hash in the URN, but if =urn:sha256= went mainstream it
   might well use base32 encoding instead.
 - The [[https://gitlab.com/spritely/golem/blob/master/golem.rkt][code for this demo]] has intentionally made many assumptions
   and oversimplifications that a production system would not make.
   For one thing, only =Create= and =Note= are supported for activities
   and objects, and no attempt is made to check that a post is made
   by the author that claims to make it.
   Other extensions to ActivityPub which are currently conventional,
   such as the use of [[https://www.ietf.org/archive/id/draft-cavage-http-signatures-10.txt][HTTP Signatures]] or [[https://en.wikipedia.org/wiki/WebFinger][WebFinger]] have been
   intentionally left out, so don't expect compatibility with
   "modern day" ActivityPub servers.

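To make the encoding caveat concrete, here is a small Python sketch
that renders the same SHA-256 digest both ways.
The URL-safe alphabet and the dropped trailing padding in the base64
form are inferred from the example URNs in this document (which are
43 characters long and contain =-= and =_=); the unpadded uppercase
base32 form is purely a guess at what a future convention might look
like.
Both function names are invented for this example.

#+BEGIN_SRC python
import base64
import hashlib


def sha256_urn_base64url(data: bytes) -> str:
    """Digest rendered as URL-safe base64 with the padding removed."""
    digest = hashlib.sha256(data).digest()
    return "urn:sha256:" + base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def sha256_urn_base32(data: bytes) -> str:
    """Same digest rendered as base32 with the padding removed."""
    digest = hashlib.sha256(data).digest()
    return "urn:sha256:" + base64.b32encode(digest).rstrip(b"=").decode("ascii")


note = b"Hello, Bob!"
as_base64 = sha256_urn_base64url(note)
as_base32 = sha256_urn_base32(note)

# Same 32-byte digest, different lengths once encoded.
assert len(as_base64) - len("urn:sha256:") == 43
assert len(as_base32) - len("urn:sha256:") == 52
#+END_SRC

The base32 form is longer but case-insensitive, which is one reason a
standard might prefer it.
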
* Conclusions

This document and its [[file:golem.rkt][corresponding demo code]] show that it is possible
to share content in a way that is secure and where content is not tied
to any location.
If any server goes down, the other servers which care about the
content can nonetheless keep the content alive.
An entire network of peer to peer participants could be used to share
content, and a post that became popular would not be so burdensome for
a single participant; we do not need to punish content authors for
creating worthwhile content by making them pay exorbitant hosting
costs.
Meanwhile, even if a global network helps share content, we can
still restrict who can actually reveal the contents of those messages
to the intended recipients (and anyone the intended recipients choose
to share with as well).

I hope you have enjoyed this demo.
If you have, consider joining the [[https://webchat.freenode.net/?channels=spritely][#spritely channel on irc.freenode.net]]
and let me know what you think, or [[https://dustycloud.org/contact/][contact me directly]].
More demos to come.