Deep Learning with the Analytical Engine
========================================

This repository contains an implementation of a convolutional neural network
as a program for Charles Babbage's Analytical Engine, capable of recognising
handwritten digits to a high degree of accuracy (98.5% if provided with a
sufficient amount of training data and left running sufficiently long).

The abstract for the talk where I presented this work is here:

 - http://talks.cam.ac.uk/talk/index/62886

A video of the talk is available here, and the slides are in a PDF in the
repository.

 - https://www.youtube.com/watch?v=i9GMuk0gffM

This should be sufficient to give you a broad overview of the project, and
more detailed resources can be found in the 'Recommended reading' section.


Installation
------------

The implementation is contained within the directory `babbage`. Since there
is already a compiled cardfile in the subdirectory `cardfiles`, you can run
the convolutional network directly on the Emulator like so:

    cd babbage
    java -cp classes aes cardfiles/suite.cf

This should run painlessly (although it may take several seconds to load the
cardfile). If you need help with the Emulator, see John Walker's manual page
at:

 - https://www.fourmilab.ch/babbage/cmdline.html

Alternatively, you may prefer to generate the cardfile yourself by running
the Bash script:

    cd babbage
    bash suite.sh

The subdirectories in `babbage` are listed below:

 - `classes` : the Java Analytical Engine Emulator (by John Walker)
 - `cardfiles` : cardfiles capable of running on the Emulator
 - `scripts` : Python scripts used for construction and compilation
 - `libs` : files used by the scripts for common functions
 - `temp` : temporary files generated during compilation

Also, there are a couple of bash scripts:

 - `aes` : a wrapper for the Emulator
 - `suite.sh` : generates, compiles and executes the convolutional network

There is also a technical document, `memory-map.txt`, which explains the
purpose of each of the Engine's 1000 registers (as used by the deep learning
implementation).

The root directory of the repository also contains some other things:

 - `cpp` : a fast C++ implementation of the same neural network
 - `data` : the raw handwritten digit data used by both implementations
 - `talk/dlae-talk.pdf` : the slides of a talk I gave on the subject
 - `README.md` : this file

Have fun.


Details about the deep-learning algorithm
-----------------------------------------

(You may want to read _Neural Networks and Deep Learning_ if you're unsure
of the terminology in this section. A link is available in the 'Recommended
reading' section of this document.)

We implement a deep convolutional neural network, trained by stochastic
gradient descent and backpropagation. The activation functions are piecewise-
rational approximations to `tanh`; the cost function is the correct analogue
of cross-entropy.

There are two convolutional layers, each of which is followed by a max-pooling
layer, and one fully-connected output layer. These are depicted in the
architecture diagram (`talk/convnet-complete.png`).

The first convolutional layer takes a 25-by-25 input image (each pixel being
a greyscale value represented as an integer between 0 and 99). This is
convolved with 6-by-6 kernels to produce 5 feature maps of size 20-by-20,
each of which undergoes max-pooling to produce a 10-by-10 feature map; the
results are normalised to [-1, 1] by applying the activation function.

The second convolutional layer takes these 5 feature-maps and convolves them
with 3-by-3 kernels to form 10 feature-maps of size 8-by-8. Again, these
undergo max-pooling to reduce the resolution to 4-by-4, and they are again
normalised to the interval [-1, 1] by the same activation function.

The resulting 10 feature maps, each of size 4-by-4, together comprise 160
pixel values. These are passed into a fully-connected layer with 160 inputs
and 10 outputs, each of which is normalised by the activation function to give
a value in the interval [-1, 1]. The weights in the fully-connected layer are
regularised by L2 regularisation (weight decay); specifically, each weight is
multiplied by 0.999999 after updating.

The output of the network is a vector of 10 values, each between -1 and +1.
This is interpreted as a digit (in the range {0, 1, ..., 9}) depending on
which of the 10 values is the greatest. The 'optimal' output corresponding
to a particular digit k is the output where the kth value is +1 and the other
9 values are all -1; the cost function is determined against this idealised
target output.
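
For readers who prefer code to prose, the NumPy sketch below traces the layer
shapes just described. The kernel and bias shapes, the channel summation in
the second convolutional layer, and the floating-point arithmetic are
assumptions made for illustration; this is not a transcription of the cardfile
implementation, which works in fixed-point decimal.

    import numpy as np

    def sigma(x):
        # piecewise-rational approximation to tanh (see 'Implementation details')
        y = x + 2 * x**3
        return y / (1 + np.abs(y))

    def conv_valid(img, kernel):
        # naive 'valid' 2-D convolution; 25x25 with a 6x6 kernel gives 20x20
        kh, kw = kernel.shape
        oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
        out = np.empty((oh, ow))
        for i in range(oh):
            for j in range(ow):
                out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel)
        return out

    def max_pool(fm, k=2):
        # non-overlapping k-by-k max-pooling, halving each dimension when k=2
        h, w = fm.shape
        return fm.reshape(h // k, k, w // k, k).max(axis=(1, 3))

    def forward(image, k1, b1, k2, b2, W, b):
        # image: (25, 25); k1: (5, 6, 6); b1: (5,)
        # k2: (10, 5, 3, 3); b2: (10,); W: (10, 160); b: (10,)
        fm1 = np.stack([sigma(max_pool(conv_valid(image, k1[f]) + b1[f]))
                        for f in range(5)])                    # (5, 10, 10)
        fm2 = np.stack([sigma(max_pool(sum(conv_valid(fm1[c], k2[f, c])
                                           for c in range(5)) + b2[f]))
                        for f in range(10)])                   # (10, 4, 4)
        out = sigma(W @ fm2.reshape(160) + b)                  # 10 values in (-1, 1)
        return int(np.argmax(out)), out

Calling `forward` on a 25-by-25 image with randomly initialised parameters of
those shapes returns a digit in {0, 1, ..., 9} together with the raw 10-vector
of outputs.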


Implementation details
----------------------

The Analytical Engine has storage for 1000 variables, each of which is a
signed 50-digit decimal integer. Due to the small amount of memory (approx.
20 kilobytes) and minimalistic instruction set (elementary arithmetic on
integers, with no support for floating-point arithmetic), there are several
architecture-specific caveats which are absent on modern CPUs. (That said,
these same issues may be present on a GPU / FPGA / ASIC with low memory, so
this project isn't _entirely_ an irrelevant historical digression!)

The Analytical Engine only supports integer arithmetic, so we perform fixed-
point arithmetic with 20 decimal places. A fixed-point multiplication (resp.
division) thus requires a logical right- (resp. left-) shift after (resp.
before) the equivalent integer operation. This caveat is described here:

 - http://fourmilab.ch/babbage/authentic.html#fixed
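
As a concrete illustration of those shifts, here is a small Python sketch of
20-decimal-place fixed-point arithmetic built from integer operations only.
It mirrors the rule described above rather than the actual cardfile code, and
it glosses over the rounding of negative quantities.

    SCALE = 10 ** 20                       # one fixed-point unit = 10^-20

    def to_fixed(x):
        return round(x * SCALE)

    def fx_mul(a, b):
        # the integer product has 40 decimal places: shift right by 20 afterwards
        return (a * b) // SCALE

    def fx_div(a, b):
        # shift the dividend left by 20 beforehand, then divide as integers
        return (a * SCALE) // b

    assert fx_mul(to_fixed(1.5), to_fixed(2.0)) == to_fixed(3.0)
    assert fx_div(to_fixed(3.0), to_fixed(2.0)) == to_fixed(1.5)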

To fit within the storage limitations of the Analytical Engine, arrays are
'packed' by storing five real numbers in each register. Since an Analytical
Engine register can store a 50-digit (signed) integer, this allows a 10-digit
unsigned field for each real number:

    aaaaaaaaaa bbbbbbbbbb cccccccccc dddddddddd eeeeeeeeee

We store a real number x as int((50 + x) * (10 ^ 8)), allowing us to store
numbers with 8 decimal places of precision. The admissible range for packed
real numbers is thus [-50.00000000, 49.99999999], which is more than adequate
for our purposes. Additional shifts are thus required for converting between
packed (8 d.p.) and unpacked (20 d.p.) representations.
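
In ordinary Python, the packing scheme looks roughly like the sketch below.
It illustrates the layout described above rather than the actual routines in
`libs/packing.xcf`, and it uses `round` rather than truncation only to
sidestep binary floating-point artifacts (the Engine itself works in exact
decimal).

    OFFSET, PREC, SLOT = 50, 10 ** 8, 10 ** 10

    def pack(values):
        # five reals in [-50, 50), most significant slot first
        reg = 0
        for x in values:
            reg = reg * SLOT + round((OFFSET + x) * PREC)
        return reg

    def unpack(reg):
        out = []
        for _ in range(5):
            reg, slot = divmod(reg, SLOT)
            out.append(slot / PREC - OFFSET)
        return out[::-1]

    print(unpack(pack([0.5, -1.25, 3.0, -7.75, 49.5])))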

For our activation function, we use:

    sigma(x) = (x + 2x^3) / (1 + abs(x + 2x^3))

which is a reasonably accurate piecewise-rational approximation to tanh(x).
It needs to be piecewise-rational because addition, subtraction, multiplication
and division are the only operations the Analytical Engine supports.
Using a more accurate approximation to tanh(x) does not give any perceptible
improvement, and would be slower to compute.
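
If you want to quantify how close the approximation is, a few lines of NumPy
suffice:

    import numpy as np

    sigma = lambda x: (x + 2 * x**3) / (1 + np.abs(x + 2 * x**3))
    xs = np.linspace(-3, 3, 601)
    print(np.abs(sigma(xs) - np.tanh(xs)).max())   # maximum deviation on [-3, 3]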


Code generation
---------------

The Analytical Engine only supports fixed relative jumps, each of which can
be either forward or backward, conditional or unconditional. This makes any
long machine-readable program impossibly unmaintainable and incomprehensible
for humans. Consequently, we place various levels of abstraction between the
abstract neural network and the direct implementation.

The lowest intermediate level of abstraction is support for jumping to named
labels. The script `goto2cc.py` replaces jumps such as:

    jmp mainloop

and

    cjmp error_handler

to labels declared (betwixt a period and a colon) in the following manner:

    .error_handler:

with an equivalent combinatorial card (http://fourmilab.ch/babbage/cards.html)
such as CB?73. This removes the need to track line numbers manually.
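
This is the classic two-pass assembler trick; the toy resolver below
illustrates the idea. It is not `goto2cc.py` itself, and the exact offset
convention of the emulator's combinatorial cards is simplified here.

    def resolve_labels(lines):
        labels, cards = {}, []
        for line in lines:                        # pass 1: record label positions
            if line.startswith('.') and line.endswith(':'):
                labels[line[1:-1]] = len(cards)
            else:
                cards.append(line)
        out = []
        for i, card in enumerate(cards):          # pass 2: rewrite symbolic jumps
            op, _, target = card.partition(' ')
            if op in ('jmp', 'cjmp') and target in labels:
                delta = labels[target] - (i + 1)  # cards to advance (or back up)
                mark = '?' if op == 'cjmp' else '+'
                card = ('CF%s%d' % (mark, delta) if delta >= 0
                        else 'CB%s%d' % (mark, -delta))
            out.append(card)
        return out

    print(resolve_labels(['.loop:', '* ...', 'cjmp loop', 'jmp loop']))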

Additionally, `jsr2goto.py` implements a call-stack (in registers R014 through
R029) to facilitate nested subroutine calls and returns. Since combinatorial
cards have fixed destinations, the script generates huge binary trees of
branches to implement the `ret` (return) instruction.
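
To make that concrete: with only fixed-destination branches, returning from a
subroutine amounts to comparing the saved return index against every possible
call site. The toy generator below prints such a comparison tree as
pseudo-code; the real script emits cards, and its exact layout may well differ.

    def emit_return_tree(sites, lo=0, hi=None, depth=0):
        # balanced tree of conditionals: O(log n) comparisons per return
        hi = len(sites) - 1 if hi is None else hi
        pad = '  ' * depth
        if lo == hi:
            print('%sjmp call_site_%d' % (pad, sites[lo]))
            return
        mid = (lo + hi) // 2
        print('%sif return_index <= %d:' % (pad, sites[mid]))
        emit_return_tree(sites, lo, mid, depth + 1)
        print('%selse:' % pad)
        emit_return_tree(sites, mid + 1, hi, depth + 1)

    emit_return_tree([3, 17, 42, 98])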

Despite its name, `jsr2goto.py` also translates other instructions, such as
memcpy, and expands **single** arithmetic instructions such as:

    R001 = R002 * R003

to the equivalent sequence of operation and variable cards:

    *
    L002
    L003
    S001

Moreover, either one of the input operands is allowed to be a literal:

    R007 = R007 + 15

in which case the constant 15 is loaded into register R001 by means of a
number card. Consequently, any use of this instruction will destroy the value
in register R001; in general, it is safest to avoid using that register.

Certain subroutines are useful as callable entities in multiple programs; we
deem collections of subroutines 'libraries' and store them in the `libs`
subdirectory. Any libraries should be concatenated to the end of your code
**before** running `jsr2goto.py` and `goto2cc.py` (in that order). We include
several libraries, namely:

 - `libs/matrix.xcf` : rudimentary linear algebra functions
 - `libs/packing.xcf` : routines for packing and unpacking real numbers
 - `libs/sigmoid.xcf` : implementation of the activation function
 - `libs/random.xcf` : generates uniform and Gaussian random variables via LCG
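
As an aside on the last of these libraries, the sketch below shows one way to
obtain uniform and (approximately) Gaussian variates from a linear congruential
generator using nothing beyond the four arithmetic operations. The constants
and the sum-of-uniforms trick are illustrative assumptions, not values read off
from `libs/random.xcf`.

    class LCG:
        def __init__(self, seed=12345, a=1103515245, c=12345, m=2 ** 31):
            self.state, self.a, self.c, self.m = seed, a, c, m

        def uniform(self):
            # uniform variate in [0, 1)
            self.state = (self.a * self.state + self.c) % self.m
            return self.state / self.m

        def gaussian(self):
            # approximately N(0, 1) by the central limit theorem
            return sum(self.uniform() for _ in range(12)) - 6.0

    rng = LCG()
    print(rng.uniform(), rng.gaussian())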

Additionally, the layers of the convolutional neural network are generated by
the script `lcon.py` (which was actually the most time-consuming part of the
project, requiring many hours to write). Data is prepared for the Analytical
Engine by means of another script, `dcon.py`.

The highest level of abstraction is provided by `suite.sh`, which invokes all
of the above and wraps it into a complete program (or 'cardfile') which, when
run on the Analytical Engine, will produce a human-readable progress report
detailing its operation (cf. `example.out` for a sample output).


Recommended reading
-------------------

These three resources are **definitely** worth perusing:

 - Padua, S. _The Thrilling Adventures of Lovelace and Babbage_ (http://sydneypadua.com/)
 - Nielsen, M. A. _Neural Networks and Deep Learning_ (http://neuralnetworksanddeeplearning.com/)
 - Walker, J. _The Analytical Engine_ (http://fourmilab.ch/babbage/)

The first of these is a thoroughly enjoyable comic book exploring an alternate
reality in which the Analytical Engine was actually constructed. Additionally,
there are plenty of informative footnotes giving juicy details about Lovelace,
Babbage, and the complex interplay between our Victorian protagonists. I would
certainly recommend acquiring a copy if you're interested in the history of
the Analytical Engine (which you invariably should be!).

Second is a free online book by Michael Nielsen, which is almost certainly the
best hands-on introduction to the subject of neural networks and deep learning.
It gives a detailed and accessible introduction to how neural networks are
structured, the details of stochastic gradient descent and backpropagation,
and a brief introduction to convolutional neural networks. The book comes with
examples written in Python, which are rather more readable and easy to modify
than the Analytical Engine code featured in this repository.

The third resource is the website of John Walker, who created `aes`, the Java
Analytical Engine emulator used in this project. It has a vast menagerie of
examples and instructions on how to prepare programs for the Analytical Engine.
There are also historical documents, including Ada Lovelace's fabled Notes on
the Analytical Engine.


Further resources (on Babbage and Lovelace)
-------------------------------------------

Sydney Padua (who wrote the comic book mentioned in the 'Recommended reading'
section) is also giving a talk in Cambridge on 2016-02-09:

 - http://talks.cam.ac.uk/talk/index/64132

There is currently a plan to finally build the Analytical Engine:

 - http://plan28.org/

The homepage for the Cambridge Lovelace Hackathon is here (watch this space
for updates about possible subsequent events):

 - http://www.cambridgelovelace.org/

We're keen to broaden our geographical scope, so please get in touch if you're
from elsewhere! (We're seriously considering Palo Alto as a possible location,
given that a few people there were interested in our 2015 Cambridge Lovelace
Hackathon.) Contact details are available from the above website.


Further resources (on deep learning)
------------------------------------

Another great introduction to convolutional neural networks (presented as a
three-part series of articles with plenty of pretty diagrams!) was written by
Chris Olah and is available on his website:

 - http://colah.github.io/

In particular, it describes some more adventurous neural network architectures
such as recurrent neural networks (where data can travel in loops rather than
propagating through the network in a strict linear progression from input to
output). Amusingly, this is very similar to the disparity between the
Difference Engine and Babbage's earliest proto-design of the Analytical Engine
(which was initially conceived as a Difference Engine looped back on itself,
cf. ouroboros).


Acknowledgements
----------------

Special thanks go to the authors of the above resources (Sydney Padua, John
Walker and Michael Nielsen), whose work was immeasurably useful for this
project and the accompanying talk. Also, I would like to thank my fellow organisers
and participants of the Cambridge Lovelace Hackathon (Tim Hutton, Simon Rawles,
Jade-Amanda Laporte, Jardin-Alice Laporte, Tom Webster, Julian Brown et al.)
where this idea was initially conceived. Finally, my gratitude goes to Matthew
Ireland and Jesse Wang for hosting my talk at the Churchill Computer Science
Talks series.


Licence
-------

This work is licensed under the Creative Commons Attribution 4.0 licence
**CC BY 4.0** (https://creativecommons.org/licenses/by/4.0/). Informally, this
means you can modify and use this work for any purpose, even commercially,
as long as the author is credited.

Adam P. Goucher, 2016-02-01.