Deep Learning with the Analytical Engine
========================================
This repository contains an implementation of a convolutional neural network
as a program for Charles Babbage's Analytical Engine, capable of recognising
handwritten digits with high accuracy (98.5%, given sufficient training data
and training time).
The abstract for the talk where I presented this work is here:
- http://talks.cam.ac.uk/talk/index/62886
A video of the talk is available here, and the slides are in a PDF in the
repository.
- https://www.youtube.com/watch?v=i9GMuk0gffM
This should be sufficient to give you a broad overview of the project, and
more detailed resources can be found in the 'Recommended reading' section.
Installation
------------
The implementation is contained within the directory `babbage`. Since there
is already a compiled cardfile in the subdirectory `cardfiles`, you can run
the convolutional network directly on the Emulator like so:
    cd babbage
    java -cp classes aes cardfiles/suite.cf
This should run painlessly (although it may take several seconds to load the
cardfile). If you need help with the Emulator, see John Walker's manual page
at:
- https://www.fourmilab.ch/babbage/cmdline.html
Alternatively, you may prefer to generate the cardfile yourself by running
the Bash script:
    cd babbage
    bash suite.sh
The subdirectories in `babbage` are listed below:
- `classes` : the Java Analytical Engine Emulator (by John Walker)
- `cardfiles` : cardfiles capable of running on the Emulator
- `scripts` : Python scripts used for construction and compilation
- `libs` : files used by the scripts for common functions
- `temp` : temporary files generated during compilation
Also, there are a couple of bash scripts:
- `aes` : a wrapper for the Emulator
- `suite.sh` : generates, compiles and executes the convolutional network
There is also a technical document, `memory-map.txt`, which explains the
purpose of each of the Engine's 1000 registers (as used by the deep learning
implementation).
The root directory of the repository also contains some other things:
- `cpp` : a fast C++ implementation of the same neural network
- `data` : the raw handwritten digit data used by both implementations
- `talk/dlae-talk.pdf` : the slides of a talk I gave on the subject
- `README.md` : this file
Have fun.
Details about the deep-learning algorithm
-----------------------------------------
(You may want to read _Neural Networks and Deep Learning_ if you're unsure
of the terminology in this section. A link is available in the 'Recommended
reading' section of this document.)
We implement a deep convolutional neural network, trained by stochastic
gradient descent and backpropagation. The activation functions are piecewise-
rational approximations to `tanh`; the cost function is the correct analogue
of cross-entropy.
There are two convolutional layers, each of which is followed by a max-pooling
layer, and one fully-connected output layer. These are depicted in the
architecture diagram (`talk/convnet-complete.png`).
The first convolutional layer takes a 25-by-25 input image (each pixel being
a greyscale value represented as an integer between 0 and 99). This is
convolved with 6-by-6 kernels to produce 5 feature maps of size 20-by-20,
each of which undergoes max-pooling to produce 10-by-10 feature maps, which
are normalised to [-1, 1] by applying the activation function.
The second convolutional layer takes these 5 feature maps and convolves them
with 3-by-3 kernels to form 10 feature maps of size 8-by-8. Again, these
undergo max-pooling to reduce the resolution to 4-by-4, and they are again
normalised to the interval [-1, 1] by the same activation function.
The resulting 10 feature maps, each of size 4-by-4, together comprise 160
pixel values. These are passed into a fully-connected layer with 160 inputs
and 10 outputs, each of which is normalised by the activation function to give
a value in the interval [-1, 1]. The weights in the fully-connected layer are
regularised by L2 regularisation (weight decay); specifically, each weight is
multiplied by 0.999999 after updating.
The output of the network is a vector of 10 values, each between -1 and +1.
This is interpreted as a digit (in the range {0, 1, ..., 9}) depending on
which of the 10 values is the greatest. The 'optimal' output corresponding
to a particular digit k is the output where the kth value is +1 and the other
9 values are all -1; the cost function is determined against this idealised
target output.
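For readers who find code clearer than prose, here is an illustrative NumPy
sketch of the forward pass described above. The parameter names (`k1`, `k2`,
`W`, `b`) and the random initialisation are hypothetical; the actual
implementation runs entirely in Engine fixed-point arithmetic:

```python
import numpy as np

def sigma(x):
    u = x + 2 * x ** 3              # piecewise-rational approximation to tanh
    return u / (1 + np.abs(u))

def conv_valid(img, kernel):
    """'Valid' 2-D correlation of one feature map with one kernel."""
    kh, kw = kernel.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel)
    return out

def maxpool2(fmap):
    """Non-overlapping 2-by-2 max-pooling."""
    h, w = fmap.shape
    return fmap.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def forward(image, k1, k2, W, b):
    # Layer 1: 25x25 input -> five 20x20 maps -> pool -> sigma -> five 10x10 maps
    h1 = [sigma(maxpool2(conv_valid(image, k))) for k in k1]
    # Layer 2: each of ten maps sums convolutions over all five inputs:
    # 10x10 -> 8x8 -> pool -> sigma -> ten 4x4 maps
    h2 = [sigma(maxpool2(sum(conv_valid(m, k) for m, k in zip(h1, ks))))
          for ks in k2]
    # Fully-connected layer on the 160 pooled values, then argmax
    z = sigma(W @ np.concatenate([m.ravel() for m in h2]) + b)
    return int(np.argmax(z)), z

rng = np.random.default_rng(0)
k1 = rng.standard_normal((5, 6, 6)) * 0.1
k2 = rng.standard_normal((10, 5, 3, 3)) * 0.1
# (After each SGD update, W would additionally be multiplied by 0.999999.)
W, b = rng.standard_normal((10, 160)) * 0.1, np.zeros(10)
digit, scores = forward(rng.random((25, 25)), k1, k2, W, b)
```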
Implementation details
----------------------
The Analytical Engine has storage for 1000 variables, each of which is a
signed 50-digit decimal integer. Due to the small amount of memory (approx.
20 kilobytes) and minimalistic instruction set (elementary arithmetic on
integers, with no support for floating-point arithmetic), there are several
architecture-specific caveats which are absent on modern CPUs. (That said,
these same issues may be present on a GPU / FPGA / ASIC with low memory, so
this project isn't _entirely_ an irrelevant historical digression!)
The Analytical Engine only supports integer arithmetic, so we perform
fixed-point arithmetic with 20 decimal places. A fixed-point multiplication
thus requires a right-shift by 20 decimal places after the equivalent integer
operation, and a fixed-point division requires the corresponding left-shift
beforehand. This caveat is described here:
- http://fourmilab.ch/babbage/authentic.html#fixed
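The two shifts can be sketched in Python (illustrative only; the Engine
performs the same decimal shifts on its 50-digit registers via its stepping
mechanism):

```python
SCALE = 10 ** 20   # 20 decimal places

def to_fixed(x):
    """Convert a float to the fixed-point representation."""
    return round(x * SCALE)

def fixed_mul(a, b):
    # The raw product carries 40 decimal places, so shift right by
    # 20 places afterwards. (Floor division: a real implementation
    # must treat negative operands with care.)
    return (a * b) // SCALE

def fixed_div(a, b):
    # Shift the dividend left by 20 places first, so that the
    # integer quotient retains 20 decimal places.
    return (a * SCALE) // b

half = to_fixed(0.5)
quarter = fixed_mul(half, half)   # 0.5 * 0.5  = 0.25
two = fixed_div(half, quarter)    # 0.5 / 0.25 = 2.0
```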
To fit within the storage limitations of the Analytical Engine, arrays are
'packed' by storing five real numbers in each register. Since an Analytical
Engine register can store a 50-digit (signed) integer, this allows an
unsigned 10-digit field for each real number:
    aaaaaaaaaa bbbbbbbbbb cccccccccc dddddddddd eeeeeeeeee
We store a real number x as int((50 + x) * (10 ^ 8)), allowing us to store
numbers with 8 decimal places of precision. The admissible range for packed
real numbers is thus [-50.00000000, 49.99999999], which is more than adequate
for our purposes. Additional shifts are thus required for converting between
packed (8 d.p.) and unpacked (20 d.p.) representations.
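The packing scheme can be illustrated with a hypothetical Python
re-implementation (the Engine code does the equivalent with multiplications
and divisions by powers of ten):

```python
OFFSET = 50          # shift [-50, 50) to [0, 100) so every field is unsigned
PREC = 10 ** 8       # 8 decimal places per packed number
FIELD = 10 ** 10     # each number occupies a 10-digit field

def pack(xs):
    """Pack five reals from [-50, 50) into one 50-digit register value."""
    assert len(xs) == 5
    reg = 0
    for x in xs:
        reg = reg * FIELD + int((OFFSET + x) * PREC)
    return reg

def unpack(reg):
    """Recover the five reals (to 8 decimal places)."""
    xs = []
    for _ in range(5):
        reg, field = divmod(reg, FIELD)
        xs.append(field / PREC - OFFSET)
    return xs[::-1]
```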
For our activation function, we use:
    sigma(x) = (x + 2x^3) / (1 + abs(x + 2x^3))
which is a reasonably accurate piecewise-rational approximation to tanh(x).
It needs to be piecewise-rational because addition, subtraction,
multiplication and division are the only operations the Analytical Engine
supports.
Using a more accurate approximation to tanh(x) does not give any perceptible
improvement, and would be slower to compute.
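As a quick sanity check (in ordinary floating-point Python, not Engine
fixed-point), the approximation stays within about 0.04 of tanh(x) over the
whole real line:

```python
import math

def sigma(x):
    # Piecewise-rational approximation to tanh, using only +, -, *, /
    # and a sign test (for the absolute value).
    u = x + 2 * x ** 3
    return u / (1 + abs(u))

# Maximum deviation from tanh, sampled over [-5, 5]; outside this range
# both functions are already very close to +/-1.
err = max(abs(sigma(t / 100) - math.tanh(t / 100)) for t in range(-500, 501))
```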
Code generation
---------------
The Analytical Engine only supports fixed relative jumps, each of which can
be either forward or backward, conditional or unconditional. This makes any
long machine-readable program practically unmaintainable and incomprehensible
to humans. Consequently, we place various levels of abstraction between the
abstract neural network and the direct implementation.
The lowest intermediate level of abstraction is support for jumping to named
labels. The script `goto2cc.py` replaces jumps such as:
    jmp mainloop
and
    cjmp error_handler
to labels declared (betwixt a period and a colon) in the following manner:
    .error_handler:
with an equivalent combinatorial card (http://fourmilab.ch/babbage/cards.html)
such as CB?73. This removes the need to track line numbers manually.
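The idea behind the label-resolution pass can be sketched as a toy in Python
(the counting convention for how many cards a combinatorial card skips is an
assumption here, and the real `goto2cc.py` handles more of the card syntax):

```python
def resolve_labels(lines):
    labels, cards = {}, []
    # First pass: a '.name:' line marks the index of the next card.
    for line in lines:
        if line.startswith('.') and line.endswith(':'):
            labels[line[1:-1]] = len(cards)
        else:
            cards.append(line)
    # Second pass: rewrite jmp/cjmp as relative combinatorial cards:
    # CF (forward) / CB (backward), '?' conditional, '+' unconditional.
    out = []
    for i, card in enumerate(cards):
        parts = card.split()
        if parts[0] in ('jmp', 'cjmp'):
            cond = '?' if parts[0] == 'cjmp' else '+'
            delta = labels[parts[1]] - (i + 1)  # cards skipped, counted
            card = ('CF' if delta >= 0 else 'CB') + cond + str(abs(delta))
        out.append(card)
    return out

program = ['.top:', 'A', 'B', 'jmp top', 'cjmp end', 'C', '.end:']
cards = resolve_labels(program)
```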
Additionally, `jsr2goto.py` implements a call-stack (in registers R014 through
R029) to facilitate nested subroutine calls and returns. Since combinatorial
cards have fixed destinations, the script generates huge binary trees of
branches to implement the `ret` (return) instruction.
Despite its name, `jsr2goto.py` also translates other instructions, such as
memcpy, and expands **single** arithmetic instructions such as:
    R001 = R002 * R003
to the equivalent sequence of operation and variable cards:
    *
    L002
    L003
    S001
Moreover, either one of the input operands is allowed to be a literal:
    R007 = R007 + 15
in which case the constant 15 is loaded into register R001 by means of a
number card. Consequently, any use of this instruction will destroy the value
in register R001; in general, it is safest to avoid using that register.
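The expansion of single arithmetic instructions can be sketched like this
(an illustrative toy, not the real `jsr2goto.py`; `N`, `L` and `S` denote
number, load and store cards, and the operation card is the operator itself):

```python
import re

def expand(instr):
    m = re.fullmatch(r'R(\d+) = (\S+) ([+\-*/]) (\S+)', instr)
    dest, a, op, b = m.groups()
    cards, operands = [], []
    for operand in (a, b):
        if operand.startswith('R'):
            operands.append(operand[1:])
        else:
            # A literal operand: punch a number card that loads the
            # constant into R001, clobbering whatever was there.
            cards.append('N001 ' + operand)
            operands.append('001')
    cards.append(op)                       # operation card
    cards += ['L' + v for v in operands]   # load both operands
    cards.append('S' + dest)               # store the result
    return cards
```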
Certain subroutines are useful as callable entities in multiple programs; we
deem collections of subroutines 'libraries' and store them in the `libs`
subdirectory. Any libraries should be concatenated to the end of your code
**before** running `jsr2goto.py` and `goto2cc.py` (in that order). We include
several libraries, namely:
- `libs/matrix.xcf` : rudimentary linear algebra functions
- `libs/packing.xcf` : routines for packing and unpacking real numbers
- `libs/sigmoid.xcf` : implementation of the activation function
- `libs/random.xcf` : generates uniform and Gaussian random variables via LCG
Additionally, the layers of the convolutional neural network are generated by
the script `lcon.py` (which was actually the most time-consuming part of the
project, requiring many hours to write). Data is prepared for the Analytical
Engine by means of another script, `dcon.py`.
The highest level of abstraction is provided by `suite.sh`, which invokes all
of the above and wraps it into a complete program (or 'cardfile') which, when
run on the Analytical Engine, will produce a human-readable progress report
detailing its operation (cf. `example.out` for a sample output).
Recommended reading
-------------------
These three resources are **definitely** worth perusing:
- Padua, S. _The Thrilling Adventures of Lovelace and Babbage_ (http://sydneypadua.com/)
- Nielsen, M. A. _Neural Networks and Deep Learning_ (http://neuralnetworksanddeeplearning.com/)
- Walker, J. _The Analytical Engine_ (http://fourmilab.ch/babbage/)
The first of these is a thoroughly enjoyable comic book exploring an alternate
reality in which the Analytical Engine was actually constructed. Additionally,
there are plenty of informative footnotes giving juicy details about Lovelace,
Babbage, and the complex interplay between our Victorian protagonists. I would
certainly recommend acquiring a copy if you're interested in the history of
the Analytical Engine (which you invariably should be!).
Second is a free online book by Michael Nielsen, which is almost certainly the
best hands-on introduction to the subject of neural networks and deep learning.
It gives a detailed and accessible introduction to how neural networks are
structured, the details of stochastic gradient descent and backpropagation,
and a brief introduction to convolutional neural networks. The book comes with
examples written in Python, which are rather more readable and easy to modify
than the Analytical Engine code featured in this repository.
The third resource is the website of John Walker, who created `aes`, the Java
Analytical Engine emulator used in this project. It has a vast menagerie of
examples and instructions on how to prepare programs for the Analytical Engine.
There are also historical documents, including Ada Lovelace's fabled Notes on
the Analytical Engine.
Further resources (on Babbage and Lovelace)
-------------------------------------------
Sydney Padua (who wrote the comic book mentioned in the 'Recommended reading'
section) is also giving a talk in Cambridge on 2016-02-09:
- http://talks.cam.ac.uk/talk/index/64132
There is currently a plan to finally build the Analytical Engine:
- http://plan28.org/
The homepage for the Cambridge Lovelace Hackathon is here (watch this space
for updates about possible subsequent events):
- http://www.cambridgelovelace.org/
We're keen to broaden our geographical scope, so please get in touch if you're
from elsewhere! (We're seriously considering Palo Alto as a possible location,
given that a few people there were interested in our 2015 Cambridge Lovelace
Hackathon.) Contact details are available from the above website.
Further resources (on deep learning)
------------------------------------
Another great introduction to convolutional neural networks (presented as a
three-part series of articles with plenty of pretty diagrams!) was written by
Chris Olah and is available on his website:
- http://colah.github.io/
In particular, it describes some more adventurous neural network architectures
such as recurrent neural networks (where data can travel in loops rather than
propagating through the network in a strict linear progression from input to
output). Amusingly, this is very similar to the disparity between the
Difference Engine and Babbage's earliest proto-design of the Analytical Engine
(which was initially conceived as a Difference Engine looped back on itself,
cf. ouroboros).
Acknowledgements
----------------
Special thanks go to the authors of the above resources (Sydney Padua, John
Walker and Michael Nielsen), whose work was immeasurably useful for this project
and the accompanying talk. Also, I would like to thank my fellow organisers
and participants of the Cambridge Lovelace Hackathon (Tim Hutton, Simon Rawles,
Jade-Amanda Laporte, Jardin-Alice Laporte, Tom Webster, Julian Brown et al.)
where this idea was initially conceived. Finally, my gratitude goes to Matthew
Ireland and Jesse Wang for hosting my talk at the Churchill Computer Science
Talks series.
Licence
-------
This work is licenced under the Creative Commons Attribution 4.0 licence
**CC BY 4.0** (https://creativecommons.org/licenses/by/4.0/). Informally, this
means you can modify and use this work for any purpose, even commercially,
as long as the author is credited.
Adam P. Goucher, 2016-02-01.