Deep Learning with the Analytical Engine
This repository contains an implementation of a convolutional neural network as a program for Charles Babbage's Analytical Engine, capable of recognising handwritten digits to a high degree of accuracy (98.5% if provided with a sufficient amount of training data and left running sufficiently long).
The abstract for the talk where I presented this work is here:
A video of the talk is available here, and the slides are in a PDF in the repository.
This should be sufficient to give you a broad overview of the project, and more detailed resources can be found in the 'Recommended reading' section.
Installation
The implementation is contained within the directory `babbage`. Since there
is already a compiled cardfile in the subdirectory `cardfiles`, you can run
the convolutional network directly on the Emulator like so:

```
cd babbage
java -cp classes aes cardfiles/suite.cf
```
This should run painlessly (although it may take several seconds to load the cardfile). If you need help with the Emulator, see John Walker's manual page at:
Alternatively, you may prefer to generate the cardfile yourself by running the Bash script:
```
cd babbage
bash suite.sh
```
The subdirectories in `babbage` are listed below:

- `classes`: the Java Analytical Engine Emulator (by John Walker)
- `cardfiles`: cardfiles capable of running on the Emulator
- `scripts`: Python scripts used for construction and compilation
- `libs`: files used by the scripts for common functions
- `temp`: temporary files generated during compilation
Also, there are a couple of bash scripts:

- `aes`: a wrapper for the Emulator
- `suite.sh`: generates, compiles and executes the convolutional network
There is also a technical document, `memorymap.txt`, which explains the
purpose of each of the Engine's 1000 registers (as used by the deep-learning
implementation).
The root directory of the repository also contains some other things:

- `cpp`: a fast C++ implementation of the same neural network
- `data`: the raw handwritten digit data used by both implementations
- `talk/dlaetalk.pdf`: the slides of a talk I gave on the subject
- `README.md`: this file
Have fun.
Details about the deep-learning algorithm
(You may want to read Neural Networks and Deep Learning if you're unsure of the terminology in this section. A link is available in the 'Recommended reading' section of this document.)
We implement a deep convolutional neural network, trained by stochastic
gradient descent and backpropagation. The activation functions are
piecewise-rational approximations to `tanh`; the cost function is the correct
analogue of cross-entropy.
There are two convolutional layers, each of which is followed by a max-pooling
layer, and one fully-connected output layer. These are depicted in the diagram
of the architecture (`talk/convnetcomplete.png`).
The first convolutional layer takes a 25-by-25 input image (each pixel being a greyscale value represented as an integer between 0 and 99). This is convolved with 6-by-6 kernels to produce 5 feature maps of size 20-by-20, each of which undergoes max-pooling to produce 10-by-10 feature maps, which are normalised to [-1, 1] by applying the activation function.

The second convolutional layer takes these 5 feature maps and convolves them with 3-by-3 kernels to form 10 feature maps of size 8-by-8. Again, these undergo max-pooling to reduce the resolution to 4-by-4, and they are again normalised to the interval [-1, 1] by the same activation function.

The resulting 10 feature maps, each of size 4-by-4, together comprise 160 pixel values. These are passed into a fully-connected layer with 160 inputs and 10 outputs, each of which is normalised by the activation function to give a value in the interval [-1, 1]. The weights in the fully-connected layer are regularised by L2 regularisation (weight decay); specifically, each weight is multiplied by 0.999999 after each update.

The output of the network is a vector of 10 values, each between -1 and +1. This is interpreted as a digit (in the range {0, 1, ..., 9}) according to which of the 10 values is the greatest. The 'optimal' output corresponding to a particular digit k is the one where the kth value is +1 and the other 9 values are all -1; the cost function is evaluated against this idealised target output.
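As a quick sanity check, the layer geometry described above can be traced in a few lines of Python (a sketch; the function names are illustrative and do not come from the repository's scripts):

```python
# Trace the feature-map sizes through the architecture described above.

def conv_out(size, kernel):
    return size - kernel + 1      # 'valid' convolution shrinks the map

def pool_out(size, window=2):
    return size // window         # non-overlapping max-pooling

s = conv_out(25, 6)               # first convolutional layer  -> 20-by-20
s = pool_out(s)                   # first max-pooling layer    -> 10-by-10
s = conv_out(s, 3)                # second convolutional layer -> 8-by-8
s = pool_out(s)                   # second max-pooling layer   -> 4-by-4
assert 10 * s * s == 160          # 10 feature maps feed the final layer
```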
Implementation details
The Analytical Engine has storage for 1000 variables, each of which is a signed 50-digit decimal integer. Due to the small amount of memory (approx. 20 kilobytes) and minimalistic instruction set (elementary arithmetic on integers, with no support for floating-point arithmetic), there are several architecture-specific caveats which are absent on modern CPUs. (That said, these same issues may be present on a GPU / FPGA / ASIC with low memory, so this project isn't entirely an irrelevant historical digression!)
The Analytical Engine only supports integer arithmetic, so we perform fixed-point arithmetic with 20 decimal places. A fixed-point multiplication (resp. division) thus requires a logical right (resp. left) shift after (resp. before) the equivalent integer operation. This caveat is described here:
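Concretely, the shift discipline might be sketched like this in Python (the helper names are illustrative, and Python's floor division stands in for the Engine's integer division):

```python
# Sketch of 20-decimal-place fixed-point arithmetic on integers,
# mirroring the shifts described above.

SCALE = 10 ** 20  # 20 decimal places

def to_fixed(x):
    """Encode a number as a fixed-point integer."""
    return int(round(x * SCALE))

def fixed_mul(a, b):
    """Multiply, then shift right by 20 digits to restore the scale."""
    return (a * b) // SCALE

def fixed_div(a, b):
    """Shift the dividend left by 20 digits before integer division."""
    return (a * SCALE) // b
```

For example, `fixed_mul(to_fixed(1.5), to_fixed(2.0))` recovers the encoding of 3.0.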
To fit within the storage limitations of the Analytical Engine, arrays are 'packed' by storing five real numbers in each register. Since an Analytical Engine register can store a 50-digit (signed) integer, this allows 10 digits (each unsigned) for each real number:

```
aaaaaaaaaa bbbbbbbbbb cccccccccc dddddddddd eeeeeeeeee
```

We store a real number x as int((50 + x) * (10 ^ 8)), allowing us to store numbers with 8 decimal places of precision. The admissible range for packed real numbers is thus [-50.00000000, +49.99999999], which is more than adequate for our purposes. Additional shifts are thus required for converting between the packed (8 d.p.) and unpacked (20 d.p.) representations.
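The packing scheme can be sketched as follows (the helper names are mine, not the repository's):

```python
# Pack five offset reals (8 decimal places each) into one 50-digit integer,
# as described above: each number occupies a 10-digit unsigned field.

OFFSET = 50
PLACES = 10 ** 8      # 8 decimal places per packed number
FIELD = 10 ** 10      # each packed number occupies 10 digits

def pack(values):
    """Pack exactly five reals in [-50, 50) into a single integer."""
    assert len(values) == 5
    packed = 0
    for x in values:
        assert -50 <= x < 50
        packed = packed * FIELD + int(round((OFFSET + x) * PLACES))
    return packed

def unpack(packed):
    """Recover the five reals from a packed register value."""
    values = []
    for _ in range(5):
        packed, field = divmod(packed, FIELD)
        values.append(field / PLACES - OFFSET)
    return list(reversed(values))
```

Round-tripping `pack` and `unpack` returns the original five values (up to the 8-decimal-place precision).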
For our activation function, we use:

```
sigma(x) = (x + 2x^3) / (1 + abs(x + 2x^3))
```

which is a reasonably accurate piecewise-rational approximation to tanh(x). It must be piecewise-rational because addition, subtraction, multiplication and division are the only operations available on the Analytical Engine. Using a more accurate approximation to tanh(x) gives no perceptible improvement, and would be slower to compute.
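A direct transcription of this function in Python (a sketch; on the Engine itself, the absolute value corresponds to a sign test via conditional branching rather than a special operation):

```python
# Piecewise-rational approximation to tanh, using only +, -, *, /
# and a sign test (no transcendental operations).

def sigma(x):
    y = x + 2 * x ** 3
    a = y if y >= 0 else -y       # abs(y) via a sign comparison
    return y / (1 + a)
```

For example, sigma(1) = 3/4 = 0.75, against tanh(1) ≈ 0.7616; the function is odd and saturates towards ±1, like tanh.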
Code generation
The Analytical Engine only supports fixed relative jumps, each of which can be either forward or backward, conditional or unconditional. This makes any long machine-readable program hopelessly unmaintainable and incomprehensible to humans. Consequently, we place several levels of abstraction between the abstract neural network and the direct implementation.
The lowest intermediate level of abstraction is support for jumping to named
labels. The script `goto2cc.py` replaces jumps such as:

```
jmp mainloop
```

and

```
cjmp error_handler
```

to labels declared (betwixt a period and a colon) in the following manner:

```
.error_handler:
```

with an equivalent combinatorial card (http://fourmilab.ch/babbage/cards.html) such as CB?73. This removes the need to track line numbers manually.
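A toy version of this label pass can convey the idea (illustrative only; the '+'/'?' suffix and the offset convention follow my reading of Walker's card format and may differ from what `goto2cc.py` actually emits):

```python
# Two-pass translation of named labels into relative combinatorial cards.

def assemble(lines):
    # Pass 1: record the card index that each '.label:' declaration precedes.
    labels, cards = {}, []
    for line in lines:
        if line.startswith('.') and line.endswith(':'):
            labels[line[1:-1]] = len(cards)
        else:
            cards.append(line)
    # Pass 2: replace jmp/cjmp with cards advancing (CF) or backing up (CB)
    # by a fixed number of cards; '?' marks a conditional jump.
    out = []
    for i, card in enumerate(cards):
        op, _, target = card.partition(' ')
        if op in ('jmp', 'cjmp'):
            delta = labels[target] - (i + 1)
            cond = '?' if op == 'cjmp' else '+'
            card = ('CF' if delta >= 0 else 'CB') + cond + str(abs(delta))
        out.append(card)
    return out
```

Because the offsets are recomputed from the label table on every pass, inserting or deleting cards never invalidates a jump.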
Additionally, `jsr2goto.py` implements a call stack (in registers R014 through
R029) to facilitate nested subroutine calls and returns. Since combinatorial
cards have fixed destinations, the script generates huge binary trees of
branches to implement the `ret` (return) instruction.
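The idea of dispatching a return through a balanced tree of comparisons can be sketched as follows (purely illustrative; the actual script emits combinatorial cards, not nested dictionaries):

```python
# Build a binary search tree over the possible return sites, so that a
# 'ret' costs O(log n) comparisons instead of O(n) sequential tests.

def ret_tree(sites, var='retidx'):
    if len(sites) == 1:
        return 'jmp site_%d' % sites[0]       # leaf: a fixed-destination jump
    mid = len(sites) // 2
    return {
        'if %s < %d' % (var, sites[mid]): ret_tree(sites[:mid], var),
        'else': ret_tree(sites[mid:], var),
    }
```

For four call sites this yields a two-level tree, so no return ever takes more than two comparisons.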
Despite its name, `jsr2goto.py` also translates other instructions, such as
`memcpy`, and expands single arithmetic instructions such as:

```
R001 = R002 * R003
```

to the equivalent sequence of operation and variable cards:

```
*
L002
L003
S001
```
Moreover, either one of the input operands is allowed to be a literal:

```
R007 = R007 + 15
```

in which case the constant 15 is loaded into the register R001 by means of a number card. Consequently, any use of this instruction will destroy the value in register R001; in general, it is safest to avoid using that register.
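The expansion can be pictured with a toy translator (illustrative only; the card mnemonics follow Walker's emulator, with N loading a constant, L loading a variable and S storing one, but the exact card order used by `jsr2goto.py` is an assumption):

```python
# Expand a three-address arithmetic instruction into operation and
# variable cards, staging a literal operand in R001 via a number card.

def expand(instr):
    dest, expr = [s.strip() for s in instr.split('=')]
    for op in '+-*/':
        if op in expr:
            lhs, rhs = [s.strip() for s in expr.split(op)]
            break
    if rhs.isdigit():                     # literal: stage it in R001 first
        cards = ['N001 ' + rhs, op, 'L' + lhs[1:], 'L001']
    else:
        cards = [op, 'L' + lhs[1:], 'L' + rhs[1:]]
    cards.append('S' + dest[1:])
    return cards
```

This reproduces the two examples above: the register-register multiply expands to four cards, and the literal add clobbers R001 along the way.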
Certain subroutines are useful as callable entities in multiple programs; we
deem collections of subroutines 'libraries' and store them in the `libs`
subdirectory. Any libraries should be concatenated to the end of your code
before running `jsr2goto.py` and `goto2cc.py` (in that order). We include
several libraries, namely:

- `libs/matrix.xcf`: rudimentary linear algebra functions
- `libs/packing.xcf`: routines for packing and unpacking real numbers
- `libs/sigmoid.xcf`: implementation of the activation function
- `libs/random.xcf`: generates uniform and Gaussian random variables via an LCG
Additionally, the layers of the convolutional neural network are generated by
the script `lcon.py` (which was actually the most time-consuming part of the
project, requiring many hours to write). Data is prepared for the Analytical
Engine by means of another script, `dcon.py`.
The highest level of abstraction is provided by `suite.sh`, which invokes all
of the above and wraps it into a complete program (or 'cardfile') which, when
run on the Analytical Engine, will produce a human-readable progress report
detailing its operation (cf. `example.out` for a sample output).
Recommended reading
These three resources are definitely worth perusing:
- Padua, S. *The Thrilling Adventures of Lovelace and Babbage* (http://sydneypadua.com/)
- Nielsen, M. A. *Neural Networks and Deep Learning* (http://neuralnetworksanddeeplearning.com/)
- Walker, J. *The Analytical Engine* (http://fourmilab.ch/babbage/)
The first of these is a thoroughly enjoyable comic book exploring an alternate reality in which the Analytical Engine was actually constructed. Additionally, there are plenty of informative footnotes giving juicy details about Lovelace, Babbage, and the complex interplay between our Victorian protagonists. I would certainly recommend acquiring a copy if you're interested in the history of the Analytical Engine (which you invariably should be!).
Second is a free online book by Michael Nielsen, which is almost certainly the best handson introduction to the subject of neural networks and deep learning. It gives a detailed and accessible introduction to how neural networks are structured, the details of stochastic gradient descent and backpropagation, and a brief introduction to convolutional neural networks. The book comes with examples written in Python, which are rather more readable and easy to modify than the Analytical Engine code featured in this repository.
The third resource is the website of John Walker, who created `aes`, the Java
Analytical Engine emulator used in this project. It has a vast menagerie of
examples and instructions on how to prepare programs for the Analytical Engine.
There are also historical documents, including Ada Lovelace's fabled Notes on
the Analytical Engine.
Further resources (on Babbage and Lovelace)
Sydney Padua (who wrote the comic book mentioned in the 'Recommended reading' section) is also giving a talk in Cambridge on 2016-02-09:
There is currently a plan to finally build the Analytical Engine:
The homepage for the Cambridge Lovelace Hackathon is here (watch this space for updates about possible subsequent events):
We're keen to broaden our geographical scope, so please get in touch if you're from elsewhere! (We're seriously considering Palo Alto as a possible location, given that a few people there were interested in our 2015 Cambridge Lovelace Hackathon.) Contact details are available from the above website.
Further resources (on deep learning)
Another great introduction to convolutional neural networks (presented as a three-part series of articles with plenty of pretty diagrams!) was written by Chris Olah and is available on his website:
In particular, it describes some more adventurous neural network architectures such as recurrent neural networks (where data can travel in loops rather than propagating through the network in a strict linear progression from input to output). Amusingly, this is very similar to the disparity between the Difference Engine and Babbage's earliest proto-design of the Analytical Engine (which was initially conceived as a Difference Engine looped back on itself, cf. ouroboros).
Acknowledgements
Special thanks go to the authors of the above resources (Sydney Padua, John Walker and Michael Nielsen), whose work was immeasurably useful for this project and the accompanying talk. Also, I would like to thank my fellow organisers and participants of the Cambridge Lovelace Hackathon (Tim Hutton, Simon Rawles, Jade-Amanda Laporte, Jardin-Alice Laporte, Tom Webster, Julian Brown et al.), where this idea was initially conceived. Finally, my gratitude goes to Matthew Ireland and Jesse Wang for hosting my talk at the Churchill Computer Science Talks series.
Licence
This work is licensed under the Creative Commons Attribution 4.0 licence CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/). Informally, this means you can modify and use this work for any purpose, even commercially, as long as the author is credited.
Adam P. Goucher, 2016-02-01.