DLAE

Deep Learning with the Analytical Engine

This repository contains an implementation of a convolutional neural network as a program for Charles Babbage's Analytical Engine, capable of recognising handwritten digits to a high degree of accuracy (98.5% if provided with a sufficient amount of training data and left running sufficiently long).

The abstract for the talk where I presented this work is here:

A video of the talk is available here, and the slides are in a PDF in the repository.

This should be sufficient to give you a broad overview of the project, and more detailed resources can be found in the 'Recommended reading' section.

Installation

The implementation is contained within the directory babbage. Since there is already a compiled cardfile in the subdirectory cardfiles, you can run the convolutional network directly on the Emulator like so:

cd babbage
java -cp classes aes cardfiles/suite.cf

This should run painlessly (although it may take several seconds to load the cardfile). If you need help with the Emulator, see John Walker's manual page at:

Alternatively, you may prefer to generate the cardfile yourself by running the Bash script:

cd babbage
bash suite.sh

The subdirectories in babbage are listed below:

  • classes : the Java Analytical Engine Emulator (by John Walker)
  • cardfiles : cardfiles capable of running on the Emulator
  • scripts : Python scripts used for construction and compilation
  • libs : files used by the scripts for common functions
  • temp : temporary files generated during compilation

Also, there are a couple of bash scripts:

  • aes : a wrapper for the Emulator
  • suite.sh : generates, compiles and executes the convolutional network

There is also a technical document, memory-map.txt, which explains the purpose of each of the Engine's 1000 registers (as used by the deep learning implementation).

The root directory of the repository also contains some other things:

  • cpp : a fast C++ implementation of the same neural network
  • data : the raw handwritten digit data used by both implementations
  • talk/dlae-talk.pdf : the slides of a talk I gave on the subject
  • README.md : this file

Have fun.

Details about the deep-learning algorithm

(You may want to read Neural Networks and Deep Learning if you're unsure of the terminology in this section. A link is available in the 'Recommended reading' section of this document.)

We implement a deep convolutional neural network, trained by stochastic gradient descent and backpropagation. The activation functions are piecewise-rational approximations to tanh; the cost function is the correct analogue of cross-entropy.

There are two convolutional layers, each of which is followed by a max-pooling layer, and one fully-connected output layer. These are depicted in the architecture diagram (talk/convnet-complete.png).

The first convolutional layer takes a 25-by-25 input image (each pixel being a greyscale value represented as an integer between 0 and 99). This is convolved with 6-by-6 kernels to produce 5 feature maps of size 20-by-20, each of which undergoes max-pooling to produce a 10-by-10 feature map, which is normalised to [-1, 1] by applying the activation function.
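
To make the shapes concrete, here is a minimal NumPy sketch of one convolution-plus-pooling stage (an illustration only, not the Engine code: the kernel values are random placeholders, and the activation applied after pooling is omitted here):

import numpy as np

def conv_valid(image, kernel):
    """'Valid' 2-D convolution: slide the kernel over the image with no padding."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y+kh, x:x+kw] * kernel)
    return out

def max_pool_2x2(fmap):
    """Non-overlapping 2-by-2 max-pooling."""
    h, w = fmap.shape
    return fmap.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

image = np.random.randint(0, 100, size=(25, 25))   # greyscale pixels in 0..99
kernels = np.random.randn(5, 6, 6)                 # 5 placeholder 6x6 kernels
maps = [max_pool_2x2(conv_valid(image, k)) for k in kernels]
print(maps[0].shape)                               # (10, 10): 25x25 -> 20x20 -> 10x10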

The second convolutional layer takes these 5 feature maps and convolves them with 3-by-3 kernels to form 10 feature maps of size 8-by-8. Again, these undergo max-pooling to reduce the resolution to 4-by-4, and they are again normalised to the interval [-1, 1] by the same activation function.

The resulting 10 feature maps, each of size 4-by-4, together comprise 160 pixel values. These are passed into a fully-connected layer with 160 inputs and 10 outputs, each of which is normalised by the activation function to give a value in the interval [-1, 1]. The weights in the fully-connected layer are regularised by L2 regularisation (weight decay); specifically, each weight is multiplied by 0.999999 after updating.
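
A rough NumPy sketch of the fully-connected stage and its weight-decay step (again illustrative only: the learning rate and the weight initialisation are placeholders, not values taken from the Engine program):

import numpy as np

def activation(x):
    # Placeholder: any squashing function onto [-1, 1]; the Engine uses the
    # piecewise-rational tanh approximation described later in this document.
    return np.tanh(x)

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(10, 160))   # 160 inputs -> 10 outputs
b = np.zeros(10)

x = rng.uniform(-1, 1, size=160)            # flattened 10 x 4 x 4 feature maps
output = activation(W @ x + b)              # 10 values in [-1, 1]

# After each gradient update, every weight is multiplied by 0.999999 (weight decay):
eta = 0.01                                  # hypothetical learning rate
grad_W = np.zeros_like(W)                   # gradient from backpropagation (not shown)
W = (W - eta * grad_W) * 0.999999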

The output of the network is a vector of 10 values, each between -1 and +1. This is interpreted as a digit (in the range {0, 1, ..., 9}) depending on which of the 10 values is the greatest. The 'optimal' output corresponding to a particular digit k is the output where the kth value is +1 and the other 9 values are all -1; the cost function is determined against this idealised target output.
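
For clarity, a small sketch of the target encoding and the decoding rule. The exact cost formula is not spelled out in this README, so the cost shown below is an assumed standard cross-entropy analogue for outputs and targets in [-1, 1], not necessarily the precise form used by the Engine program:

import numpy as np

def target_vector(digit):
    """Idealised output for digit k: +1 in position k, -1 elsewhere."""
    t = -np.ones(10)
    t[digit] = 1.0
    return t

def decode(output):
    """Interpret the 10 network outputs as a digit: index of the largest value."""
    return int(np.argmax(output))

def cost(output, target, eps=1e-12):
    """Assumed cross-entropy analogue for values in [-1, 1]."""
    a = (output + 1) / 2   # map [-1, 1] onto [0, 1]
    y = (target + 1) / 2
    return float(-np.sum(y * np.log(a + eps) + (1 - y) * np.log(1 - a + eps)))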

Implementation details

The Analytical Engine has storage for 1000 variables, each of which is a signed 50-digit decimal integer. Due to the small amount of memory (approx. 20 kilobytes) and minimalistic instruction set (elementary arithmetic on integers, with no support for floating-point arithmetic), there are several architecture-specific caveats which are absent on modern CPUs. (That said, these same issues may be present on a GPU / FPGA / ASIC with low memory, so this project isn't entirely an irrelevant historical digression!)

The Analytical Engine only supports integer arithmetic, so we perform fixed-point arithmetic with 20 decimal places. A fixed-point multiplication (resp. division) thus requires a logical right- (resp. left-) shift after (resp. before) the equivalent integer operation. This caveat is described here:
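
A small Python sketch of the idea (here a 'shift' means a shift by whole decimal digits, i.e. multiplication or division by a power of ten; SCALE = 10^20 corresponds to 20 decimal places):

SCALE = 10 ** 20   # fixed-point scale: 20 decimal places

def fx(x):
    """Encode a real number as a fixed-point integer."""
    return int(round(x * SCALE))

def fx_mul(a, b):
    # The integer product carries 40 decimal places, so shift right
    # (divide by 10^20) afterwards to return to 20 places.
    return (a * b) // SCALE

def fx_div(a, b):
    # Shift the dividend left (multiply by 10^20) beforehand, so that the
    # integer quotient still carries 20 decimal places.
    return (a * SCALE) // b

print(fx_mul(fx(1.5), fx(2.0)) == fx(3.0))   # True
print(fx_div(fx(3.0), fx(2.0)) == fx(1.5))   # True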

To fit within the storage limitations of the Analytical Engine, arrays are 'packed' by storing five real numbers in each register. Since an Analytical Engine register can store a 50-digit (signed) integer, this allows 10 digits (each unsigned) for each real number:

aaaaaaaaaa bbbbbbbbbb cccccccccc dddddddddd eeeeeeeeee

We store a real number x as int((50 + x) * (10 ^ 8)), allowing us to store numbers with 8 decimal places of precision. The admissible range for packed real numbers is thus [-50.00000000, 49.99999999], which is more than adequate for our purposes. Additional shifts are thus required for converting between packed (8 d.p.) and unpacked (20 d.p.) representations.
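
A Python sketch of this packing scheme (illustrative only; the real packing and unpacking routines live in libs/packing.xcf and operate on Engine registers rather than Python integers):

OFFSET = 50            # shifts the range [-50, 50) into [0, 100)
PREC = 10 ** 8         # 8 decimal places per packed number
FIELD = 10 ** 10       # each packed number occupies 10 decimal digits

def pack(values):
    """Pack five reals in [-50.00000000, 49.99999999] into one 50-digit integer."""
    assert len(values) == 5
    packed = 0
    for x in values:
        packed = packed * FIELD + int(round((OFFSET + x) * PREC))
    return packed

def unpack(packed):
    """Recover the five reals from a packed 50-digit integer."""
    values = []
    for _ in range(5):
        packed, field = divmod(packed, FIELD)
        values.append(field / PREC - OFFSET)
    return values[::-1]

print(unpack(pack([1.5, -0.25, 49.99999999, -50.0, 0.0])))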

For our activation function, we use:

sigma(x) = (x + 2x^3) / (1 + abs(x + 2x^3))

which is a reasonably accurate piecewise-rational approximation to tanh(x). It needs to be piecewise-rational because addition, subtraction, multiplication and division are the only arithmetic operations the Analytical Engine supports. Using a more accurate approximation to tanh(x) does not give any perceptible improvement, and would be slower to compute.
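
A quick plain-Python sketch of the approximation against the real tanh, just to show the shape of the function (the Engine, of course, evaluates it in fixed-point arithmetic):

import math

def sigma(x):
    """Piecewise-rational approximation to tanh: (x + 2x^3) / (1 + |x + 2x^3|)."""
    t = x + 2 * x ** 3
    return t / (1 + abs(t))

for x in [0.25, 0.5, 1.0, 2.0, 4.0]:
    print(f"x = {x:4}   sigma = {sigma(x):+.4f}   tanh = {math.tanh(x):+.4f}")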

Code generation

The Analytical Engine only supports fixed relative jumps, each of which can be either forward or backward, conditional or unconditional. This makes any long machine-readable program impossibly unmaintainable and incomprehensible for humans. Consequently, we place various levels of abstraction between the abstract neural network and the direct implementation.

The lowest intermediate level of abstraction is support for jumping to named labels. The script goto2cc.py replaces jumps such as:

jmp mainloop

and

cjmp error_handler

to labels declared (betwixt a period and a colon) in the following manner:

.error_handler:

with an equivalent combinatorial card (http://fourmilab.ch/babbage/cards.html) such as CB?73. This removes the need to track line numbers manually.
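
The translation goto2cc.py performs can be sketched as follows. This is a simplified reconstruction, not the actual script: the mnemonics CF/CB (forward/backward) with '+' (unconditional) or '?' (conditional) follow the Fourmilab card documentation, but the exact offset convention of the real script may differ:

def resolve_labels(lines):
    """Replace 'jmp label' / 'cjmp label' with relative combinatorial cards."""
    # First pass: record the card index of each '.name:' label (labels are dropped).
    labels, cards = {}, []
    for line in lines:
        line = line.strip()
        if line.startswith('.') and line.endswith(':'):
            labels[line[1:-1]] = len(cards)
        else:
            cards.append(line)

    # Second pass: convert each jump into a relative combinatorial card.
    out = []
    for i, card in enumerate(cards):
        parts = card.split()
        if parts and parts[0] in ('jmp', 'cjmp'):
            target = labels[parts[1]]
            cond = '?' if parts[0] == 'cjmp' else '+'
            if target > i:
                out.append(f'CF{cond}{target - i - 1}')   # skip forward
            else:
                out.append(f'CB{cond}{i - target + 1}')   # back up
        else:
            out.append(card)
    return out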

Additionally, jsr2goto.py implements a call-stack (in registers R014 through R029) to facilitate nested subroutine calls and returns. Since combinatorial cards have fixed destinations, the script generates huge binary trees of branches to implement the ret (return) instruction.
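
To illustrate the idea (this is not the actual generated card sequence): each call site is given a numeric identifier, and the return sequence binary-searches on the saved identifier, branching until it reaches a fixed jump back to the right call site. A toy generator in Python, written in the label/jump pseudo-code above; the register R013 and the label names are hypothetical, and cjmp is assumed to branch when the preceding subtraction drives the register negative:

def emit_return_tree(site_ids, reg='R013', depth=0):
    """Emit pseudo-code that binary-searches the saved call-site id in `reg`."""
    if len(site_ids) == 1:
        return [f'jmp after_call_{site_ids[0]}']          # fixed destination
    mid = len(site_ids) // 2
    pivot = site_ids[mid]
    node = f'ret_lo_{depth}_{pivot}'
    code = [f'{reg} = {reg} - {pivot}',                   # compare id with pivot
            f'cjmp {node}',                               # branch if id < pivot
            f'{reg} = {reg} + {pivot}']                   # restore id
    code += emit_return_tree(site_ids[mid:], reg, depth + 1)
    code += [f'.{node}:',
             f'{reg} = {reg} + {pivot}']                  # restore id
    code += emit_return_tree(site_ids[:mid], reg, depth + 1)
    return code

print('\n'.join(emit_return_tree([3, 8, 21, 34])))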

Despite its name, jsr2goto.py also translates other instructions, such as memcpy, and expands single arithmetic instructions such as:

R001 = R002 * R003

to the equivalent sequence of operation and variable cards:

*
L002
L003
S001

Moreover, either one of the input operands is allowed to be a literal:

R007 = R007 + 15

in which case the constant 15 is loaded into register R001 by means of a number card. Consequently, any use of this instruction will destroy the value in register R001; in general, it is safest to avoid using that register.
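
A sketch of this expansion (a simplified reconstruction, not the real jsr2goto.py; the number-card syntax 'Nnnn value' follows the Fourmilab card documentation):

import re

def expand_arithmetic(stmt):
    """Expand 'Rxxx = Ryyy OP Rzzz' (or a literal operand) into Engine cards."""
    m = re.match(r'R(\d{3}) = (R\d{3}|-?\d+) ([+\-*/]) (R\d{3}|-?\d+)$', stmt.strip())
    dest, a, op, b = m.groups()
    cards = []

    def operand(x):
        if x.startswith('R'):
            return 'L' + x[1:]            # load an existing register
        cards.append(f'N001 {x}')         # literal: put it in R001 via a number card
        return 'L001'                     # ...which is why R001 gets clobbered

    la, lb = operand(a), operand(b)
    cards += [op, la, lb, 'S' + dest]     # operation card, two loads, one store
    return cards

print(expand_arithmetic('R001 = R002 * R003'))   # ['*', 'L002', 'L003', 'S001']
print(expand_arithmetic('R007 = R007 + 15'))     # ['N001 15', '+', 'L007', 'L001', 'S007']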

Certain subroutines are useful as callable entities in multiple programs; we deem collections of subroutines 'libraries' and store them in the libs subdirectory. Any libraries should be concatenated to the end of your code before running jsr2goto.py and goto2cc.py (in that order). We include several libraries, namely:

  • libs/matrix.xcf : rudimentary linear algebra functions
  • libs/packing.xcf : routines for packing and unpacking real numbers
  • libs/sigmoid.xcf : implementation of the activation function
  • libs/random.xcf : generates uniform and Gaussian random variables via LCG
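
The ordering can be sketched in Python; the file names 'convnet.xcf' and 'temp/combined.xcf' are hypothetical, and the subsequent invocations of jsr2goto.py and goto2cc.py (whose command-line interfaces are not reproduced here) then operate on the combined file:

# Concatenate the program with the libraries it calls, in the order required
# before jsr2goto.py and then goto2cc.py are applied.
libraries = ['libs/matrix.xcf', 'libs/packing.xcf',
             'libs/sigmoid.xcf', 'libs/random.xcf']

with open('temp/combined.xcf', 'w') as out:
    for path in ['convnet.xcf'] + libraries:   # program first, libraries appended
        with open(path) as f:
            out.write(f.read())
            out.write('\n')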

Additionally, the layers of the convolutional neural network are generated by the script lcon.py (which was actually the most time-consuming part of the project, requiring many hours to write). Data is prepared for the Analytical Engine by means of another script, dcon.py.

The highest level of abstraction is provided by suite.sh, which invokes all of the above and wraps it into a complete program (or 'cardfile') which, when run on the Analytical Engine, will produce a human-readable progress report detailing its operation (cf. example.out for a sample output).

Recommended reading

These three resources are definitely worth perusing:

The first of these is a thoroughly enjoyable comic book exploring an alternate reality in which the Analytical Engine was actually constructed. Additionally, there are plenty of informative footnotes giving juicy details about Lovelace, Babbage, and the complex interplay between our Victorian protagonists. I would certainly recommend acquiring a copy if you're interested in the history of the Analytical Engine (which you invariably should be!).

Second is a free online book by Michael Nielsen, which is almost certainly the best hands-on introduction to the subject of neural networks and deep learning. It gives a detailed and accessible introduction to how neural networks are structured, the details of stochastic gradient descent and backpropagation, and a brief introduction to convolutional neural networks. The book comes with examples written in Python, which are rather more readable and easy to modify than the Analytical Engine code featured in this repository.

The third resource is the website of John Walker, who created aes, the Java Analytical Engine emulator used in this project. It has a vast menagerie of examples and instructions on how to prepare programs for the Analytical Engine. There are also historical documents, including Ada Lovelace's fabled Notes on the Analytical Engine.

Further resources (on Babbage and Lovelace)

Sydney Padua (who wrote the comic book mentioned in the 'Recommended reading' section) is also giving a talk in Cambridge on 2016-02-09:

There is currently a plan to finally build the Analytical Engine:

The homepage for the Cambridge Lovelace Hackathon is here (watch this space for updates about possible subsequent events):

We're keen to broaden our geographical scope, so please get in touch if you're from elsewhere! (We're seriously considering Palo Alto as a possible location, given that a few people there were interested in our 2015 Cambridge Lovelace Hackathon.) Contact details are available from the above website.

Further resources (on deep learning)

Another great introduction to convolutional neural networks (presented as a three-part series of articles with plenty of pretty diagrams!) was written by Chris Olah and is available on his website:

In particular, it describes some more adventurous neural network architectures such as recurrent neural networks (where data can travel in loops rather than propagating through the network in a strict linear progression from input to output). Amusingly, this is very similar to the disparity between the Difference Engine and Babbage's earliest proto-design of the Analytical Engine (which was initially conceived as a Difference Engine looped back on itself, cf. ouroboros).

Acknowledgements

Special thanks go to the authors of the above resources (Sydney Padua, John Walker and Michael Nielsen), whose work was immeasurably useful for this project and the accompanying talk. Also, I would like to thank my fellow organisers and participants of the Cambridge Lovelace Hackathon (Tim Hutton, Simon Rawles, Jade-Amanda Laporte, Jardin-Alice Laporte, Tom Webster, Julian Brown et al.) where this idea was initially conceived. Finally, my gratitude goes to Matthew Ireland and Jesse Wang for hosting my talk at the Churchill Computer Science Talks series.

Licence

This work is licenced under the Creative Commons Attribution 4.0 licence CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/). Informally, this means you can modify and use this work for any purpose, even commercially, as long as the author is credited.

Adam P. Goucher, 2016-02-01.