This program searches random initial configurations in Conway's Game
of Life and periodically uploads results to a remote server. You can
read more information about the distributed search at the following URL:

- https://catagolue.hatsya.com/

An automatic live Twitter feed of new discoveries found by the search
was established by Ivan Fomichev:

- https://twitter.com/conwaylife

There is also an automatically-updated summary page, with animations
of interesting objects and various charts:

- https://catagolue.hatsya.com/statistics

The search was originally performed by people running instances of
a Python script; this repository contains the source code for a C++
program which is 20 times faster. The prefix 'apg-' stands for _Ash
Pattern Generator_ and the suffix '-luxe' refers to the capabilities
vis-a-vis previous versions (apgmera, apgnano, and apgsearch).

Compilation and execution
=========================

**Note:** apgluxe can only run on **x86-64** machines. If you have an
ancient computer, then the recommended alternative is the Python
script.

This program is designed to be compiled using `gcc` or `clang`. If you
have one of these compilers installed, then building apgluxe is as
simple as running:

    bash recompile.sh

in the repository directory. If compilation succeeded, the last two
lines should resemble the following:

    apgluxe v5.0-ll2.2.0: Rule b3s23 is correctly configured.
    apgluxe v5.0-ll2.2.0: Symmetry C1 is correctly configured.

which means you are ready to run the program like so:

    ./apgluxe [OPTIONS]

The options may include, for example:

- `-k mypassword`      Upload soups where 'mypassword' is your key
- `-n 5000000`         Run 5000000 soups per upload
- `-p 4`               Parallelise across 4 threads
- `--rule b36s245`     Run the custom rule B36/S245
- `--symmetry D2_+1`   Run soups with odd bilateral symmetry
- `-L 1`               Save a local log of each haul
- `-t 1`               Disable uploading to Catagolue
- `-i 10`              Upload exactly 10 hauls before exiting
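
The options can be combined freely. As a minimal sketch (not part of apgluxe), the helper below assembles a command line from a haul size, a thread count and an optional key, and prints it rather than running it. The function name `build_apgluxe_cmd` and its argument order are assumptions made for illustration; only flags listed above are used.

```shell
#!/bin/bash
# Hypothetical helper: print the apgluxe command line for the given settings.
# Arguments: soups per upload, number of threads, optional payosha256 key.
build_apgluxe_cmd() {
    local soups="$1"
    local threads="$2"
    local key="${3:-}"
    local cmd="./apgluxe -n $soups -p $threads"
    # Add -k only when a key is supplied; without it, uploads are anonymous.
    if [ -n "$key" ]; then
        cmd="$cmd -k $key"
    fi
    printf '%s\n' "$cmd"
}

# Dry run: show the command instead of executing it.
build_apgluxe_cmd 5000000 4 mypassword
```

Printing the command first makes it easy to check the flags before committing to a long run.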

Example usage
-------------

This invocation will upload results every 20 million soups:

    ./apgluxe -n 20000000

If you want to upload soups non-anonymously, use the -k flag and
provide a valid payosha256 key. The correct syntax is as follows:

    ./apgluxe -n 20000000 -k mykey

where 'mykey' is replaced with your payosha256 key (available from
https://catagolue.hatsya.com/payosha256 -- note the case-sensitivity).
Omitting this parameter will cause soups to be uploaded anonymously.

If you have a quad-core computer and would prefer not to run four
separate instances, then use the `-p` option to parallelise:

    ./apgluxe -n 20000000 -k mykey -p 4

This will use C++11 multithreading to parallelise across 4 threads, thus
producing and uploading soups approximately four times more quickly. Note
that this does not work on Cygwin.
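
Rather than hard-coding the thread count, one option is to derive it from the machine. The following is a minimal sketch, assuming `getconf _NPROCESSORS_ONLN` is available (as it is on typical Linux and macOS systems). `detect_threads` is a hypothetical helper name, not something provided by apgluxe.

```shell
#!/bin/bash
# Print the number of online CPU cores, falling back to 1 if unknown.
detect_threads() {
    local n
    n="$(getconf _NPROCESSORS_ONLN 2>/dev/null)" || n=""
    # Accept only a plain positive-looking integer; otherwise use 1.
    case "$n" in
        ''|*[!0-9]*) n=1 ;;
    esac
    printf '%s\n' "$n"
}

threads="$(detect_threads)"
# Show the resulting invocation ('mykey' is a placeholder, as above).
echo "./apgluxe -n 20000000 -k mykey -p $threads"
```

The reported count may include hyperthreads, so the best value for `-p` on a given machine may be lower.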

Installation
============

Linux / Mac OS X users
----------------------

Compiling and running apgluxe is easy, as explained above. To download
the source code, use the following command:

    git clone https://gitlab.com/apgoucher/apgmera.git

Then you can enter the directory and compile the search program using:

    cd apgmera
    ./recompile.sh

If the online repository is updated at all, you can update your local
copy in-place by running:

    git pull

in the repository directory.

Apple M1 (Silicon) users
------------------------

Apple M1 chips use a different instruction set, namely ARMv8, so
emulation is required to use apgsearch. In particular, you need to
ensure that Rosetta 2 is installed and use the `arch` command as
follows:

    arch -x86_64 ./recompile.sh

Even though this uses emulation (traditionally slow), the Apple M1 chip
is surprisingly fast at emulating an x86_64 processor and runs apgsearch
at a comparable speed to a pre-Haswell processor.
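
A build script that has to work on both Intel and Apple Silicon machines could pick the right command automatically. This is a sketch only: `recompile_command` is a hypothetical helper, and it assumes `uname -s` reports `Darwin` and `uname -m` reports `arm64` on an M1 Mac. A shell that is itself running under Rosetta 2 reports `x86_64` instead.

```shell
#!/bin/bash
# Print the build command suited to the given OS and machine type.
# Both arguments default to what uname reports on the current machine.
recompile_command() {
    local os="${1:-$(uname -s)}"
    local machine="${2:-$(uname -m)}"
    case "$os/$machine" in
        # Apple Silicon: build under Rosetta 2 as described above.
        Darwin/arm64) echo "arch -x86_64 ./recompile.sh" ;;
        # Native x86-64 on any OS: build directly.
        */x86_64)     echo "./recompile.sh" ;;
        # Anything else is outside what apgluxe supports.
        *)            echo "unsupported machine type: $machine" >&2
                      return 1 ;;
    esac
}

# Explicit arguments make the example deterministic.
recompile_command Darwin arm64
```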

Windows 10 users
----------------

Even though the Cygwin64 solution below will work perfectly on Windows 10,
[one user](https://gitlab.com/hedgepiggy) noted that it does not fully
utilise the processor. Instead, you are encouraged to use WSL bash as
described [here](https://gitlab.com/apgoucher/apgmera/issues/2).

An order-of-magnitude difference seems to be on the extreme side; other
users report [a 20 percent difference][1] between WSL, VirtualBox, and
Cygwin64 (in descending order of speed).

[1]: http://conwaylife.com/forums/viewtopic.php?f=7&t=3049&p=61174#p61174

Moreover, these comparisons were performed with the old threading model
(OpenMP threads), whereas apgluxe has subsequently migrated to pure C++11
threads for increased cross-platform support.

Windows users (pre-Windows 10, precompiled)
-------------------------------------------

There is a precompiled Windows binary, only for `b3s23/C1`, available from
[here](https://catagolue.hatsya.com/binaries/apgluxe-windows-x86_64.exe).
When executed, it will prompt you for the haul size, number of CPUs to use,
and your [payosha256 key](https://catagolue.hatsya.com/payosha256). For
finer control, it can be run from the Command Prompt with any combination
of the options mentioned in the Example Usage above.

Compiling from source has the advantage of allowing other rules and
symmetries to be explored. Moreover, it allows certain optimisations to
be applied to specifically target the machine you're using, conferring
a marginal speed boost.

Windows users (pre-Windows 10, Cygwin)
--------------------------------------

Install Cygwin64 (from https://cygwin.com), ensuring that the following
packages are selected for installation:

 - git
 - make
 - gcc-g++
 - python (2 or 3)

Open a Cygwin terminal, which will behave identically to a Linux terminal
but run inside Windows. This reduces your problem to the above case.

If you get the error `stoll is not a member of std`, then you are using an
old version of GCC. Run the Cygwin setup program to ensure that gcc-g++ is
updated.

Note that the `-p` option for parallelisation does not work in Cygwin.

Speed boosts
============

There are several compilation flags that can be used for accelerated
searching.

GPU searching
-------------

If you have an NVIDIA GPU with at least 1.5 GB of memory, then it can be
used as a 'preprocessor' which discards uninteresting soups and delegates
the interesting soups to the CPU search program. Compilation uses:

    ./recompile.sh --cuda

Note that the program will upload to a different census (**b3s23/G1** instead
of **b3s23/C1**) as the process of discarding uninteresting soups heavily
distorts the census results. Work is in progress to allow the GPU to census
soups itself, thereby allowing CUDA-accelerated searching of **b3s23/C1**.

It will default to using the whole GPU (device 0, unless you explicitly
change `CUDA_VISIBLE_DEVICES` as explained in the next section) as a soup
preprocessor and 8 CPU threads to census the interesting soups. If you
have a very powerful GPU and it is under-utilised, you may want to increase
the number of CPU threads using the `-p` flag:

    ./apgluxe -n 1000000000 -p 12 -k mykey

This is particularly pertinent for the NVIDIA Volta V100 and RTX 2080 Ti
GPUs, which can each manage 1 080 000 soups per second if sufficiently many
CPU threads are being used. The Ampere A100 should theoretically be able to
manage considerably more than that, provided the number of threads is high
enough.

You can see the CPU usage using `htop` and the GPU usage using:

    watch -n 0.1 nvidia-smi

If the GPU utilisation is significantly below 100% for a significant amount
of time, then increasing the number of threads is recommended.
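
To judge utilisation over a period rather than by eye, the readings can be averaged. The sketch below assumes one integer percentage per line, which is the format produced by `nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits`. `mean_utilisation` is a hypothetical helper, and the sample values are made up for illustration.

```shell
#!/bin/bash
# Read one integer utilisation percentage per line on stdin and print the
# mean, truncated to a whole number. Blank lines are ignored.
mean_utilisation() {
    awk 'NF { total += $1; count++ }
         END { if (count) printf "%d\n", total / count; else print "no samples" }'
}

# Made-up samples; with a real GPU, collect readings from nvidia-smi
# (for example once per second) and pipe them in instead.
printf '%s\n' 40 60 80 | mean_utilisation
```

What counts as "significantly below 100%" is a judgment call; the mean is only a convenient summary of the same readings.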

Multi-GPU searching
-------------------

Each process can only utilise a single GPU. The `CUDA_VISIBLE_DEVICES`
environment variable allows you to select the GPU to use (this is a standard
part of the CUDA runtime, and not a feature of apgsearch specifically). For
example, to run a process on GPU 3, use:

    CUDA_VISIBLE_DEVICES=3 ./apgluxe [OPTIONS]

This requires you to have first compiled with `./recompile.sh --cuda` as
described in the previous section.

To search on multiple GPUs, run one process on each GPU. You might want to
create a Bash script resembling the following (with one line per device) so
that you can conveniently run a process per GPU:

    #!/bin/bash
    KEY="insert_your_key_here"
    CPU_THREADS=8
    CUDA_VISIBLE_DEVICES=0 ./apgluxe -p "$CPU_THREADS" -n 1000000000 -k "$KEY" &
    CUDA_VISIBLE_DEVICES=1 ./apgluxe -p "$CPU_THREADS" -n 1100000000 -k "$KEY" &
    CUDA_VISIBLE_DEVICES=2 ./apgluxe -p "$CPU_THREADS" -n 1200000000 -k "$KEY" &
    CUDA_VISIBLE_DEVICES=3 ./apgluxe -p "$CPU_THREADS" -n 1300000000 -k "$KEY" &
    CUDA_VISIBLE_DEVICES=4 ./apgluxe -p "$CPU_THREADS" -n 1400000000 -k "$KEY" &
    CUDA_VISIBLE_DEVICES=5 ./apgluxe -p "$CPU_THREADS" -n 1500000000 -k "$KEY" &
    CUDA_VISIBLE_DEVICES=6 ./apgluxe -p "$CPU_THREADS" -n 1600000000 -k "$KEY" &
    CUDA_VISIBLE_DEVICES=7 ./apgluxe -p "$CPU_THREADS" -n 1700000000 -k "$KEY" &
    wait

**Warning:** The output to the terminal may look confusing and bizarre
because it contains the interleaved output from all of these processes.
This is to be expected and is no cause for concern.
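
If the number of GPUs varies between machines, the per-device lines can be generated with a loop instead of being written out by hand. This sketch only prints the commands so that they can be inspected first. `gpu_launch_commands` is a hypothetical helper, and the haul sizes simply follow the pattern of the script above (an extra 100 million soups per device).

```shell
#!/bin/bash
# Print one launch command per GPU index, from 0 to num_gpus - 1.
# Arguments: number of GPUs, CPU threads per process, payosha256 key.
gpu_launch_commands() {
    local num_gpus="$1"
    local threads="$2"
    local key="$3"
    local gpu=0
    local soups
    while [ "$gpu" -lt "$num_gpus" ]; do
        # Haul size grows by 100 million per device, as in the script above.
        soups=$((1000000000 + gpu * 100000000))
        echo "CUDA_VISIBLE_DEVICES=$gpu ./apgluxe -p $threads -n $soups -k $key &"
        gpu=$((gpu + 1))
    done
}

# Dry run for a two-GPU machine.
gpu_launch_commands 2 8 insert_your_key_here
```

The printed lines can be pasted into a script that ends with `wait`, as in the example above.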

Profile-guided optimisation (CPU only)
--------------------------------------

Profile-guided optimisation can be enabled with:

    ./recompile.sh --profile

This is only supported for the compilers GCC and Clang, rather than nvcc,
so it is specific to CPU searching. The benefits are relatively mild.

Credits and licences
====================

The software `apgluxe` and `lifelib` are both written by Adam P. Goucher and
available under an MIT licence. Thanks go to Dave Greene, Tom Rokicki, 'Apple
Bottom', Darren Li, Arthur O'Dwyer, and Tod Hagan for contributions,
suggestions, testing, and feedback.

All third-party components are similarly free and open-source:

 - 'CRYSTALS-Dilithium: A Lattice-Based Digital Signature Scheme' is licensed
   under Creative Commons License CC-BY 4.0;
 - The SHA3 (Keccak) hash function implementation, by Dr. Markku-Juhani O.
   Saarinen, is available under an MIT licence;
 - The SHA-256 hash function implementation, by Olivier Gay, is available
   under a BSD 3-clause licence;
 - The 'RSA Data Security, Inc. MD5 Message-Digest Algorithm' reference
   implementation can be copied, modified and used under the condition
   that the copyright notice is included;
 - The 'HappyHTTP' library, by Ben Campbell, can be copied, modified and
   used under the condition that the copyright notice is included.