This program searches random initial configurations in Conway's Game
of Life and periodically uploads results to a remote server. You can
read more information about the distributed search at the following URL:

- https://catagolue.hatsya.com/

An automatic live Twitter feed of new discoveries found by the search
was established by Ivan Fomichev:

- https://twitter.com/conwaylife

There is also an automatically-updated summary page, with animations
of interesting objects and various charts:

- https://catagolue.hatsya.com/statistics

The search was originally performed by people running instances of
a Python script; this repository contains the source code for a C++
program which is 20 times faster. The prefix 'apg-' stands for _Ash
Pattern Generator_ and the suffix '-luxe' refers to its capabilities
relative to previous versions (apgmera, apgnano, and apgsearch).

Compilation and execution
=========================

**Note:** apgluxe can only run on **x86-64** machines. If you have an
ancient computer (or any other non-x86-64 machine), then the recommended
alternative is the Python script.

This program is designed to be compiled using `gcc` or `clang`. If you
have one of these compilers installed, then building apgluxe is as
simple as running:

    bash recompile.sh

in the repository directory. If compilation succeeded, the last two
lines should resemble the following:

    apgluxe v5.0-ll2.2.0: Rule b3s23 is correctly configured.
    apgluxe v5.0-ll2.2.0: Symmetry C1 is correctly configured.

which means you are ready to run the program like so:

    ./apgluxe [OPTIONS]
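
Since the success check above amounts to inspecting the last two lines of output, it can be scripted, which may be handy when rebuilding after an update. This is only a sketch: `build_succeeded` is a hypothetical helper name, and it assumes the two 'correctly configured' lines keep the format shown above, whichever rule and symmetry they name.

```shell
#!/bin/sh
# Hypothetical helper: reads build output on stdin and succeeds only if the
# last two lines report both the rule and the symmetry as correctly configured.
build_succeeded() {
  last_two=$(tail -n 2)
  printf '%s\n' "$last_two" | grep -q 'Rule .* is correctly configured\.' &&
    printf '%s\n' "$last_two" | grep -q 'Symmetry .* is correctly configured\.'
}
```

For example, `bash recompile.sh 2>&1 | build_succeeded && echo "ready to run"` prints the message only when both lines are present.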

The options may include, for example:

- `-k mypassword`      Upload soups where 'mypassword' is your key
- `-n 5000000`         Run 5000000 soups per upload
- `-p 4`               Parallelise across 4 threads
- `--rule b36s245`     Run the custom rule B36/S245
- `--symmetry D2_+1`   Run soups with odd bilateral symmetry
- `-L 1`               Save a local log of each haul
- `-t 1`               Disable uploading to Catagolue
- `-i 10`              Upload exactly 10 hauls before exiting
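
The options above can be combined in a single invocation. As an illustrative sketch (the variable names and the `show_command` helper are hypothetical), an offline trial of the custom rule B36/S245 with odd bilateral symmetry, keeping a local log of each haul and uploading nothing, could be assembled as follows:

```shell
#!/bin/sh
# Illustrative values only; every flag used here appears in the list above.
RULE=b36s245        # custom rule B36/S245
SYMMETRY=D2_+1      # odd bilateral symmetry
HAUL=5000000        # soups per haul (-n)
THREADS=4           # threads (-p)

# Print the full command line; -L 1 keeps a local log of each haul and
# -t 1 disables uploading to Catagolue.
show_command() {
  echo ./apgluxe --rule "$RULE" --symmetry "$SYMMETRY" \
    -n "$HAUL" -p "$THREADS" -L 1 -t 1
}
```

Once the printed command looks right, run `./apgluxe` with the same arguments. Keeping the values in variables makes it easy to switch rule or symmetry without retyping the whole line.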

Example usage
-------------

This invocation will upload results every 20 million soups:

    ./apgluxe -n 20000000

If you want to upload soups non-anonymously, use the `-k` flag and
provide a valid payosha256 key. The correct syntax is as follows:

    ./apgluxe -n 20000000 -k mykey

where 'mykey' is replaced with your payosha256 key (available from
https://catagolue.hatsya.com/payosha256; note the case-sensitivity).
Omitting this parameter will cause soups to be uploaded anonymously.

If you have a quad-core computer and would prefer not to run four
separate instances, then use the `-p` option to parallelise:

    ./apgluxe -n 20000000 -k mykey -p 4

This will use C++11 multithreading to parallelise across 4 threads, thus
producing and uploading soups approximately four times as quickly. Note
that this does not work on Cygwin.
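
Rather than hard-coding `-p 4`, you could derive the thread count from the machine. This is a sketch assuming that either `nproc` (GNU coreutils, typical on Linux) or `sysctl -n hw.logicalcpu` (Mac OS X) is available; `cpu_count` is a hypothetical helper name. Both report logical CPUs, which on hyperthreaded machines is usually twice the number of physical cores, so you may prefer a smaller value.

```shell
#!/bin/sh
# Hypothetical helper: print the number of logical CPUs, trying the Linux
# tool first, then the Mac OS X sysctl, and falling back to 1.
cpu_count() {
  nproc 2>/dev/null || sysctl -n hw.logicalcpu 2>/dev/null || echo 1
}
```

Usage: `./apgluxe -n 20000000 -k mykey -p "$(cpu_count)"`.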

Installation
============

Linux / Mac OS X users
----------------------

Compiling and running apgluxe is easy, as explained above. To download
the source code, use the following command:

    git clone https://gitlab.com/apgoucher/apgmera.git

Then you can enter the directory and compile the search program using:

    cd apgmera
    ./recompile.sh

If the online repository is updated, you can update your local copy
in place by running:

    git pull

in the repository directory, and then rebuild with `./recompile.sh`.

Windows users (precompiled)
---------------------------

There is a precompiled Windows binary, only for `b3s23/C1`, available
[here](https://catagolue.hatsya.com/binaries/apgluxe-windows-x86_64.exe).
When executed, it will prompt you for the haul size, the number of CPUs to
use, and your [payosha256 key](https://catagolue.hatsya.com/payosha256).
For finer control, it can be run from the Command Prompt with any
combination of the options mentioned under Example usage above.

Compiling from source has the advantage of allowing other rules and
symmetries to be explored. Moreover, it allows certain optimisations to
be applied to specifically target the machine you're using, conferring
a marginal speed boost.

Windows users (pre-Windows 10, Cygwin)
--------------------------------------

Install Cygwin64 (from https://cygwin.com), ensuring that the following
packages are selected for installation:

 - git
 - make
 - gcc-g++
 - python (2 or 3)

Open a Cygwin terminal, which will behave identically to a Linux terminal
but run inside Windows. This reduces your problem to the above case.

If you get the error `stoll is not a member of std`, then you are using an
old version of GCC. Run the Cygwin setup program to ensure that gcc-g++ is
updated.

Note that the `-p` option for parallelisation does not work in Cygwin.

Windows 10 users
----------------

Although the Cygwin64 solution above works correctly on Windows 10,
[one user](https://gitlab.com/hedgepiggy) noted that it does not fully
utilise the processor. Instead, you are encouraged to use Bash under WSL
(the Windows Subsystem for Linux) as described
[here](https://gitlab.com/apgoucher/apgmera/issues/2).

The order-of-magnitude difference reported by that user seems to be on
the extreme side; other users report [a 20 percent difference][1] between
WSL, VirtualBox, and Cygwin64 (in descending order of speed).

[1]: http://conwaylife.com/forums/viewtopic.php?f=7&t=3049&p=61174#p61174

Moreover, these comparisons were performed with the old threading model
(OpenMP threads), whereas apgluxe has subsequently migrated to pure C++11
threads for increased cross-platform support.

Speed boosts
============

There are several compilation flags that can be used to accelerate
searching.

GPU searching
-------------

If you have an NVIDIA GPU with at least 1.5 GB of memory, then it can be
used as a 'preprocessor' which discards uninteresting soups and passes
the interesting soups to the CPU search program. To compile with GPU
support, run:

    ./recompile.sh --cuda

Note that the program will upload to a different census (**b3s23/G1** instead
of **b3s23/C1**) as the process of discarding uninteresting soups heavily
distorts the census results. Work is in progress to allow the GPU to census
soups itself, thereby allowing CUDA-accelerated searching of **b3s23/C1**.
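
To check the 1.5 GB memory requirement before building with `--cuda`, you can ask `nvidia-smi` for each GPU's total memory. The helper below is a hypothetical sketch, not part of apgluxe, and treating 1.5 GB as 1536 MiB is an assumption. Note that `nvidia-smi` numbers GPUs in PCI bus order, whereas CUDA (and hence `CUDA_VISIBLE_DEVICES`) may order them differently unless `CUDA_DEVICE_ORDER=PCI_BUS_ID` is set.

```shell
#!/bin/sh
# Assumed threshold: 1.5 GB interpreted as 1536 MiB.
MIN_MIB=1536

# Hypothetical helper: reads one total-memory value in MiB per line (one line
# per GPU, in nvidia-smi order) and reports whether each meets the threshold.
check_gpu_memory() {
  index=0
  while read -r mib; do
    if [ "$mib" -ge "$MIN_MIB" ]; then
      echo "GPU $index: $mib MiB (ok)"
    else
      echo "GPU $index: $mib MiB (too small)"
    fi
    index=$((index + 1))
  done
}
```

Usage: `nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | check_gpu_memory`.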

When run, a CUDA-enabled build will by default use the whole GPU (device 0,
unless you explicitly change `CUDA_VISIBLE_DEVICES` as explained in the next
section) as a soup preprocessor and 8 CPU threads to census the interesting
soups. If you have a very powerful GPU and it is under-utilised, you may want
to increase the number of CPU threads using the `-p` flag:

    ./apgluxe -n 1000000000 -p 12 -k mykey

This is particularly pertinent for the NVIDIA Volta V100 and RTX 2080 Ti
GPUs, which can each manage 1 080 000 soups per second if sufficiently many
CPU threads are being used. The Ampere A100 should theoretically be able to
manage considerably more than that, provided the number of threads is high
enough.
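
As a rough illustration of that rate (simple arithmetic, assuming the rate is sustained throughout), a single haul of the size used in the command above, `-n 1000000000`, would take about a quarter of an hour per GPU:

```math
t = \frac{10^{9}\ \text{soups}}{1.08 \times 10^{6}\ \text{soups/s}} \approx 926\ \text{s} \approx 15.4\ \text{min}
```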

You can monitor CPU usage with `htop` and GPU usage with:

    watch -n 0.1 nvidia-smi

If the GPU utilisation stays well below 100% for an extended period, then
increasing the number of threads is recommended.
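
To put a number on GPU utilisation over a period, rather than watching the live display, one option is to sample it with `nvidia-smi`'s query mode and average the samples. This is a sketch: `average_utilisation` is a hypothetical helper, and the sampling interval and duration below are arbitrary choices.

```shell
#!/bin/sh
# Hypothetical helper: average utilisation samples (integer percentages,
# one per line) read from stdin; prints nothing if there is no input.
average_utilisation() {
  awk '{ total += $1; n++ } END { if (n > 0) printf "%.1f\n", total / n }'
}
```

For example, `nvidia-smi -i 0 --query-gpu=utilization.gpu --format=csv,noheader,nounits -l 1 | head -n 60 | average_utilisation` samples the first GPU (as `nvidia-smi` numbers them) once per second for a minute.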

Multi-GPU searching
-------------------

Each process can only utilise a single GPU. The `CUDA_VISIBLE_DEVICES`
environment variable allows you to select the GPU to use (this is a standard
part of the CUDA runtime, and not a feature of apgsearch specifically). For
example, to run a process on GPU 3, use:

    CUDA_VISIBLE_DEVICES=3 ./apgluxe [OPTIONS]

This requires you to have first compiled with `./recompile.sh --cuda` as
described in the previous section.

To search on multiple GPUs, run one process on each GPU. You might want to
create a Bash script resembling the following (with one line per device) so
that you can conveniently run a process per GPU:

    #!/bin/bash
    KEY="insert_your_key_here"
    CPU_THREADS=8
    CUDA_VISIBLE_DEVICES=0 ./apgluxe -p "$CPU_THREADS" -n 1000000000 -k "$KEY" &
    CUDA_VISIBLE_DEVICES=1 ./apgluxe -p "$CPU_THREADS" -n 1100000000 -k "$KEY" &
    CUDA_VISIBLE_DEVICES=2 ./apgluxe -p "$CPU_THREADS" -n 1200000000 -k "$KEY" &
    CUDA_VISIBLE_DEVICES=3 ./apgluxe -p "$CPU_THREADS" -n 1300000000 -k "$KEY" &
    CUDA_VISIBLE_DEVICES=4 ./apgluxe -p "$CPU_THREADS" -n 1400000000 -k "$KEY" &
    CUDA_VISIBLE_DEVICES=5 ./apgluxe -p "$CPU_THREADS" -n 1500000000 -k "$KEY" &
    CUDA_VISIBLE_DEVICES=6 ./apgluxe -p "$CPU_THREADS" -n 1600000000 -k "$KEY" &
    CUDA_VISIBLE_DEVICES=7 ./apgluxe -p "$CPU_THREADS" -n 1700000000 -k "$KEY" &
    wait

**Warning:** The output to the terminal may look confusing and bizarre
because it contains the interleaved output from all of these processes.
This is to be expected and is no cause for concern.
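
With many GPUs, the same launch can be written as a loop. The details of this sketch are hypothetical (`launch_all`, `NUM_GPUS`, `DRY_RUN` and the log-file names are not part of apgluxe). It keeps the staggered haul sizes of the script above and sends each process's output to its own log file, which avoids the interleaved terminal output. Setting `DRY_RUN=1` prints the commands instead of running them.

```shell
#!/bin/sh
KEY="insert_your_key_here"
CPU_THREADS=8

# Hypothetical launcher: one apgluxe process per GPU, GPUs 0..NUM_GPUS-1.
launch_all() {
  gpu=0
  while [ "$gpu" -lt "${NUM_GPUS:-8}" ]; do
    # Same staggering as the script above: 1.0e9, 1.1e9, 1.2e9, ...
    haul=$((1000000000 + gpu * 100000000))
    if [ "${DRY_RUN:-0}" = 1 ]; then
      # Print the command, masking the key.
      echo "CUDA_VISIBLE_DEVICES=$gpu ./apgluxe -p $CPU_THREADS -n $haul -k <key>"
    else
      # Run in the background with a separate log file per GPU.
      CUDA_VISIBLE_DEVICES="$gpu" ./apgluxe -p "$CPU_THREADS" -n "$haul" \
        -k "$KEY" > "apgluxe-gpu$gpu.log" 2>&1 &
    fi
    gpu=$((gpu + 1))
  done
  wait  # block until every background process has exited
}
```

For example, `NUM_GPUS=4 launch_all` starts four processes and waits for them; `nvidia-smi -L` lists your GPUs if you are unsure how many there are.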

Profile-guided optimisation (CPU only)
--------------------------------------

Profile-guided optimisation can be enabled with:

    ./recompile.sh --profile

This is only supported by the GCC and Clang compilers (not nvcc), so it
applies only to CPU searching. The benefits are relatively mild.

Credits and licences
====================

The software `apgluxe` and `lifelib` are both written by Adam P. Goucher and
available under an MIT licence. Thanks go to Dave Greene, Tom Rokicki,
'Apple Bottom', Darren Li, Arthur O'Dwyer, and Tod Hagan for contributions,
suggestions, testing, and feedback.

All third-party components are similarly free and open-source:

 - 'CRYSTALS-Dilithium: A Lattice-Based Digital Signature Scheme' is licensed
   under the Creative Commons CC-BY 4.0 licence;
 - The SHA3 (Keccak) hash function implementation, by Dr. Markku-Juhani O.
   Saarinen, is available under an MIT licence;
 - The SHA-256 hash function implementation, by Olivier Gay, is available
   under a BSD 3-clause licence;
 - The 'RSA Data Security, Inc. MD5 Message-Digest Algorithm' reference
   implementation can be copied, modified and used under the condition
   that the copyright notice is included;
 - The 'HappyHTTP' library, by Ben Campbell, can be copied, modified and
   used under the condition that the copyright notice is included.