.. _general:

General information
===================

.. warning:: This documentation is not yet complete.

Throughout this documentation we assume that you are familiar, to some degree, with the theoretical background of the
scanning transmission electron microscope (STEM), and that you have some knowledge of the UNIX/Linux command line and
parallelized computation. STEMsalabim is currently not intended to be run on a desktop computer. While that is
possible and works, the program's main purpose is to be used in a highly parallelized multi-computer environment.

We took great care to make STEMsalabim easy to install. You can find instructions at :ref:`installing`. However, if
you run into technical problems, you should first seek help from an administrator of your computer cluster.

.. _simulation-structure:

Structure of a simulation
-------------------------

The essence of STEMsalabim is to model the interaction of a focused electron beam with a bunch of atoms, typically
in the form of a crystalline sample. Given the necessary input files, the simulation crunches numbers for some time,
after which all of the calculated results can be found in the output file. Please refer to :ref:`running` for notes
on how to start a simulation.

Input files
~~~~~~~~~~~

All information about the specimen is listed in the :ref:`crystal-file`, which is one of the two required input files
for STEMsalabim. It contains each atom's species (element), coordinates, and `mean square displacement
<https://en.wikipedia.org/wiki/Mean_squared_displacement>`_ as it appears in the `Debye-Waller factors
<https://en.wikipedia.org/wiki/Debye%E2%80%93Waller_factor>`_.
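
If the thermal motion of your atoms is tabulated as a crystallographic Debye-Waller :math:`B` factor rather than as a
mean square displacement, the two are related by the standard isotropic convention :math:`B = 8\pi^2\langle u^2\rangle`.
The following is a minimal sketch of that conversion (the function names and the numeric value are purely
illustrative; the unit convention expected by the :ref:`crystal-file` is documented there and not assumed here):

.. code-block:: python

    import math

    def msd_from_debye_waller(b_factor):
        """Isotropic mean square displacement <u^2> from a Debye-Waller B factor.

        Uses the standard crystallographic relation B = 8 * pi^2 * <u^2>.
        The result carries the same squared-length unit as the input.
        """
        return b_factor / (8.0 * math.pi ** 2)

    def debye_waller_from_msd(msd):
        """Inverse relation: B = 8 * pi^2 * <u^2>."""
        return 8.0 * math.pi ** 2 * msd

    # Purely illustrative value: B = 0.57 A^2 corresponds to <u^2> ~ 0.0072 A^2.
    print(msd_from_debye_waller(0.57))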

In addition, you need to supply a :ref:`parameter-file` for each simulation, containing information about the
microscope, the detector, and all required simulation parameters. These parameters are given in a specific syntax,
described in :ref:`parameter-file`, and are always required for starting a STEMsalabim simulation.

Output files
~~~~~~~~~~~~

The complete output of a STEMsalabim simulation is written to a `NetCDF
<https://www.unidata.ucar.edu/software/netcdf/>`_ file. NetCDF is a binary, hierarchical file format for scientific
data, based on `HDF5 <https://support.hdfgroup.org/HDF5/>`_. NetCDF/HDF5 allow us to compress the output data and
store it in a machine-readable, organized format while still only having to deal with a single output file.

You can read more about the output file structure at :ref:`output-file`.
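
Since the output is a standard NetCDF file, any NetCDF-aware tool can inspect it. As a minimal sketch, assuming the
Python ``netCDF4`` package is installed and your simulation wrote a file named ``output.nc`` (a placeholder name),
the hierarchical structure can be listed like this; the actual group and variable names are documented in
:ref:`output-file` and not assumed here:

.. code-block:: python

    import netCDF4

    # Open a finished STEMsalabim result file read-only ("output.nc" is a placeholder).
    with netCDF4.Dataset("output.nc", "r") as nc:
        # NetCDF-4 files are hierarchical: list the top-level groups first.
        print("groups:", list(nc.groups))

        # Then walk each group and print its variables with their dimensions.
        for group_name, group in nc.groups.items():
            for var_name, var in group.variables.items():
                print(f"{group_name}/{var_name}: dims={var.dimensions}, shape={var.shape}")

The ``ncdump -h`` utility that ships with the NetCDF libraries prints a similar structural overview directly from the
command line.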

.. _parallelization-scheme:

Hybrid parallelization model
----------------------------

STEMsalabim simulations are parallelized both via `POSIX threads <https://en.wikipedia.org/wiki/POSIX_Threads>`_
and via the `message passing interface (MPI) <https://en.wikipedia.org/wiki/Message_Passing_Interface>`_. A typical
simulation will use both schemes at the same time: MPI is used for communication between the computing nodes, and
threads are used for intra-node parallelization, i.e., the usual multi-CPU/multi-core structure.

.. hint:: A high-performance computing cluster is an array of many (equal) computing *nodes*. Typical highly parallelized
          software uses more than one of the nodes for parallel computations. There is usually no memory that is
          shared between the nodes, so all information required for the management of parallel computing needs to be
          explicitly communicated between the processes on the different machines. The quasi-standard for that is
          the `message passing interface (MPI) <https://en.wikipedia.org/wiki/Message_Passing_Interface>`_.

Let us assume a simulation that runs on :math:`M` computers and each of them spawns :math:`N` threads.

Depending on the simulation parameters chosen, STEMsalabim may need to loop through multiple frozen phonon configurations
and values of the probe defocus. The same simulation (with differently displaced atoms and different probe defocus) is
therefore typically run multiple times. There are three parallelization schemes implemented in STEMsalabim:

- When :math:`M = 1`, i.e., no MPI parallelization is used, all pixels (probe positions) are distributed among the
  :math:`N` threads and calculated in parallel.
- Each MPI process calculates *all* pixels (probe positions) of its own frozen phonon / defocus configuration, i.e.,
  :math:`M` configurations are calculated in parallel. Each of the :math:`M` calculations splits its pixels between
  :math:`N` threads (each thread calculates one pixel at a time).

  This scheme makes sense when the total number of configurations (`probe.num_defoci` :math:`\times`
  `frozen_phonon.number_configurations`) is much larger than or divisible by :math:`M`.
- A single configuration is calculated at a time, and all the pixels are split between all :math:`M \times N` threads.
  In order to reduce the required MPI communication, only the main thread of each of the :math:`M` MPI processes
  communicates with the master thread. The master thread sends a *work package* containing some number of probe
  pixels to an MPI process, which then carries out all the calculations in parallel on its :math:`N` threads. When a
  work package is finished, the process requests another one from the master MPI process until there is no work left.
  In parallel, the worker threads of the MPI process with rank 0 also work on emptying the work queue. A conceptual
  sketch of this scheme is shown below.
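
The following is a small conceptual sketch of such a master/worker distribution of work packages, written with Python
and ``mpi4py`` purely for illustration. It is not STEMsalabim's actual implementation; in particular, for simplicity
the master rank here only hands out work (whereas in STEMsalabim the worker threads of rank 0 also process pixels),
and the per-package pixel count and total pixel number are made up:

.. code-block:: python

    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    PIXELS_PER_PACKAGE = 64            # size of one work package (illustrative)
    TAG_WORK, TAG_DONE = 1, 2

    def calculate(pixels):
        # Stand-in for the multi-slice calculation, which STEMsalabim would
        # additionally spread over the N threads of this MPI process.
        return [p * p for p in pixels]

    if rank == 0:
        # Master: hand out chunks of probe positions until none are left.
        todo = list(range(1024))       # all probe pixels of one configuration
        results = []
        active_workers = comm.Get_size() - 1
        status = MPI.Status()
        while active_workers > 0:
            # A worker reports in, returning any finished results.
            finished = comm.recv(source=MPI.ANY_SOURCE, tag=TAG_DONE, status=status)
            results.extend(finished)
            if todo:
                package, todo = todo[:PIXELS_PER_PACKAGE], todo[PIXELS_PER_PACKAGE:]
                comm.send(package, dest=status.Get_source(), tag=TAG_WORK)
            else:
                comm.send(None, dest=status.Get_source(), tag=TAG_WORK)  # stop signal
                active_workers -= 1
    else:
        # Worker: report (initially empty) results, receive a package, repeat.
        finished = []
        while True:
            comm.send(finished, dest=0, tag=TAG_DONE)
            package = comm.recv(source=0, tag=TAG_WORK)
            if package is None:
                break
            finished = calculate(package)

Launched with, e.g., ``mpirun -n 4 python sketch.py``, one rank plays the master and the remaining three ranks process
work packages.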

In MPI mode, each MPI process writes results to its own temporary file, and after each frozen lattice configuration
the results are merged. Merging is carried out sequentially by each individual MPI process, to avoid race conditions.
The parameter :code:`output.tmp_dir` (see :ref:`parameter-file`) should be set to a directory that is local
to each MPI process (e.g., :code:`/tmp`).

.. note:: Within one MPI process, the threads can share their memory. As the main memory consumption comes from storing
          the weak phase objects of the slices in the multi-slice simulation, which don't change during the actual
          simulation, this greatly reduces memory usage as compared to MPI-only parallelization. You should therefore
          always aim for hybrid parallelization!
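
As a rough back-of-the-envelope illustration of that point (the grid size, the number of slices, and the thread count
below are made up, and single-precision complex storage is assumed):

.. code-block:: python

    # Hypothetical simulation size -- purely illustrative numbers.
    nx, ny = 2048, 2048        # lateral sampling points per slice
    n_slices = 100             # slices in the multi-slice calculation
    bytes_per_value = 8        # one single-precision complex number (2 x 4 bytes)

    slice_storage = nx * ny * n_slices * bytes_per_value  # one copy of the slice data

    threads_per_node = 16
    mpi_only = threads_per_node * slice_storage  # every process keeps its own copy
    hybrid = slice_storage                       # one copy, shared by all threads

    print(f"per node, MPI-only: {mpi_only / 1e9:.1f} GB, hybrid: {hybrid / 1e9:.1f} GB")
    # -> per node, MPI-only: 53.7 GB, hybrid: 3.4 GB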