Commit bed4f227 authored by Jan Oliver Oelerich's avatar Jan Oliver Oelerich

Added a few docstrings, extended changelog and changed documentation

parent e6825647
What's new
==========
STEMsalabim 4.0
---------------
March 9th, 2018
**IMPORTANT**
I am releasing this version as 4.0.0, although neither the input nor the output file format has changed. The parameter `precision` is
now deprecated and there is a new parameter `tmp-dir`. Please see the documentation.
- Removed the option for double precision. It may be re-introduced on request, but it slowed down compilation times
and made the code significantly more complicated. The multislice algorithm with all its approximations, including the
scattering factor parametrization, is not precise enough for the difference between single and double precision to be
significant.
- Improved the Wave class so that some important parts can now be vectorized by the compiler.
- Introduced some more caches, greatly improving performance. STEMsalabim should now be about twice as
fast as before.
- Results of the MPI processors are now written to temporary files and merged after each configuration is finished.
This removes many MPI calls which tended to slow down the simulation. See the `--tmp-dir` parameter.
- Moved the `Element`, `Atom`, and `Scattering` classes to their own (isolated) library `libatomic`. This is easier to
maintain.
- Simplified MPI communication by getting rid of the serialization of C++ objects into char arrays, which was too error-prone
anyway.
- Added compatibility with the Intel compiler chain. Tested with Intel 17. MKL for DFTs is not (yet) supported, please
keep using FFTW3.
- Some minor fixes and improvements.
STEMsalabim 3.1.0, 3.1.1, 3.1.2, 3.1.3, 3.1.4
---------------------------------------------
......
......@@ -65,17 +65,20 @@ threads are used for intra-node parallelization, the usual multi-cpu/multi-core
Let us assume a simulation that runs on :math:`M` computers and each of them spawns :math:`N` threads.
There is a single, special *master thread* (the thread 0 of the MPI process with rank 0) that orchestrates the simulation,
i.e., manages and distributes work packages. All other threads (:math:`(M\times N)-1`) participate in the simulation. In
MPI mode, each MPI process writes results to its own temporary file, and after each frozen lattice configuration the
results are merged. Merging is carried out sequentially by each individual MPI processor, so that no race conditions
occur. The parameter :code:`output.tmp_dir` (see :ref:`parameter-file`) should be set to a directory that is local
to each MPI processor (e.g., :code:`/tmp`).
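A minimal, self-contained sketch of this rank-ordered merging scheme is shown below. It is purely illustrative (plain binary
concatenation instead of the NetCDF output that is actually written, hypothetical paths and helper names) and not the actual
*STEMsalabim* implementation::

    #include <mpi.h>
    #include <fstream>
    #include <string>

    // Hypothetical helper: append one rank's temporary file to the shared output file.
    static void append_tmp_to_output(const std::string &tmp, const std::string &out) {
        std::ifstream in(tmp, std::ios::binary);
        std::ofstream o(out, std::ios::binary | std::ios::app);
        o << in.rdbuf();
    }

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, n_ranks;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &n_ranks);

        // Each rank writes its results for the current frozen lattice configuration
        // to a temporary file in a node-local directory (cf. the tmp_dir parameter).
        std::string tmp = "/tmp/results_rank" + std::to_string(rank) + ".bin";
        std::string out = "out.bin";
        { std::ofstream t(tmp, std::ios::binary); t << "results of rank " << rank << "\n"; }

        // Merge sequentially, one rank at a time, so that no two ranks write to the
        // shared output file at the same time and no race conditions can occur.
        for (int r = 0; r < n_ranks; ++r) {
            if (rank == r)
                append_tmp_to_output(tmp, out);
            MPI_Barrier(MPI_COMM_WORLD);
        }

        MPI_Finalize();
        return 0;
    }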
A typical *STEMsalabim* simulation is composed of many independent multi-slice simulations that differ only in the
position of the scanning probe. Hence, parallelization is done on the level of these multi-slice simulations, with each
thread performing them independently from other threads. In order to reduce the number of MPI messages being sent
around, only the main thread of each of the :math:`M` MPI processors communicates with the master thread. The master
thread sends a *work package* containing some number of probe pixels to be calculated to an MPI process, which then
carries out all the calculations in parallel on its :math:`N` threads. When a work package is finished, it requests another
work package from the master MPI process until there is no work left. In parallel, the worker threads of the MPI process
with rank 0 also work on emptying the work queue.
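A minimal sketch of this work-package distribution is given below. All names, message contents, and package sizes are
illustrative assumptions rather than the actual *STEMsalabim* classes; results are not sent back here, since they are
written to the temporary files described above::

    #include <mpi.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        const int n_pixels = 1024;    // total number of probe positions
        const int package_size = 16;  // probe positions per work package

        if (rank == 0) {
            // master: hand out work packages until all probe positions are assigned
            int next = 0;
            int active_workers = size - 1;
            while (active_workers > 0) {
                int request;
                MPI_Status st;
                MPI_Recv(&request, 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &st);
                int start = (next < n_pixels) ? next : -1;  // -1 signals "no work left"
                MPI_Send(&start, 1, MPI_INT, st.MPI_SOURCE, 0, MPI_COMM_WORLD);
                if (start < 0)
                    --active_workers;
                else
                    next += package_size;
            }
        } else {
            // worker: request packages and process them on the local threads
            while (true) {
                int request = 0;
                MPI_Send(&request, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
                int start;
                MPI_Recv(&start, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                if (start < 0)
                    break;
                // ... run the multi-slice simulations for probe positions
                //     [start, start + package_size) on the N local threads ...
            }
        }

        MPI_Finalize();
        return 0;
    }

In the actual code, the worker threads of the MPI process with rank 0 process packages as well, while its main thread only
manages the queue and the I/O.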
.. note:: Within one MPI processor, the threads can share their memory. As the main memory consumption comes from storing
the weak phase objects of the slices in the multi-slice simulation, which don't change during the actual simulation,
......
......@@ -27,7 +27,6 @@ simulation.
application: {
random_seed = 0; # the random seed. 0 -> generate
precision = "single"; # single or double precision
skip_simulation = false; # skip the actual multi-slice simulation (for debugging)
}
......@@ -39,12 +38,6 @@ simulation.
seed for that can be specified here. If set to 0, a random seed is generated by the program. For reproduction of
previous results, the seed can be set to a specific value.
**application.precision**
*string [default: `single`]*
Set to ``d``, ``double``, or ``high`` to use double precision for the calculation. Note that this increases
computation time, memory consumption, and output file size.
**application.skip_simulation**
*boolean [default: `false`]*
......@@ -63,6 +56,7 @@ Some general settings specific to this simulation.
bandwidth_limiting = true; # bandwidth limit the wave functions?
normalize_always = false; # normalize wave functions after each slice?
output_file = "out.nc"; # output file name
tmp_dir = "/tmp/"; # directory for temporary files.
output_compress = "false"; # compress output data. This reduces file sizes, but increases IO times.
}
......@@ -91,6 +85,15 @@ Some general settings specific to this simulation.
Path to the result NetCDF file. Please read :ref:`output-file` to learn about its format. Relative paths are
interpreted relative to the working directory of the simulation.
**simulation.tmp_dir**
*string [default: `folder of output file`]*
When running MPI-parallelized, all MPI processors write their results to temporary binary files in this directory,
and only at the end of each frozen lattice configuration are the results merged into the output file. We recommend a
directory that is local to each MPI processor here, so that access is fast. By default, the directory of the output file
is used, which typically is **not local** but on a network file system. This can have a profound effect when CBED
results are collected, i.e., when the output becomes large.
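For example, on a cluster where each node provides local scratch space, one might set (the paths are illustrative)::

    output_file = "/home/user/results/out.nc";  # on the shared network file system
    tmp_dir     = "/scratch/local/";            # node-local, fast access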
**simulation.output_compress**
*boolean [default: `false`]*
......@@ -482,6 +485,11 @@ Command line arguments
Override the value of the **probe.defocus** setting. If specified exactly once, a single defocus is calculated.
If specified *exactly three times*, the functionality is similar to the **probe.defocus** parameter.
**--tmp-dir**
*string*
Override the value of the **output.tmp_dir** setting.
**--output-file, -o**
*string*
......
......@@ -90,11 +90,21 @@ namespace stemsalabim {
_cbed_intensities_stored = true;
}
/*!
* Stores ADF intensities in the member vector, given a pre-created buffer entry.
* @param buf_entry the memory buffer entry where the data is stored
*/
void storeAdfIntensities(memory::buffer::entry<float> & buf_entry) {
adf_intensities = buf_entry;
_adf_intensities_stored = true;
}
/*!
* Stores CBED intensities in the member vector, given a pre-created buffer entry.
* @param buf_entry the memory buffer entry where the data is stored
*/
void storeCBEDIntensities(memory::buffer::entry<float> & buf_entry) {
cbed_intensities = buf_entry;
_cbed_intensities_stored = true;
......@@ -236,14 +246,30 @@ namespace stemsalabim {
return _cbed_y_grid;
}
/*!
* Whether the slice with the specified id is stored in the ADF output.
* @param id the slice id
* @return true if the slice is stored, otherwise false
*/
bool adfStoreSlice(unsigned int id) const {
return _adf_store_slice[id];
}
/*!
* Whether the slice with the specified id is stored in the CBED output.
* @param id the slice id
* @return true if the slice is stored, otherwise false
*/
bool cbedStoreSlice(unsigned int id) const {
return _cbed_store_slice[id];
}
/*!
* Index of the ADF bin (detector angle array) of k space indices kx and ky.
* @param ix k space index in x direction
* @param iy k space index in y direction
* @return index of the bin to store intensity in
*/
int adfBinIndex(unsigned int ix, unsigned int iy) const {
return _adf_bin_index[ix][iy];
}
......@@ -351,6 +377,11 @@ namespace stemsalabim {
return (unsigned int) algorithms::round_even(Params::getInstance().samplingDensity() * _crystal->sizeY());
}
/*!
* Return a wave object with which other waves can be multiplied; it
* bandwidth-limits the wave if requested.
* @return bandwidth limit mask
*/
const Wave & getBandwidthMask() const {
return _bandwidth_limit_mask;
}
......
......@@ -67,6 +67,10 @@ namespace stemsalabim {
*/
void initNcFile(const std::shared_ptr<GridManager> &gridman, const std::shared_ptr<Crystal> &crystal);
/*!
* Initialize the memory buffers that are used when averaging results.
* @param gridman Reference to the GridManager required for writing some information.
*/
void initBuffers(const std::shared_ptr<GridManager> &gridman);
/*!
......@@ -104,16 +108,29 @@ namespace stemsalabim {
const std::shared_ptr<GridManager> &gridman, const ScanPoint &point,
std::shared_ptr<memory::buffer::number_buffer<float>> &cbuf);
/*!
* Write temporary results to a binary file (custom format). This is done in MPI mode.
* @param idefocus current defocus index
* @param iconf current FP configuration index
* @param point The pixel for which intensity values are to be written.
* @param ibuf The number buffer with the actual adf intensity values.
* @param cbuf The number buffer with the actual cbed intensity values.
*/
static void writeTemporaryResult(unsigned int idefocus, unsigned int iconf,
ScanPoint &point,
std::shared_ptr<memory::buffer::number_buffer<float>> &ibuf,
std::shared_ptr<memory::buffer::number_buffer<float>> &cbuf);
/*!
* Copy written temporary results from the binary files to the NC output file. (MPI mode)
* @param gridman reference to a GridManager
* @param ibuf The number buffer with the actual adf intensity values.
* @param cbuf The number buffer with the actual cbed intensity values.
*/
void copyFromTemporaryFile(const std::shared_ptr<GridManager> &gridman,
std::shared_ptr<memory::buffer::number_buffer<float>> &ibuf,
std::shared_ptr<memory::buffer::number_buffer<float>> &cbuf);
/*!
* Write the various grids to the output file.
* @param gridman reference to the GridManager
......
......@@ -159,7 +159,6 @@ namespace stemsalabim {
std::vector<std::shared_ptr<atomic::Atom>> _atoms;
std::vector<std::tuple<double, double, double>> _fields;
Wave _transmission_function;
std::vector<std::vector<double>> _field_cache;
};
}
......
......@@ -34,6 +34,11 @@ namespace stemsalabim { namespace atomic {
* @tparam float precision type, float or double
*/
/*!
* regular modified cylindrical Bessel function order 0
* @param x argument
* @return regular modified cylindrical Bessel function order 0 of x
*/
static double bessi0(double x) {
double ax, ans;
double y;
......@@ -62,6 +67,11 @@ namespace stemsalabim { namespace atomic {
return ans;
}
/*!
* irregular modified cylindrical Bessel function order 0
* @param x argument
* @return irregular modified cylindrical Bessel function order 0 of x
*/
static double bessk0(double x) {
double y, ans;
......@@ -87,6 +97,12 @@ namespace stemsalabim { namespace atomic {
class Scattering {
public:
/*!
* Return the singleton instance of the Scattering class.
* @param cache_grating grating density (1/Angstrom) of the cache grid
* @param cutoff_radius cutoff radius (Angstrom) of the potentials
* @return singleton instance of the class.
*/
static Scattering &getInstance(double cache_grating = 72, double cutoff_radius = 3) {
static Scattering instance(cache_grating, cutoff_radius);
return instance;
......@@ -104,25 +120,31 @@ namespace stemsalabim { namespace atomic {
return grating_atompot / length_atompot;
}
/*!
* Get the index into the cached potential vector for a given distance.
* @param distance the distance
* @return the index if the distance is within the cutoff radius, -1 otherwise
*/
int getIndex(double distance) const {
if(fabs(distance) > _cutoff)
return -1;
return (int)round(fabs(distance) * _grating);
}
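/*!
* Return the cached potential of an element at the given distance.
* @param distance the distance
* @param element the element whose potential is looked up
* @return the cached potential value, or 0 when the distance exceeds the cutoff radius
*/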
double getPotential(double distance, const std::shared_ptr<Element> &element) const {
if(distance > _cutoff)
return 0.0;
auto index = (unsigned int)round(fabs(distance) * _grating);
return _cache.at(element->symbol())[index];
}
/*!
* Return a reference to the array of cached potential values for an element.
* @param element the element whose potential to return
* @return the cached potential array
*/
const std::vector<double> & getPotential(const std::shared_ptr<Element> &element) const {
return _cache.at(element->symbol());
}
/*!
* Initialize (i.e., calculate and cache) the potential for the given element
* @param element the element
*/
void initPotential(const std::shared_ptr<Element> &element) {
std::unique_lock<std::mutex> lck(_cache_mtx);
......