Implement basic DataContainer
Description
As discussed in the MC use case (#113 (closed)), the ensembles in the Monte Carlo module should own a container that handles the data produced during an MC run. These data should be sufficient to repeat (data provenance) and restart the run. This means that the data container (DataContainer) class should handle:
- metadata (when the run was carried out, possibly by whom and where, name of the ensemble, etc.)
- run information (including the initial settings associated with the ensemble, the atomic structure, the setup of the PRNG, etc.)
- data as a function of MC step
  - observables and parameters of the ensemble, such as energy and concentration
  - observables generated by observers, such as short-range order parameters and configurations
The class should be added to the mchammer/data_container.py module and must provide functions that enable, e.g., the following initialization of an ensemble class:
class BaseEnsemble:

    def __init__(self, atoms, ...):
        ...
        self.data = DataContainer(..., atoms)


class DataContainer:

    def __init__(self, atoms, ...):
        ...
        self.__structure = atoms.copy()

    @property
    def structure(self):
        return self.__structure.copy()


class SomeEnsemble(BaseEnsemble):

    def __init__(self, ...):
        ...
        # call base class constructor
        self.data.add_parameter('temperature', 300.0)
        self.data.add_parameter('chemical-potential-difference', -0.5)
        self.data.add_observable('energy', float)
        # the duplication of temperature is intentional since temperature
        # is both a parameter that is needed for restarting and an observable
        # that could be included in the output stream
        self.data.add_observable('temperature', float)
        for obs in self.observers:
            # obs.property_type would probably be `list`
            self.data.add_observable(obs.tag, obs.property_type)
During an MC run one would need to add data to the DataContainer object:
self.data.append(self.mcstep, 'energy', energy)
if self.mcstep % self.minimum_interval == 0:
    for obs in self.observers:
        if self.mcstep % obs.interval == 0:
            self.data.append(self.mcstep, obs.tag, obs.get_observable(...))
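The recording loop can be tried out with a small stand-in. In the sketch below, `MiniContainer`, `Observer`, the dummy values, and the choice of `minimum_interval` as the smallest observer interval are all hypothetical; they only illustrate how rows keyed by MC step fill up when observers have different intervals and do not represent the eventual mchammer implementation.

```python
from collections import defaultdict


class MiniContainer:
    """Hypothetical stand-in for DataContainer: one dict of values per MC step."""

    def __init__(self):
        self.rows = defaultdict(dict)

    def append(self, mcstep, tag, value):
        self.rows[mcstep][tag] = value


class Observer:
    """Hypothetical observer with a tag and a sampling interval."""

    def __init__(self, tag, interval):
        self.tag = tag
        self.interval = interval

    def get_observable(self, mcstep):
        return 0.1 * mcstep  # dummy value for illustration


data = MiniContainer()
observers = [Observer('sro', 10), Observer('config', 20)]
# assumption: the minimum interval is the smallest observer interval
minimum_interval = min(obs.interval for obs in observers)

for mcstep in range(41):
    # the energy is recorded at every step
    data.append(mcstep, 'energy', -100.0 + 0.01 * mcstep)
    # observers are only queried at multiples of their own interval
    if mcstep % minimum_interval == 0:
        for obs in observers:
            if mcstep % obs.interval == 0:
                data.append(mcstep, obs.tag, obs.get_observable(mcstep))
```

Steps at which an observer was not queried simply lack the corresponding key, which is the situation that later shows up as missing elements in the retrieved data.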
Note that ensemble parameters such as temperature or chemical potential would usually not be appended at every MC cycle but only when they are explicitly changed, e.g.,
class SomeEnsemble(BaseEnsemble):

    @property
    def temperature(self):
        return self._temperature

    @temperature.setter
    def temperature(self, temperature):
        # store in a private attribute to avoid calling the setter recursively
        self._temperature = temperature
        self.data.append(self.mcstep, 'temperature', temperature)
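The setter pattern can be checked in isolation. The following is a minimal sketch in which `RecordingContainer` and `SketchEnsemble` are hypothetical names, and the private attribute `_temperature` is an assumption made so that the setter does not call itself.

```python
class RecordingContainer:
    """Hypothetical stand-in that stores (mcstep, tag, value) tuples."""

    def __init__(self):
        self.records = []

    def append(self, mcstep, tag, value):
        self.records.append((mcstep, tag, value))


class SketchEnsemble:
    """Hypothetical ensemble that records the temperature only when it changes."""

    def __init__(self, temperature):
        self.data = RecordingContainer()
        self.mcstep = 0
        self.temperature = temperature  # goes through the setter below

    @property
    def temperature(self):
        return self._temperature

    @temperature.setter
    def temperature(self, temperature):
        # write to the private attribute, then log the change
        self._temperature = temperature
        self.data.append(self.mcstep, 'temperature', temperature)


ens = SketchEnsemble(500.0)
ens.mcstep = 500
ens.temperature = 200.0
print(ens.data.records)
# [(0, 'temperature', 500.0), (500, 'temperature', 200.0)]
```

Only two temperature entries are stored for the whole run, one for the initial value and one for the explicit change at step 500.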
During analysis one would require the following functions:
dc = DataContainer(...)
print(dc.parameters)
>> OrderedDict([('temperature', 300.0), ('chemical-potential-difference', -0.5)])
print(dc.observables)
>> OrderedDict([('energy', float), ('temperature', float), ('sro', list), ...])
data = dc.get_data(['mcstep', 'energy', 'temperature', 'sro1'],
                   interval=..., filter=..., fill_missing=False)
print(data)
>> [[0, -100.0, 500.0, 0.1],
    [10, -99.1, None, 0.2],
    [20, -98.7, None, 0.15],
    ...
    [500, -97.8, 200.0, -0.2],
    [510, -98.1, None, -0.22],
    ...
    [1000, -99.8, None, -0.4]]
The fill_missing option affects how missing elements (None in the example above) are treated. Setting the option to True should carry the last recorded value forward, leading to [500.0, 500.0, 500.0, 500.0, ..., 200.0, 200.0, 200.0, ..., 200.0] for the temperature column in the example above.
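Since pandas is the intended backend, the fill_missing behavior could be obtained with a forward fill. The sketch below is only one possible approach under that assumption: the helper `get_column` is hypothetical, the rows are a shortened version of the example above, and pandas represents the missing entries as NaN rather than None.

```python
import pandas as pd

# shortened version of the example data; None becomes NaN in a float column
df = pd.DataFrame(
    {
        'mcstep': [0, 10, 20, 500, 510],
        'energy': [-100.0, -99.1, -98.7, -97.8, -98.1],
        'temperature': [500.0, None, None, 200.0, None],
    }
)


def get_column(frame, tag, fill_missing=False):
    """Hypothetical helper: return one column as a list, optionally forward-filled."""
    column = frame[tag]
    if fill_missing:
        # propagate the last recorded value to the following rows
        column = column.ffill()
    return column.tolist()


filled = get_column(df, 'temperature', fill_missing=True)
print(filled)
# [500.0, 500.0, 500.0, 200.0, 200.0]
```

A forward fill matches the intent that a parameter keeps its value until it is explicitly changed; entries before the first recorded value would remain missing.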
Notes
- ensure that fields (parameter, observable names) are not overwritten/doubly defined
- use assertions wherever possible/appropriate
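As an illustration of these two notes, the sketch below guards field names with assertions. The class name `NamedFields` is hypothetical, and keeping parameters and observables in separate namespaces is an assumption made so that 'temperature' can intentionally appear in both.

```python
from collections import OrderedDict


class NamedFields:
    """Hypothetical fragment of a container that guards its field names."""

    def __init__(self):
        self.parameters = OrderedDict()
        self.observables = OrderedDict()

    def add_parameter(self, tag, value):
        assert isinstance(tag, str), 'parameter name must be a string'
        assert tag not in self.parameters, f'parameter {tag!r} already defined'
        self.parameters[tag] = value

    def add_observable(self, tag, obs_type):
        assert isinstance(tag, str), 'observable name must be a string'
        assert tag not in self.observables, f'observable {tag!r} already defined'
        self.observables[tag] = obs_type


fields = NamedFields()
fields.add_parameter('temperature', 300.0)
fields.add_observable('temperature', float)  # allowed: separate namespace

try:
    fields.add_parameter('temperature', 400.0)  # second definition is rejected
    rejected = False
except AssertionError:
    rejected = True
```

Note that assertions are skipped when Python runs with the -O flag, so checks that must always hold in production might be better expressed as explicit exceptions.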
Sub-tasks
- define interface and functions (using pass)
- implement data structure (using pandas)
- add complete unit tests
Please note that read/write and restart functionalities are assigned to separate issues.
Demonstration
- tests pass
- docstrings complete