
Draft: multifile data extractor function

Adam Lawrence requested to merge multifile_extractor into develop

Explanation of changes

This is a first draft of a multifile data extractor function, which can be used to extract data from multiple experimental datafiles and create a new dataset which can then be used for further analysis. The new dataset is saved in a new file under its own TUID.

This MR covers the first use case described in #262 (closed). Independent variables (settable instrument parameters) are extracted from the snapshot JSON files, and dependent variables (saved quantities of interest) are extracted from the quantities_of_interest JSON files. The independent variables become the xi coordinates in the quantify dataset and the dependent variables become the yi data variables. We also use the timestamp of the experiment as the dim_0 coordinate.
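To make the layout concrete, the resulting dataset looks roughly like the following hand-written sketch (not actual output of the function; the assignment of timestamp/flux to x0/x1 and all values are made up for illustration):

import numpy as np
import xarray as xr

# One row per experiment (dim_0); the timestamp and the snapshot
# parameters are coordinates, the quantities of interest are data variables.
dset = xr.Dataset(
    coords={
        "x0": ("dim_0", np.array(["2021-09-24T11:00:26"], dtype="datetime64[ns]")),
        "x1": ("dim_0", np.array([1.0e-6])),  # flux bias current (A)
    },
    data_vars={
        "y0": ("dim_0", np.array([12.3e-6])),  # T1 (s)
        "y1": ("dim_0", np.array([np.nan])),   # T2* (s), no Ramsey at this timestamp
    },
)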

In the following example, we want to plot the T1, T2* and T2echo times of a qubit versus the applied flux. T1, T2echo and T2* are all measured in different experiments and saved in the quantities_of_interest file, while the flux is saved in the snapshot file. The multifile_data_extractor function extracts these quantities from the relevant files and compiles them into a single dataset.


We have not yet written any analysis code for this experiment (that is for another MR), but we can already run BasicAnalysis on this new dataset to plot the coherence times against both the flux and the timestamp.
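For reference, running that analysis follows the usual quantify pattern; a sketch (I am assuming quantify-core's Basic1DAnalysis here, and the TUID is a placeholder):

from quantify_core.analysis.base_analysis import Basic1DAnalysis

# Run the standard 1D analysis on the newly created dataset, by TUID.
a_obj = Basic1DAnalysis(tuid="20210924-123456-789-abcdef").run()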

[Plots: Line_plot_x1-y2 and Line_plot_x0-y2 (coherence times vs flux and vs timestamp)]

I don't know why the x-axis label looks so weird in the timestamp plot, but I think it has to do with the way our quantify plotting functions handle timestamps. Maybe something to look at in another MR.

Motivation of changes

The data is extracted by the function

def multifile_data_extractor(
    t_start: str,
    t_stop: str,
    independent_vars: Dict[str, Dict[str, str]],
    dependent_vars: Dict[str, Dict[str, str]],
    name: str,
)

t_start and t_stop specify the start and end timestamps of the multifile analysis, and name gives the name of the dataset to be created. Two dictionaries specify the quantities to extract from the saved datafiles: for every multifile analysis we may need many different quantities, so we need to tell the function which quantities to extract from which experiments and how to look them up. Currently we assume that instrument snapshot parameters go on the x-axis and quantities of interest on the y-axis, so these are specified in two separate dictionaries, independent_vars and dependent_vars (we may relax this assumption at some point, but it is a starting point).
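For the flux example above, a call could look like this (a sketch: the timestamp format and the returned dataset are my assumptions, and the two dictionaries are spelled out below):

# Sketch of a call for the flux example above.
dset = multifile_data_extractor(
    t_start="20210924-000000",
    t_stop="20210925-000000",
    independent_vars=independent_vars,  # defined below
    dependent_vars=dependent_vars,      # defined below
    name="coherence_times_vs_flux",
)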

The dependent_vars dictionary is used for the y-axis quantities of interest. The format is like this:

dependent_vars = {
        "t1": {
            "name": "T1",
            "long_name": r"$T_1$ decay time",
            "experiment": "T1",
            "analysis": "analysis_T1Analysis",
            "units": "s",
        },
        "t2_star": {
            "name": "T2*",
            "long_name": r"$T_2^*$ coherence time",
            "experiment": "Ramsey",
            "analysis": "analysis_RamseyAnalysis",
            "units": "s",
        },
        "t2_echo": {
            "name": "t2_echo",
            "long_name": r"$T_{2, \\mathrm{echo}}$ coherence time",
            "experiment": "Echo",
            "analysis": "analysis_EchoAnalysis",
            "units": "s",
        },
    }

The "experiment" key specifies which experiment files to look in to find the given quantity of interest. It is a string which will appear in the name of the experimental folder. So for example, Echo experiments will have names like 20210924-110026-187-91eec8-Echo experiment q3, so the function will only look in folders with the string "Echo" in their name to search for the t2_echo parameter. The "analysis" key specifies the name of the analysis sub-folder in which the quantities_of_interest file is stored, which can be different for different experiments.

The independent_vars dict specifies the instrumental parameters which are put on the x-axis. These will be QCoDeS parameters that are saved in the snapshot file. The format is like this:

independent_vars = {
        "flux": {
            "name": "flux",
            "long_name": "flux bias current",
            "instrument": "fluxcurrent",
            "parameter": "FBL_0",
            "units": "A",
        }
    }

In the snapshot file, the QCoDeS parameters are saved in JSON format under the name of the instrument and the parameter, so both names must be specified in the dictionary. Unlike with quantities of interest, we assume that there is a saved value for these parameters in every experiment, so there is no need for an "experiment" key to specify which experimental files to look in. There will therefore be a timestamp and independent_vars value for every datapoint in the dataset, but the y-axis variables will have NaNs wherever the quantity of interest does not exist or could not be fitted properly.
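For illustration, looking up such a parameter amounts to something like this (a sketch, assuming the standard QCoDeS station snapshot layout inside snapshot.json):

import json
import os

def _snapshot_parameter_value(exp_folder, instrument, parameter):
    # QCoDeS station snapshots store parameter values under
    # snapshot["instruments"][<instrument>]["parameters"][<parameter>]["value"].
    with open(os.path.join(exp_folder, "snapshot.json")) as f:
        snapshot = json.load(f)
    return snapshot["instruments"][instrument]["parameters"][parameter]["value"]

# e.g. _snapshot_parameter_value(folder, "fluxcurrent", "FBL_0") -> flux in A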

Reading JSON files

When we read the quantities_of_interest file, the quantity of interest can be stored as many different datatypes, from simple numeric types to more complex things like ufloats, which we use to specify quantities with error bars. These are all written to the JSON file using the QCoDeS NumpyJSONEncoder. Reading out floats is trivial, but reading out ufloats, for example, requires knowledge of how the JSON file is encoded. Unfortunately, there is no decoder in QCoDeS to go with the NumpyJSONEncoder, so we cannot read out general data types. For now, I have included a hack to read out the nominal value of a ufloat, where possible; I have not included the uncertainty at this stage. The long-term solution is to write a JSON decoder, but I don't know when this will happen.
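The hack boils down to something like the following (a sketch of the idea, not the exact implementation; it assumes the encoder fell back to the ufloat's string representation, which uncertainties.ufloat_fromstr can parse back):

import math

from uncertainties import ufloat_fromstr

def _nominal_value(raw):
    """Best-effort extraction of a nominal value from a decoded JSON entry."""
    if isinstance(raw, (int, float)):
        return float(raw)
    if isinstance(raw, str):
        # A ufloat may have been serialized as e.g. "(3.40+/-0.02)e-05".
        try:
            return ufloat_fromstr(raw).nominal_value
        except ValueError:
            pass
    return math.nan  # missing or undecodable -> NaN in the dataset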


Merge checklist

See also merge request guidelines

  • Merge request has been reviewed and approved by a project maintainer.
  • Merge request contains a clear description of the proposed changes and the issue it addresses.
  • Merge request made onto appropriate branch (develop for most MRs).
  • New code is fully tested.
  • New code is documented and docstrings use numpydoc format.
  • CHANGELOG.rst and AUTHORS.rst have been updated (when applicable).
  • CI pipelines pass
    • black code-formatting passes (gitlab-ci),
    • test suite passes (gitlab-ci),
    • no degradation in code-coverage (codacy),
    • no (serious) new pylint code quality issues introduced (codacy),
    • documentation builds successfully (CI and readthedocs),
    • windows tests pass (manually triggered by maintainers before merging).

For reference, the issues workflow is described in the contribution guidelines.
