# Data handling and acquisition flow
This epic tracks the data handling and acquisition flow.

[TOC]

Quantify-scheduler should allow a user to specify a schedule, execute it, and return a dataset. Quantify-core is responsible for providing the structure of such a dataset and the utilities around it. To support the functionality in quantify-scheduler, a few related issues in quantify-core need to be addressed.

## 1. Implementation plan for Quantify-core

_- TBD_

## 2. Implementation plan for Quantify-scheduler

Note: contrary to the section title, this section describes the **interface** changes, not the implementation. See the implementation plan in a separate section below. This section describes the current proposal; it is subject to change, and criticism is welcome.

Notes regarding the requirements:

### Acquisition channel modifications

* `acq_channel` may also be a string, instead of only an integer as is currently allowed. (Agreed.)
* `acq_channel` is a device-level property. It can also be specified on device-level and circuit-level operations in the schedule. (Agreed.)
* On the circuit-level measure operation(s), the user is able to override the (default) `acq_channel` specified on the device element. If unspecified, the acquisition channel from the device element is used. (Each measure operation on the device element, for example transmon or flux measures, could have a separate default `acq_channel`.) (Agreed.)

  ```
  q0 = BasicTransmonElement("q0")
  q0.measure.acq_channel("channel_0")
  ```

  (Open question: `default_acq_channel` or `acq_channel`?)

* At schedule/operation creation, the qubit (device element) must be specified, and optionally the `acq_channel`. (Agreed.)

  ```
  sched = Schedule("schedule_name")
  # Using acq_channel explicitly.
  sched.add(Measure(qubit="q0", acq_channel="channel_0"))
  # Using default acq_channel.
  sched.add(Measure(qubit="q0"))
  ```

### Adding independents to the schedule

The `acq_index` argument is removed from the measure/acquisition operations; instead, only the independents will be present. The independents are specified in the `coord` argument. (Agreed.)

```
sched = Schedule("schedule_name")
sched.add(Measure(qubit="q0", acq_channel="channel_0", coord=dict(amp=1.0, freq=7.0)))
```

This `coord` argument is optional on the user interface.

### Dataset modification

* `InstrumentCoordinator.retrieve_acquisitions` and `ScheduleGettable.get` both return the same data. (Agreed.)
* The measured values are complex numbers (for binned and trace acquisitions); for trigger count and thresholded acquisitions they are integers. (Agreed.)
* ~~The return type is a dictionary of multiple `xarray.Dataset` (because of memory efficiency and convenience in independent variable naming).~~
* The return type is an `xarray.Dataset`. (Agreed, but still a little controversial.)
* The returned `xarray.Dataset` can be one of the following, and it's up to the user to decide. (Agreed.)
  * In average bin mode a 1D array; in append bin mode a 2D array with an extra repetitions dimension. (Efficient for sparse data.)
  * A multidimensional array. (Efficient for dense data.)
  * For now only 1D is implemented, but later multidimensional data will be possible. (Agreed.)
* For a 1D dataset, the `acq_index_...` in the dataset is the unique index of the acquisition. It adheres to the requirements for the acquisition index. This is auto-generated by quantify.

Note, the **independent variable names do not change**; their names are entirely up to the user. **Subsequent data processing on this raw dataset should rely on the coordinate names** (independent variable names), and the dimension name should be considered an implementation detail of the dataset.

#### Simple one channel 1D sweep

Simple sweep measurement of the acquisition channel `"ch_0"` on the `"q0"` device element.
The independent variable name is `"amp"`. The schedule creation by the user is the following.

```
amps = [0.0, 0.5, 1.0, 1.5, 2.0]
schedule = Schedule(name="Simple 1D sweep", repetitions=1)
for amp in amps:
    schedule.add(Measure(qubit="q0", acq_channel="ch_0", coord=dict(amp=amp)))
```

The data returned by quantify is the following.

```
ch_0 = [0.0, 0.2, 0.4, 0.6, 0.8]
data_vars = dict(
    ch_0=(["acq_index_ch_0"], ch_0),
)
coords = dict(
    amp=(["acq_index_ch_0"], amps),
    acq_index_ch_0=range(len(amps)),
)
dataset = xarray.Dataset(
    data_vars=data_vars,
    coords=coords,
)
dataset
```

![image](/uploads/c62d17d4433beda28ebe7242b51210b1/image.png)

#### Simple one channel 2D sweep

Simple sweep measurement of the acquisition channel `"ch_0"` on the `"q0"` device element. The independent variable names are `"amp"` and `"freq"`. The schedule creation by the user is the following.

```
amps = [0.0, 0.5, 1.0, 1.5, 2.0]
freqs = [0.0, 30.0, 60.0]
schedule = Schedule(name="Simple 2D sweep", repetitions=1)
for amp in amps:
    for freq in freqs:
        schedule.add(Measure(qubit="q0", acq_channel="ch_0", coord=dict(amp=amp, freq=freq)))
```

The data returned by quantify is the following.

```
ch_0 = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4, 2.6, 2.8]
data_vars = dict(
    ch_0=("acq_index_ch_0", ch_0),
)
coords = dict(
    amp=("acq_index_ch_0", numpy.repeat(amps, len(freqs))),
    freq=("acq_index_ch_0", numpy.tile(freqs, len(amps))),
    acq_index_ch_0=range(len(freqs) * len(amps)),
)
dataset = xarray.Dataset(
    data_vars=data_vars,
    coords=coords,
)
dataset
```

![image](/uploads/f7df2af6a7fbe3cbe7bd423f3a6479d8/image.png)

#### 1D sweep with append mode, multiple repetitions

Sweep measurement of the acquisition channel `"ch_0"` on the `"q0"` device element. The independent variable name is `"amp"`. The schedule is repeated 2 times. The schedule creation by the user is the following.
```
amps = [0.0, 0.5, 1.0, 1.5, 2.0]
schedule = Schedule(name="Append mode repetitions 1D sweep", repetitions=2)
for amp in amps:
    schedule.add(Measure(qubit="q0", acq_channel="ch_0", coord=dict(amp=amp), bin_mode=APPEND))
```

The data returned by quantify is the following.

```
ch_0 = [[0.0, 0.2, 0.4, 0.6, 0.8],
        [1.0, 1.2, 1.4, 1.6, 1.8]]
data_vars = dict(
    ch_0=(["repetition", "acq_index_ch_0"], ch_0),
)
coords = dict(
    amp=(["repetition", "acq_index_ch_0"], [amps, amps]),
    acq_index_ch_0=range(len(amps)),
    repetition=[0, 1],
)
dataset = xarray.Dataset(
    data_vars=data_vars,
    coords=coords,
)
dataset
```

![image](/uploads/df91099adda6f5cf3e6a94d779509a9b/image.png)

#### Multiple separate dense data in average mode

Multiple dense sweeps. The channel `"ch_0"` on the `"q0"` device element is measured for the `"amp"` independent variable. The channels `"ch_1"` on the `"q1"` device element and `"ch_2"` on the `"q2"` device element are measured for the `"freq_a"` and `"freq_b"` independent variables in a dense way.

**In this case, `"ch_1"` and `"ch_2"` share at least one independent variable name, so quantify generates the same dataset dimension name for both of them. This generated dimension name is different from the one for `"ch_0"`, because that channel does not share any independent variable name.**

Note that in the example `"freq_a"` and `"freq_b"` do not overlap exactly for all measurements. This means that not every independent variable's values are the same for these channels. In this case quantify considers them as separate coordinates, even if some of the independent values are the same.

The schedule creation by the user is the following.

```
amps = [0.0, 0.5, 1.0, 1.5, 2.0]
freqs_a = [0.0, 30.0, 60.0]
freqs_b = [10.0, 20.0]
schedule = Schedule(name="Multiple separate dense data", repetitions=1)
for amp in amps:
    schedule.add(Measure(qubit="q0", acq_channel="ch_0", coord=dict(amp=amp)))
# Dense part of the acquisition for ch_1 and ch_2.
for freq_a in freqs_a:
    for freq_b in freqs_b:
        schedule.add(Measure(qubit="q1", acq_channel="ch_1", coord=dict(freq_a=freq_a, freq_b=freq_b)))
        schedule.add(Measure(qubit="q2", acq_channel="ch_2", coord=dict(freq_a=freq_a, freq_b=freq_b)))
# Deviations in both channels in the independents for ch_1 and ch_2.
schedule.add(Measure(qubit="q1", acq_channel="ch_1", coord=dict(freq_a=100.0, freq_b=200.0)))
schedule.add(Measure(qubit="q2", acq_channel="ch_2", coord=dict(freq_a=100.0, freq_b=300.0)))
schedule.add(Measure(qubit="q1", acq_channel="ch_1", coord=dict(freq_a=400.0)))
```

The data returned by quantify is the following.

```
ch_0 = [0.0, 0.2, 0.4, 0.6, 0.8]
data_ch_0 = xarray.DataArray(
    ch_0,
    dims=["acq_index_ch_0"],
    coords={
        "acq_index_ch_0": range(len(amps)),
        "amp": ("acq_index_ch_0", amps),
    },
)
ch_1 = [0.0, 0.2, 1.0, 1.2, 2.0, 2.2, 10.0, numpy.nan, 40.0]
data_ch_1 = xarray.DataArray(
    ch_1,
    dims=["acq_index_ch_1_ch_2"],
    coords={
        "acq_index_ch_1_ch_2": range(len(freqs_a) * len(freqs_b) + 3),
        "freq_a": ("acq_index_ch_1_ch_2", numpy.append(numpy.repeat(freqs_a, len(freqs_b)), [100.0, 100.0, 400.0])),
        "freq_b": ("acq_index_ch_1_ch_2", numpy.append(numpy.tile(freqs_b, len(freqs_a)), [200.0, 300.0, numpy.nan])),
    },
)
ch_2 = [3.0, 3.2, 4.0, 4.2, 5.0, 5.2, numpy.nan, 20.0, numpy.nan]
data_ch_2 = xarray.DataArray(
    ch_2,
    dims=["acq_index_ch_1_ch_2"],
    coords={
        "acq_index_ch_1_ch_2": range(len(freqs_a) * len(freqs_b) + 3),
        "freq_a": ("acq_index_ch_1_ch_2", numpy.append(numpy.repeat(freqs_a, len(freqs_b)), [100.0, 100.0, 400.0])),
        "freq_b": ("acq_index_ch_1_ch_2", numpy.append(numpy.tile(freqs_b, len(freqs_a)), [200.0, 300.0, numpy.nan])),
    },
)
dataset = xarray.Dataset({"ch_0": data_ch_0, "ch_1": data_ch_1, "ch_2": data_ch_2})
dataset
```

![image](/uploads/b24e7fde7f5c7c6bfad47e645e6ce9bd/image.png)

Each channel (or `xarray.DataArray`) contains (and depends on) only the relevant independent variable names (or coordinate names).
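Because downstream processing should rely on coordinate names rather than the auto-generated dimension name, a user can, for example, recover a dense grid from the sparse shared-dimension layout. Below is a minimal standalone sketch (plain numpy/xarray, using made-up values shaped like the dense part of the example above; not quantify API):

```python
import numpy as np
import xarray as xr

# Sparse 1D layout: one auto-generated acquisition index dimension,
# with the independents attached as coordinates (as in the example above).
freqs_a = [0.0, 30.0, 60.0]
freqs_b = [10.0, 20.0]
ch_1 = xr.DataArray(
    [0.0, 0.2, 1.0, 1.2, 2.0, 2.2],  # made-up measured values
    dims=["acq_index_ch_1_ch_2"],
    coords={
        "freq_a": ("acq_index_ch_1_ch_2", np.repeat(freqs_a, len(freqs_b))),
        "freq_b": ("acq_index_ch_1_ch_2", np.tile(freqs_b, len(freqs_a))),
    },
)

# Process by coordinate names only: build a (freq_a, freq_b) multi-index
# over the dimension, then unstack it into a dense 2D grid.
dense = ch_1.set_index(acq_index_ch_1_ch_2=["freq_a", "freq_b"]).unstack("acq_index_ch_1_ch_2")
# dense now has sizes freq_a: 3 and freq_b: 2.
```

The dimension name appears only once here, so renaming it in a future quantify version would require touching a single line of user code.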
```
dataset["ch_0"]
```

![image](/uploads/5207d054f726b8b6e11441ee2e2b8d58/image.png)

```
dataset["ch_1"]
```

![image](/uploads/31145f3a0292f790066d20fe46f8c558/image.png)

```
dataset["ch_2"]
```

![image](/uploads/395351cc2be13679d7e500d5a9002b01/image.png)

#### Multiple separate dense data in append mode

This is the most complex example. It is exactly the same as the previous one, except the schedule is repeated 2 times in append mode. The schedule creation by the user is the following.

```
amps = [0.0, 0.5, 1.0, 1.5, 2.0]
freqs_a = [0.0, 30.0, 60.0]
freqs_b = [10.0, 20.0]
# Note, repetitions=2 and all measurements are in append mode.
schedule = Schedule(name="Multiple separate dense data", repetitions=2)
for amp in amps:
    schedule.add(Measure(qubit="q0", acq_channel="ch_0", bin_mode=APPEND, coord=dict(amp=amp)))
# Dense part of the acquisition for ch_1 and ch_2.
for freq_a in freqs_a:
    for freq_b in freqs_b:
        schedule.add(Measure(qubit="q1", acq_channel="ch_1", bin_mode=APPEND, coord=dict(freq_a=freq_a, freq_b=freq_b)))
        schedule.add(Measure(qubit="q2", acq_channel="ch_2", bin_mode=APPEND, coord=dict(freq_a=freq_a, freq_b=freq_b)))
# Deviations in both channels in the coordinates for ch_1 and ch_2.
schedule.add(Measure(qubit="q1", acq_channel="ch_1", bin_mode=APPEND, coord=dict(freq_a=100.0, freq_b=200.0)))
schedule.add(Measure(qubit="q2", acq_channel="ch_2", bin_mode=APPEND, coord=dict(freq_a=100.0, freq_b=300.0)))
schedule.add(Measure(qubit="q1", acq_channel="ch_1", bin_mode=APPEND, coord=dict(freq_a=400.0)))
```

The data returned by quantify is the following.
```
ch_0 = [
    [0.0, 0.2, 0.4, 0.6, 0.8],
    [1.0, 1.2, 1.4, 1.6, 1.8],
]
data_ch_0 = xarray.DataArray(
    ch_0,
    dims=["repetition", "acq_index_ch_0"],
    coords={
        "repetition": [0, 1],
        "acq_index_ch_0": range(len(amps)),
        "amp": ("acq_index_ch_0", amps),
    },
)
ch_1 = [
    [0.0, 0.2, 1.0, 1.2, 2.0, 2.2, 10.0, numpy.nan, 40.0],
    [1.0, 1.2, 2.0, 2.2, 3.0, 3.2, 20.0, numpy.nan, 80.0],
]
data_ch_1 = xarray.DataArray(
    ch_1,
    dims=["repetition", "acq_index_ch_1_ch_2"],
    coords={
        "repetition": [0, 1],
        "acq_index_ch_1_ch_2": range(len(freqs_a) * len(freqs_b) + 3),
        "freq_a": ("acq_index_ch_1_ch_2", numpy.append(numpy.repeat(freqs_a, len(freqs_b)), [100.0, 100.0, 400.0])),
        "freq_b": ("acq_index_ch_1_ch_2", numpy.append(numpy.tile(freqs_b, len(freqs_a)), [200.0, 300.0, numpy.nan])),
    },
)
ch_2 = [
    [3.0, 3.2, 4.0, 4.2, 5.0, 5.2, numpy.nan, 20.0, numpy.nan],
    [4.0, 4.2, 5.0, 5.2, 6.0, 6.2, numpy.nan, 44.0, numpy.nan],
]
data_ch_2 = xarray.DataArray(
    ch_2,
    dims=["repetition", "acq_index_ch_1_ch_2"],
    coords={
        "repetition": [0, 1],
        "acq_index_ch_1_ch_2": range(len(freqs_a) * len(freqs_b) + 3),
        "freq_a": ("acq_index_ch_1_ch_2", numpy.append(numpy.repeat(freqs_a, len(freqs_b)), [100.0, 100.0, 400.0])),
        "freq_b": ("acq_index_ch_1_ch_2", numpy.append(numpy.tile(freqs_b, len(freqs_a)), [200.0, 300.0, numpy.nan])),
    },
)
dataset = xarray.Dataset({"ch_0": data_ch_0, "ch_1": data_ch_1, "ch_2": data_ch_2})
dataset
```

![image](/uploads/837a39e457685bd10b511b070f6041d7/image.png)

#### Looped append measurements

Looping is a relatively new construction, and append mode acquisitions with loops need to be handled. The following example shows some features which have already been implemented and some which have not. The data format is the logical conclusion of all the previously mentioned dataset formats. The schedule creation by the user is the following.
```
subschedule = Schedule("subschedule", repetitions=1)
subschedule.add(Measure(acq_channel="ch_0", coord=dict(freq=freq), bin_mode=BinMode.APPEND))

schedule = Schedule("schedule")
# For the sake of example (not yet implemented feature),
# `freq` is changing in the loop, with values [100, 200, 300].
# This is passed to the acquisition operations with the variable `freq`.
schedule.add(LoopOperation(body=subschedule, repetitions=3))
```

The data returned by quantify is the following.

```
data_ch_0 = xarray.DataArray(
    data=[[0.0, 0.1, 0.2]],
    dims=["repetition", "acq_index_ch_0"],
    coords={
        "loop_repetition_ch_0": ("acq_index_ch_0", [0, 1, 2]),
        "freq": ("acq_index_ch_0", [100, 200, 300]),
        "repetition": [0],
        "acq_index_ch_0": [0, 1, 2],
    },
)
dataset = xarray.Dataset({"ch_0": data_ch_0})
dataset
```

![image](/uploads/26e32c32ddd65ed071f18e7bef4ff462/image.png)

#### Multiple channels with looped append measurements

As previously mentioned, if there are no common coords between channels, the acquisition index dimension name is not shared. However, when the channels share at least one coord, the acquisition index dimension is the same. The following example shows this, together with what happens if the overall number of repetitions is not the same for the two acquisition channels. The schedule creation by the user is the following.

```
subsubschedule = Schedule("subsubschedule")
subsubschedule.add(Measure(acq_channel="ch_1", coord=dict(freq=freq)))

subschedule = Schedule("subschedule")
subschedule.add(Measure(acq_channel="ch_0", coord=dict(freq=freq)))
subschedule.add(LoopOperation(body=subsubschedule, repetitions=2))

schedule = Schedule("schedule")
# For the sake of example (not yet implemented feature),
# `freq` is changing in the loop, with values [100, 200, 300].
# This is passed to the acquisition operations with the variable `freq`.
schedule.add(LoopOperation(body=subschedule, repetitions=3))
```

The data returned by quantify is the following.
```
data_ch_0 = xarray.DataArray(
    data=[[0.0, numpy.nan, 0.1, numpy.nan, 0.2, numpy.nan]],
    dims=["repetition", "acq_index_ch_0_ch_1"],
    coords={
        "loop_repetition_ch_0": ("acq_index_ch_0_ch_1", [0, numpy.nan, 1, numpy.nan, 2, numpy.nan]),
        "freq": ("acq_index_ch_0_ch_1", [100, 100, 200, 200, 300, 300]),
        "repetition": [0],
        "acq_index_ch_0_ch_1": [0, 1, 2, 3, 4, 5],
    },
)
data_ch_1 = xarray.DataArray(
    data=[[0.0, 0.1, 0.2, 0.3, 0.4, 0.5]],
    dims=["repetition", "acq_index_ch_0_ch_1"],
    coords={
        "loop_repetition_ch_1": ("acq_index_ch_0_ch_1", [0, 1, 2, 3, 4, 5]),
        "freq": ("acq_index_ch_0_ch_1", [100, 100, 200, 200, 300, 300]),
        "repetition": [0],
        "acq_index_ch_0_ch_1": [0, 1, 2, 3, 4, 5],
    },
)
dataset = xarray.Dataset({"ch_0": data_ch_0, "ch_1": data_ch_1})
dataset
```

![image](/uploads/f6465a85d430c40b0c057e4ab51a2ae8/image.png)

#### Trace acquisition

The trace acquisition can currently only be used with average bin mode. If there is a trace acquisition on a channel, the compiler does not allow any other acquisition on that channel. This is the same behavior as now.

Question: should the compiler allow a non-`None` `coord` for this acquisition? If yes, the values should be repeated as a separate coordinate in the data. Pros: the interface is consistent, and the compiler does not raise an error when the user tries to do this; the user is always allowed to not set any `coord`. Cons: we would allow very memory-inefficient data to be stored by the user, if they choose to do that.

The data returned by quantify is the following.
```
ch_0 = [0.0, 0.2, 0.4, 0.6, 0.8]
time = [0.01, 0.02, 0.03, 0.04, 0.05]
data_vars = dict(
    ch_0=(["acq_index_ch_0"], ch_0),
)
coords = dict(
    time=(["acq_index_ch_0"], time),
    acq_index_ch_0=range(len(time)),
)
dataset = xarray.Dataset(
    data_vars=data_vars,
    coords=coords,
)
dataset
```

![image](/uploads/c0d0310dc2455cab5d93192ac4a6c4df/image.png)

#### Loop-level dependent bin mode

**WARNING: this section is outdated; this will not be implemented in this form!**

(Note: this is a relatively new proposed interface change; it was requested around March 2024.)

In some cases with multi-dimensional looping (for example a loop inside another loop), the customers need to average the acquisitions along one loop and append along the other loop. In this case, the `bin_mode` can be an iterable (for example a tuple or list). Note, the first element applies to the `Schedule.repetitions`, and the subsequent elements apply to each level of nesting of the loops.

For acquisitions in loops with append bin mode, quantify automatically generates a new coordinate name. In the example, that is `loop_repetition_<n>`. (`loop_repetition_*` is not allowed in `coords`.)

(To be compatible with previous schedules, and for convenience, the `bin_mode` argument can also be a single bin mode value. In this case, it applies to all levels of loops.)

```
sched = Schedule("schedule_name", repetitions=2)
sched.add(
    LoopOperation(
        body=LoopOperation(
            body=Measure(qubit="q0", acq_channel="ch_0", bin_mode=(APPEND, APPEND, AVERAGE)),
            repetitions=4,
        ),
        repetitions=3,
    )
)
```

The data returned by quantify is the following.

```
ch_0 = [[0.0, 0.2, 0.4],
        [1.0, 1.2, 1.4]]
data_vars = dict(
    ch_0=(["repetition", "acq_index_ch_0"], ch_0),
)
coords = dict(
    loop_repetition_0=(["repetition", "acq_index_ch_0"], [[0, 1, 2], [0, 1, 2]]),
    acq_index_ch_0=range(3),
    repetition=[0, 1],
)
dataset = xarray.Dataset(
    data_vars=data_vars,
    coords=coords,
)
dataset
```

![acq_redesign_nested_loop_example](/uploads/f9c83e2bf2ec7641c9f28e221465ef8b/acq_redesign_nested_loop_example.png)

## 3. Requirements of Acquisition Framework

Goal of this section: write down the requirements of the "acquisition framework" to make explicit what we are trying to implement, and to be able to use this when reviewing implementations.

### Definitions

Goal of this section: describe the relevant concepts that we use when discussing acquisitions. Note that these are not requirements, but concepts we use when describing acquisitions and the requirements.

1. **Experiment:** a procedure carried out under controlled conditions in order to make a discovery, test a hypothesis, or demonstrate a known fact.
2. **ExperimentDescription:** a description of the procedure that is carried out in an Experiment. A valid ExperimentDescription can consist of Settable(s), Gettable(s), instructions to determine the Setpoints, and predefined DataProcessing step(s).
3. **Schedule:** a description of a quantum program consisting of Operations applied to Resource(s) and containing precise timing information.
4. **Dataset:**
   * Structured data with metadata (e.g., an xarray dataset).
5. **Raw Dataset:**
   * A valid Dataset.
   * Data structure defined by what is returned by the hardware abstraction layer (or Gettable).
   * Data entries labeled by AcquisitionChannel and AcquisitionIndex.
   * _For a complete dataset, data entries can contain all independent variables._
6. **Processed Dataset:**
   * A valid Dataset.
   * Data structure defined by the Experiment that is performed, described by the ExperimentDescription.
7. **DataProcessing:** a predefined procedure of operations that can be performed on a Dataset to return another Dataset (which may also include figures or quantities of interest).
8. **Analysis:** a custom procedure of operations that can be performed on a Dataset to better understand it. (N.B. a standardized Analysis can be used as DataProcessing.)
9. **Acquisition:** an operation that can be added to a Schedule. An Acquisition consists of (at least) an AcquisitionProtocol, an AcquisitionChannel and an AcquisitionIndex (or AcquisitionTag).
10. **Acquisition protocol:** part of the acquisition data processing pipeline. Each acquisition protocol should have a corresponding data schema defined and documented.
11. **Acquisition channel:** a stream of data that corresponds to a device element measured sequentially in an identical regime.
    * In quantify-scheduler, an acquisition channel "normally corresponds to a qubit".
    * The concept also makes sense without a qubit (e.g., standalone spectroscopy).
    * A qubit can have multiple acquisition channels.
12. **Acquisition Index:** an identifier of an acquisition within a single repetition of a schedule, unique per acquisition channel (i.e., an index value occurs once per acquisition channel).
13. **Acquisition Tag:** a human-readable label of an acquisition, unique per acquisition channel. An Acquisition Index is a valid Acquisition Tag under this definition and can be used as a default.

### Requirements

_Bold-faced definitions of concepts used in the requirements can be found in the section "Definitions"._

1. Quantify should allow a user to describe an **Experiment**, execute it, and get back a **ProcessedDataset**.
2. The structure of the **ProcessedDataset** should be how the user wants to represent the data and quantities of interest (i.e., it should be clear to the user).
3. The structure of a **ProcessedDataset** is defined by the **Experiment** that is performed: i.e., a combination of Settables, Gettables, and **DataProcessing**.
4. Quantify should allow a user to specify a **Schedule**, execute it by passing it to the _InstrumentCoordinator_, and get back a **RawDataset**.
5. The structure of the **RawDataset** should be determined by how the _InstrumentCoordinator_ and _InstrumentCoordinatorComponents_ represent the data.
6. The shape and type of individual data entries (including units) in a **RawDataset** are determined by the **AcquisitionProtocols** of the **Acquisitions** specified in the **Schedule**.
7. There should be a clear (explicit) mapping between **Acquisitions** specified in a **Schedule** and data entries in a **RawDataset**.
8. The structure of the **RawDataset** and the **ProcessedDataset** must be predictable before the execution of the **Experiment** (a predictable structure, but not the exact structure).
9. A user should be able to specify what independent variables (units and values) they want to associate with an **Acquisition** when creating a **Schedule**.
10. Internal (i.e., within Quantify) processing of the experimental data should be clear and transparent.
11. The _InstrumentCoordinator_ must always be able to combine/merge partial datasets returned by _InstrumentCoordinatorComponents_ into a **RawDataset** without conflicts.
12. A **RawDataset** should be serializable, saveable to disk, and loadable from disk; the file format should be an open format.

## 4. High-level overview of work

_Update @ereehuis 10-11-2023: This section is likely outdated but still provides a good high-level overview._

### Quantify-core

- Structured gettables: we need to provide a `Gettable` that is allowed to return an xarray dataset with associated metadata. quantify-core#343
- Integration of structured gettables in the measurement control. quantify-core#344
  - The new `Gettable` is responsible for executing a single iteration (be it returning a single value, or a batch of values).
  - The measurement control is responsible for concatenating these partial datasets of the structured gettable, writing them to disk, sending them to the plotmon, etc.
  - It would be nice to include quantify-core#301 when integrating the structured gettable into the MeasurementControl.
  - The challenge here is to keep this backwards compatible.
- Add support to the plotmon for the new data structures, and allow clear failure modes for non-supported datasets.

### Quantify-scheduler

- Document/specify what an acquisition protocol is and how it impacts data structures. quantify-scheduler#81
  - Specifying the return types of an acquisition protocol, including units and possible data shapes, will be key.
- Specify how a user can specify what independent variables (units and values) they want to associate with an acquisition when creating a schedule. quantify-scheduler#68
  - A sensible default is to just use an index variable when no independent variables are specified.
  - Think about what the best UX is for specifying this.
- Map retrieved data to the right independent variables. quantify-scheduler#362
  - Support in the compilation process to add the metadata that allows the instrument coordinator to map/transform the raw arrays of individual instruments to the right independent variables.
    - This is currently done in a super hacky fashion with the acq-metadata that is extracted by counting the measurements in a schedule.
  - Requires specifying the layering of what the independent variables are.
  - The instrument coordinator needs to return the right xarray dataset; this implies the instrument coordinator components need to provide the right parts of the xarray dataset, and the coordinator needs to know how to combine these partial datasets.
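That combination step can be illustrated with plain xarray: each component returns a partial dataset for the channels it owns, and the coordinator merges them, failing loudly on conflicts. A minimal sketch under assumed channel names (not the actual quantify API):

```python
import xarray as xr

# Partial datasets as two hypothetical instrument coordinator
# components might return them, each owning different channels.
part_a = xr.Dataset({"ch_0": ("acq_index_ch_0", [0.0, 0.2, 0.4])})
part_b = xr.Dataset({"ch_1": ("acq_index_ch_1", [1.0, 1.2])})

# compat="no_conflicts" makes the merge raise a MergeError if the
# components were to disagree on any shared variable or coordinate.
raw = xr.merge([part_a, part_b], compat="no_conflicts")
```

This satisfies the "combine without conflicts" requirement mechanically: disjoint channels merge cleanly, and overlapping but inconsistent data is rejected rather than silently overwritten.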
- Other issues to be resolved:
  - Poor API for data acquisition and data storage. quantify-scheduler#192
  - Undesired coupling between acquisition channels and indices in different parts of the code. quantify-scheduler#304

## 5. Implementation for Quantify-scheduler

The main high-level goal is to

1. generate a mapping from the user-defined concepts (`acq_channel`, `acq_index`) to the hardware concepts at compilation, and
2. when we retrieve the data from the hardware, translate this hardware acquisition data to the acquisition data using this compiled mapping.

There are two mappings.

* The hardware-independent mapping. This maps each acquisition channel to
  * the acquisition protocol,
  * the bin mode,
  * the acquisition index dimension name (for example `acq_index_ch_0` in the examples above), and
  * the mapping from acquisition index to its coords.
* The hardware-dependent mapping. In the case of Qblox, it maps each Qblox cluster, module, sequencer, `acq_channel` and `acq_index` to
  * the Qblox acquisition index (which is different from the acquisition index) and
  * the Qblox acquisition bin.

The main high-level logic is the following.

1. Generate the hardware-independent mapping from the schedule, which produces two pieces of data:
   * the channel data, which contains
     * the acquisition protocol,
     * the bin mode,
     * the acquisition index dimension name, and
     * the mapping from acquisition index to its coords;
   * the mapping from each schedulable to its acquisition index.
2. Then the Qblox backend appends the acquisition index to each OpInfo using the schedulable-to-`acq_index` mapping, and along the way the `acq_index` ends up in each acquisition operation's strategy.
3. The Sequencer's compile function generates the Qblox acquisition index and Qblox bin for each operation using a new class called QbloxAcquisitionBinManager. These generated values are called the allocated bins. QbloxAcquisitionBinManager stores these allocated bins for each acquisition.
4. When the compiler is finished creating the QASM program, it asks the QbloxAcquisitionBinManager for the acquisition hardware mapping, and stores that in the CompiledSchedule.
5. When the user runs the CompiledSchedule and retrieves the acquisitions, the InstrumentCoordinator maps each piece of hardware acquisition data (the raw data retrieved from qblox-instruments) to its corresponding place in the `xarray.Dataset` using the acquisition hardware mapping, and appends the coords using the acquisition channel data.

Below is a data flow diagram. The rectangles are the data, and the ellipses are the transformations of the data.

![Untitled_Diagram-1704376834108.drawio__2_](/uploads/3e855ae65697b592c1fb094e8dcca758/Untitled_Diagram-1704376834108.drawio__2_.png)
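The two mappings described above can be sketched as simple data structures. The class and field names below are hypothetical illustrations (the real quantify-scheduler classes may differ); the protocol name `"SSBIntegrationComplex"` and the concrete cluster/module numbers are example values only.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class AcquisitionChannelData:
    """Hardware-independent mapping for one acquisition channel (sketch)."""

    acq_protocol: str  # e.g. "SSBIntegrationComplex"
    bin_mode: str  # e.g. "average" or "append"
    acq_index_dim_name: str  # e.g. "acq_index_ch_0"
    # acquisition index -> {independent variable name: value}
    coords: Dict[int, Dict[str, float]] = field(default_factory=dict)


@dataclass(frozen=True)
class QbloxAcquisitionBin:
    """Hardware-dependent location of one acquisition on Qblox hardware (sketch)."""

    cluster: str
    module: int
    sequencer: int
    qblox_acq_index: int  # not the same as the user-facing acq_index
    qblox_acq_bin: int


# Hardware-independent mapping: acquisition channel -> channel data.
channel_data = {
    "ch_0": AcquisitionChannelData(
        acq_protocol="SSBIntegrationComplex",
        bin_mode="average",
        acq_index_dim_name="acq_index_ch_0",
        coords={0: {"amp": 0.0}, 1: {"amp": 0.5}},
    ),
}

# Hardware-dependent mapping: (acq_channel, acq_index) -> allocated bin.
hardware_mapping = {
    ("ch_0", 0): QbloxAcquisitionBin("cluster0", 1, 0, 0, 0),
    ("ch_0", 1): QbloxAcquisitionBin("cluster0", 1, 0, 0, 1),
}
```

With these two structures, step 5 above reduces to a lookup: for each raw hardware bin, `hardware_mapping` locates the (channel, acquisition index) pair, and `channel_data` supplies the dimension name and the coords to attach in the `xarray.Dataset`.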