Make analysis work for datasets with different columns
Description
Make the coordinates and data variables that an analysis operates on configurable in the analysis class.
Motivation
This is based on a discussion with @philomathie.
Currently, the analysis framework relies on the generic names of the coordinates (e.g., "xi") and data variables (e.g., "yi") to determine what to perform an analysis on.
This is useful for e.g. a Rabi analysis, where the variable behind "y0" might be named "magn", "F(1)", or "counts". By using the generic xi and yi column names, the analysis works independently of the variable names.
However, this also makes it impossible to reuse the same analysis when the data is stored in a different column.
As an example, consider a T1 experiment performed simultaneously on multiple qubits. We would want to run the same (default) T1 analysis on each individual qubit (columns y0 to yn). This should be as simple as running the same analysis multiple times, specifying the desired column that contains the data each time. Currently this is not possible because:
- It is not possible for the user to specify the column of the data to analyze.
- Because the analysis class is identical for each run, the results would overwrite each other in the data structure.
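One direction the refinement could take (a sketch only: the class name, the `data_var` argument, and the result-key scheme are all hypothetical, and a plain dict stands in for the real dataset object): let the user pass the name of the column to analyze, and key the stored results by that name so repeated runs do not overwrite each other.

```python
import numpy as np


class T1Analysis:
    """Hypothetical analysis with a user-selectable data column.

    ``data_var`` picks the column to analyze; the result key includes the
    column name so running the same analysis on y0..yn does not collide.
    """

    def __init__(self, dataset, data_var="y0"):
        self.dataset = dataset  # plain dict here, standing in for a real dataset
        self.data_var = data_var
        self.results = {}

    def run(self):
        t = self.dataset["x0"]
        y = self.dataset[self.data_var]
        # Crude T1 estimate: first time the decay drops to 1/e of its start.
        below = np.nonzero(y <= y[0] / np.e)[0]
        self.results[f"T1_{self.data_var}"] = t[below[0]] if below.size else np.nan
        return self


# One dataset holding T1 decays of two qubits in columns y0 and y1.
t = np.linspace(0.0, 100e-6, 101)
dataset = {
    "x0": t,
    "y0": np.exp(-t / 20e-6),
    "y1": np.exp(-t / 35e-6),
}

# Same analysis class, run once per column -- no name clashes.
for col in ("y0", "y1"):
    analysis = T1Analysis(dataset, data_var=col).run()
    print(analysis.results)
```

Keying the results by column name is one way to solve the overwriting problem; alternative schemes (e.g., a per-analysis label) would work equally well.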
I have flagged this issue as "needs refinement" because the solution to this problem is not clear yet.
Original discussion with @philomathie:

I'm digging into the analysis class just now - it looks very comprehensive - but as I am writing my own analysis scripts, I see that a utility function might make things a lot easier. As it stands, all the analysis code is written under the assumption that data that is taken and stored will always be in a set order. For instance, in the resonator analysis class:
```python
assert self.dataset_raw["y1"].attrs["units"] == "deg"
S21 = self.dataset_raw["y0"] * np.cos(
    np.deg2rad(self.dataset_raw["y1"])
) + 1j * self.dataset_raw["y0"] * np.sin(np.deg2rad(self.dataset_raw["y1"]))
self.dataset["S21"] = S21
self.dataset["S21"].attrs["name"] = "S21"
```
This might make sense for some datasets, but more complex datasets with more variables that don't have a 'sensible' order might benefit from a utility function that lets us request a specific y-array from a dataset by name. As far as I can see, this currently does not exist. It would mean that analysis routines could be written to analyse the relevant data stored at any position. To take a simple example, I am now writing a model to extract a resistance by measuring current as a function of voltage. The analysis is trivial, R = V/I, but most likely I will have many y-arrays, and there is no obvious reason why the current should be placed at y0 rather than y5. It seems a rather arbitrary restriction that could be remedied by the piece of utility code I described.
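Such a lookup helper is straightforward to sketch with xarray, which the datasets here are built on. The function name and the reliance on the `long_name` attribute are assumptions for illustration, not an existing API:

```python
import numpy as np
import xarray as xr


def get_var_by_name(dataset: xr.Dataset, name: str) -> xr.DataArray:
    """Return a data variable by key or by its ``long_name`` attribute.

    Lets analysis code ask for "Current" instead of hard-coding "y0",
    so the position of the column in the dataset no longer matters.
    (Hypothetical helper; not part of the existing framework.)
    """
    for key, da in dataset.data_vars.items():
        if key == name or da.attrs.get("long_name") == name:
            return da
    raise KeyError(f"no data variable named {name!r}")


# I-V sweep: the current happens to sit in "y1", but we never rely on that.
v = np.linspace(0.1, 1.0, 10)
ds = xr.Dataset(
    {
        "y0": ("dim_0", np.zeros_like(v), {"long_name": "Phase", "units": "deg"}),
        "y1": ("dim_0", v / 50.0, {"long_name": "Current", "units": "A"}),
    },
    coords={"x0": ("dim_0", v, {"long_name": "Voltage", "units": "V"})},
)

# R = V / I, looked up by name rather than by column position.
resistance = (v / get_var_by_name(ds, "Current")).values
```

With a helper like this, the resonator snippet above could fetch "Magnitude" and "Phase" by name instead of asserting that they live in y0 and y1.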
Hi James, excellent point! The same issue happens if/when we want to analyze a many-qubit simultaneous T1 experiment. In that case you don't want to need a different analysis class for every column of data. We should think a bit about how to resolve this. There's another issue as well, by the way: there are many equivalent representations of the same data. Depending on how people extract the data, you either have magn/phase, averaged I/Q, single-shot I/Q, populations (based on normalized mean data), or populations based on thresholded shots. The latter is a slightly different issue, but quite similar. Let's create an issue on this; I don't think we have one yet.
Yes, I can see a future where, because the analysis routines are hard-coded to expect different types of data in different columns, it might be hard to translate analysis routines between experiments if people record the data slightly differently.