generic serialization pattern
Background
Currently there are two methods to json-serialize any object given to the json encoder.
- For a few specific quantify-scheduler classes, it takes the repr and dumps it as a string to the JSON file.
- For the remainder of classes, it dumps the dict to JSON.
There are two problems with the current approach:
- To parse the repr back to a Python object we need some sort of string parsing. In general this string contains a bunch of JSON data structures, but the JSON parser and the string parser are sometimes incompatible, which leads to very complicated bugs. Additionally, the resulting JSON structure contains much of its data in string format, which is hard to browse through with a JSON-compatible tool. This MR makes no changes to this behavior; it solely provides an alternative serialization method, explained below. Removing the repr serialization method is something for a future MR, if deemed necessary.
- For every new object that you want to JSON-serialize, you would have to overwrite its repr (even if it is used for a different function) and change the init method to accept arguments coming from the repr. This mixes code regarding object initialization, representation and serialization. A common piece of code currently present in our codebase is (taken from Schedule.__init__):
    if name is not None:
        self.data["name"] = name
    if data is not None:
        self.data.update(data)
This also adds an extra argument to __init__ which overwrites all other input arguments.
As the number of serializable classes increases, most classes would obtain an entry like this.
A side effect of this coding style is that Schedule is currently not JSON-serializable with the ScheduleJSONEncoder, since its repr is not enough to recreate it.
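The first problem can be illustrated with a minimal sketch using only the standard library. The `state` dict below is a hypothetical stand-in for an object's contents, not actual quantify-scheduler data. It shows that a repr stored inside JSON is an opaque string written in Python syntax, which the JSON parser cannot read back:

```python
import ast
import json

# Hypothetical state of some object; the keys and values are illustrative only.
state = {"name": "my_schedule", "repetitions": 1, "label": None, "enabled": True}

# repr-based approach: the state ends up as one opaque string inside the JSON.
encoded = json.dumps({"schedule": repr(state)})
inner = json.loads(encoded)["schedule"]
assert isinstance(inner, str)

# The inner string is Python syntax (single quotes, None, True), not JSON,
# so the JSON parser rejects it and a second, Python-level parser is needed.
try:
    json.loads(inner)
    json_could_parse = True
except json.JSONDecodeError:
    json_could_parse = False

# A Python literal parser can recover the dict, but only for plain literals.
recovered = ast.literal_eval(inner)
```

Here `ast.literal_eval` only works because the example contains plain literals; a repr of a custom class would need its own parsing logic.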
These two issues combined are my motivation for this MR.
Changes and Motivation
In this MR, I have added a serialization method inspired by the behavior of Python's copy and pickle modules. When Python objects are copied or pickled, a few different methods are tried to make the object stateless. When these fail, copying and pickling fail. To influence this process, the Python documentation suggests overriding the __getstate__ and __setstate__ dunder methods.
https://docs.python.org/3/library/pickle.html#object.__getstate__
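As a minimal sketch of these two hooks, the hypothetical `Counter` class below (not part of quantify-scheduler) controls which part of its state is stored and how it is restored. The example uses the copy module; pickle consults the same two methods.

```python
import copy


class Counter:
    """Hypothetical class, used only to illustrate the two hooks."""

    def __init__(self, start=0):
        self.value = start
        self._cache = {}  # transient data we do not want to persist

    def __getstate__(self):
        # Return only the part of the object that should be stored.
        return {"value": self.value}

    def __setstate__(self, state):
        # Rebuild the full object from the stored state; __init__ is not called.
        self.value = state["value"]
        self._cache = {}


original = Counter(start=3)
original._cache["tmp"] = 123

shallow = copy.copy(original)
deep = copy.deepcopy(original)
```

Both copies carry the stored `value`, while the transient `_cache` is reset by `__setstate__` instead of being copied.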
It seemed a natural idea to follow these guidelines for JSON serialization as well (i.e. use the same dunder methods to let classes themselves figure out how to make themselves stateless).
The only issue with this is that we have to tell the JSON decoder which class was responsible for creating a specific entry, and therefore I propose to introduce the special "deserialization_type" entry in a dict.
As the logic in the code shows, whenever "deserialization_type" is present in a decoded dict, we know it should be converted into an instance of the class named by "deserialization_type" and be given back the state described by the remainder of the dict.
A useful design pattern for this is
{"deserialization_type": "AcquisitionMetadata", "data":{...}}
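A rough sketch of how such a tagged dict could round-trip through the standard json module is shown below. Everything here is an assumption made for illustration: the fields of the stand-in `AcquisitionMetadata` class, the `KNOWN_TYPES` registry, `SketchEncoder` and `sketch_object_hook` are hypothetical names and not the encoder or decoder added in this MR.

```python
import json


class AcquisitionMetadata:
    """Stand-in for the real class; the fields here are hypothetical."""

    def __init__(self, protocol, bin_mode):
        self.protocol = protocol
        self.bin_mode = bin_mode

    def __getstate__(self):
        return {"data": {"protocol": self.protocol, "bin_mode": self.bin_mode}}

    def __setstate__(self, state):
        self.protocol = state["data"]["protocol"]
        self.bin_mode = state["data"]["bin_mode"]


# Hypothetical registry mapping type names to classes.
KNOWN_TYPES = {"AcquisitionMetadata": AcquisitionMetadata}


class SketchEncoder(json.JSONEncoder):
    def default(self, o):
        type_name = type(o).__name__
        if type_name in KNOWN_TYPES:
            # Tag the state so the decoder knows which class to rebuild.
            return {"deserialization_type": type_name, **o.__getstate__()}
        return super().default(o)


def sketch_object_hook(obj):
    type_name = obj.get("deserialization_type")
    if type_name in KNOWN_TYPES:
        cls = KNOWN_TYPES[type_name]
        # The remainder of the dict is the state handed back to the class.
        state = {k: v for k, v in obj.items() if k != "deserialization_type"}
        instance = cls.__new__(cls)  # __init__ is deliberately not called
        instance.__setstate__(state)
        return instance
    return obj


original = AcquisitionMetadata(protocol="protocol_a", bin_mode="mode_b")
text = json.dumps({"metadata": original}, cls=SketchEncoder)
restored = json.loads(text, object_hook=sketch_object_hook)["metadata"]
```

In this sketch the encoder adds the tag; letting `__getstate__` include the tag itself would work equally well, and the choice is a matter of where the class name should be decided.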
Please note that when creating the new instance of type "deserialization_type", its __init__ method is not called. We create the instance with __new__ and populate it with __setstate__, which could internally call __init__ if this is convenient, but can also choose its own method for setting all variables and defaults.
An example of this is already in our codebase (again Schedule):
    def __setstate__(self, state):
        self.data = state
        for schedulable in self.schedulables.values():
            schedulable.schedule = weakref.proxy(self)
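The snippet above can be mimicked in a self-contained sketch. The `Owner` and `Child` classes are hypothetical and only loosely modeled on Schedule and its schedulables. The sketch shows that constructing with `__new__` and then calling `__setstate__` never runs `__init__`, and that `__setstate__` is the place to rebuild back-references that a serialized state cannot carry.

```python
import weakref


class Child:
    """Hypothetical child object holding a back-reference to its owner."""

    def __init__(self, name):
        self.name = name
        self.owner = None


class Owner:
    """Hypothetical container, loosely modeled on the snippet above."""

    init_calls = 0

    def __init__(self, children):
        Owner.init_calls += 1
        self.data = {"children": children}
        for child in children.values():
            child.owner = weakref.proxy(self)

    def __setstate__(self, state):
        # Restore the data, then re-create the weak back-references.
        self.data = state
        for child in self.data["children"].values():
            child.owner = weakref.proxy(self)


state = {"children": {"a": Child("a"), "b": Child("b")}}
restored = Owner.__new__(Owner)  # allocates the instance, skips __init__
restored.__setstate__(state)
```

Using `weakref.proxy` for the back-reference avoids a strong reference cycle between the owner and its children.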
Merge checklist
See also merge request guidelines
- Merge request has been reviewed and approved by a project maintainer.
- Merge request contains a clear description of the proposed changes and the issue it addresses.
- Merge request made onto appropriate branch (main for most MRs).
- New code is fully tested.
- New code is documented and docstrings use numpydoc format.
- CHANGELOG.rst and AUTHORS.rst have been updated (when applicable).
- CI pipelines pass
  - pre-commit run --all-files --hook-stage commit passes (gitlab-ci),
  - test suite passes (gitlab-ci),
  - no degradation in code-coverage (codacy),
  - no (serious) new pylint code quality issues introduced (codacy),
  - documentation builds successfully (CI and readthedocs),
  - windows tests pass (manually triggered by maintainers before merging).
For reference, the issues workflow is described in the contribution guidelines.