generic serialization pattern (!444) · Merge requests · Quantify / quantify-scheduler

Background

Currently the JSON encoder serializes an object in one of two ways. For a few specific quantify-scheduler classes, it takes the repr and dumps it as a string to the JSON file. For all remaining classes, it dumps the dict to JSON. There are two problems with the current approach:

First, to parse the repr back into a Python object we need some form of string parsing. In general this string contains a number of JSON data structures, but the JSON parser and the string parser are sometimes incompatible, which leads to very complicated bugs. Additionally, the resulting JSON structure holds much of its data in string format, which is hard to browse with a JSON-compatible tool. This MR makes no changes to this behavior; it solely provides an alternative serialization method, explained below. Removing the repr serialization method is something for a future MR, if deemed necessary.
Second, for every new object that you want to JSON-serialize, you would have to override its repr (even if it is used for a different purpose) and change the __init__ method to accept the arguments coming from the repr. This mixes code for object initialization, representation and serialization. A common piece of code currently present in our codebase is (taken from Schedule.__init__):

        if name is not None:
            self.data["name"] = name

        if data is not None:
            self.data.update(data)

This also adds an extra argument (data) to __init__ that overrides all other input arguments. As the number of serializable classes grows, most classes would acquire a block like this.

This coding style has the side effect that Schedule is currently not JSON-serializable with the ScheduleJSONEncoder, since its repr is not enough to recreate it. These two issues combined are my motivation for this MR.

Changes and Motivation

In this MR, I have added a serialization method inspired by the behavior of Python's copy and pickle modules. When a Python object is copied or pickled, a few different methods are tried to separate the object from its state (to make it "stateless"). When all of them fail, copying and pickling fail. To influence this process, the Python documentation suggests implementing the __getstate__ and __setstate__ dunder methods.

https://docs.python.org/3/library/pickle.html#object.__getstate__
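
As a rough illustration of these two hooks, here is a minimal sketch using only the standard library. The Counter class and its attributes are hypothetical and not part of quantify-scheduler; the example uses copy.copy, and pickle calls the same two methods.

```python
import copy


class Counter:
    """Hypothetical class, used only to illustrate the two hooks."""

    def __init__(self, start):
        self.value = start
        # Derived data that is not worth persisting.
        self._cache = {"expensive": "derived data"}

    def __getstate__(self):
        # Return only what is needed to rebuild the object.
        return {"value": self.value}

    def __setstate__(self, state):
        # Called on a bare instance; __init__ is not run for the copy.
        self.value = state["value"]
        self._cache = {}


original = Counter(3)
clone = copy.copy(original)  # goes through __getstate__ / __setstate__
```

The clone keeps value but starts with an empty _cache, because only the state returned by __getstate__ travels with it.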

It seemed natural to follow these guidelines for JSON serialization as well (i.e. to use the same dunder methods so that classes themselves decide how to make themselves stateless).

The only issue with this is that we have to tell the JSON decoder which class was responsible for creating a specific entry. I therefore propose to introduce a special "deserialization_type" entry in the dict. As the logic in the code shows, whenever "deserialization_type" is present in a decoded dict, we know that the dict should be converted into an instance of the class named by "deserialization_type", and that this instance should be given back the state held in the remainder of the dict. A useful design pattern for this is {"deserialization_type": "AcquisitionMetadata", "data":{...}}
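
To make the round trip concrete, here is a minimal sketch of how such an entry could be written and read back with the standard json module. It is not the code in this MR: KNOWN_TYPES, SketchEncoder and sketch_object_hook are hypothetical names, and the AcquisitionMetadata below is a stand-in with made-up fields.

```python
import json


class AcquisitionMetadata:
    """Stand-in for the real class; the two fields are made up for this sketch."""

    def __init__(self, protocol, bin_mode):
        self.protocol = protocol
        self.bin_mode = bin_mode

    def __getstate__(self):
        # The class itself decides what its stateless form looks like.
        return {
            "deserialization_type": type(self).__name__,
            "data": {"protocol": self.protocol, "bin_mode": self.bin_mode},
        }

    def __setstate__(self, state):
        self.protocol = state["data"]["protocol"]
        self.bin_mode = state["data"]["bin_mode"]


# Hypothetical registry mapping type names to classes.
KNOWN_TYPES = {"AcquisitionMetadata": AcquisitionMetadata}


class SketchEncoder(json.JSONEncoder):
    def default(self, o):
        # Only called for objects json cannot serialize natively.
        if type(o).__name__ in KNOWN_TYPES:
            return o.__getstate__()
        return super().default(o)


def sketch_object_hook(obj):
    # Called for every decoded dict, innermost first.
    type_name = obj.get("deserialization_type")
    cls = KNOWN_TYPES.get(type_name) if isinstance(type_name, str) else None
    if cls is None:
        return obj  # ordinary dict, leave untouched
    instance = cls.__new__(cls)  # bypasses __init__
    instance.__setstate__(obj)
    return instance


text = json.dumps({"acq": AcquisitionMetadata("trace", "average")}, cls=SketchEncoder)
restored = json.loads(text, object_hook=sketch_object_hook)["acq"]
```

Because the state is stored as nested JSON rather than as a repr string, the resulting file stays browsable with any JSON-compatible tool.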

Please note that when creating the new instance of "deserialization_type", its __init__ method is not called. We create the instance with __new__ and populate it with __setstate__, which can internally call __init__ if this is convenient, but can also choose its own way of setting all variables and defaults. An example of this is already in our codebase (again Schedule):

    def __setstate__(self, state):
        self.data = state
        for schedulable in self.schedulables.values():
            schedulable.schedule = weakref.proxy(self)
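
The following sketch mimics that pattern with hypothetical stand-in classes (Parent and Child are not quantify-scheduler classes) to show that rebuilding through __new__ and __setstate__ never runs __init__, while the back-references are still restored.

```python
import weakref


class Child:
    """Hypothetical element that points back to its owner."""

    def __init__(self, label):
        self.label = label
        self.parent = None


class Parent:
    """Hypothetical container, loosely modelled on the excerpt above."""

    init_calls = 0  # counts how often __init__ actually runs

    def __init__(self, name):
        Parent.init_calls += 1
        self.data = {"name": name, "children": {}}

    def __setstate__(self, state):
        self.data = state
        # Re-establish the weak back-references, as Schedule does.
        for child in self.data["children"].values():
            child.parent = weakref.proxy(self)


state = {"name": "restored", "children": {"a": Child("a")}}
parent = Parent.__new__(Parent)  # allocate without calling __init__
parent.__setstate__(state)
```

After this, Parent.init_calls is still 0 and the child reaches its parent through the weak proxy.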
