Skip to content

Completely abolish job pickling.

Tristan Van Berkom requested to merge tristan/nuke-pickle-jobber into master

This commit completely abolishes pickling of jobs, taking a firm step backwards on #1021 until this can be done properly and completely.

Problems with the current approach:

  • Objects have to implement custom __getstate__ and __setstate__ methods
    • These methods have to manually keep track of variables which should be nullified upon pickling, easily falling out of sync with surrounding source code. To the point that we've added assert statements that the nullified members exist as class attributes.
    • If this were actually designed, we might have a base interface to derive from allowing classes to list the members which must be nullified, asserting that they actually exist after said object's __init__() method completes, and taking care of the __getstate__/__setstate__ methods automatically based on a list.
  • While some objects implement __getstate__ and sometimes __setstate__, others implement this callback called _get_child_pickle_state_for_jobbing() or something weirdly undocumented, tucked away deeply in the insides of _scheduler/jobs/jobpickler.py
    • It is unclear why an object should choose to implement this custom function instead of __getstate__
    • The _get_child_pickle_state_for_jobbing() (or something) is not an interface that is documented anywhere, instead one must go and understand jobpickler.py and try to figure out what that function should do and why it should be chosen instead of __getstate__ (still a mystery).
  • Some objects complain about being pickled, including Scheduler and Stream
    • Surely it makes sense these should not be pickled
    • The side-effect of triggering these errors is that tox hangs indefinitely forever
    • Even when running tox with the -s option passed to pytest, there is no way to ever see the error message without going and modifying src/buildstream/testing/runcli.py and disabling the capturing done there, otherwise no stderr or stdout will ever be displayed until the hanging process never returns, because it's hanging.
    • Assuming you've spent a day so far and got as far as printing this TypeError and understanding what is going on, the error is unhelpful.
      • This error prints a stack trace showing the object which got pickled, simply saying that the object should not be pickled.
      • This error does not print any helpful information about objects which refer to these "dont pickle" objects.

This feature is a part of unfinished #1021, but this feature is in itself unfinished, not providing any convenience in implementation (abstract class ?), any clarity of why __getstate__ methods are littered around (barring deep code research), any helpful error messages when things go wrong (maybe the errors could at least use gc.get_referrers() to concoct a helpful error message ?), or any clear documentation of how to use this, save for a lengthy comment hiding out in jobpickler.py (a comment which demands focus and deep understanding of what is going on; this is not concise documentation on how to use the interfaces).

I think at this point, having this feature in it's present state serves mostly as an obstacle to any progress and refactoring, although it would be great if someone wanted to own this feature, fix it, and eventually complete #1021.

Edited by Tristan Van Berkom

Merge request reports