Draft: Resolve "Tango-Server crashes on Restart Command"

Mateusz C requested to merge 545-issue into develop

Closes #545

Lengthy description of the problem

The segfaults were caused by this call, because at this point the DeviceClass* pointer was already badly corrupted. Even earlier, the size returned by this call was negative, so this iteration was already a trainwreck.

The reason for this was that the device_class pointer created in the PyTango layer during server startup here was being destroyed (at the beginning of the RestartServer command) here by CppTango, and Python was unaware of this. In the next step, Python would try to delete the DeviceClass instances it had created on this line. Normally that would drop the boost reference count to zero and cause a segfault, because the underlying object was already destroyed. However, Python was still holding a reference hidden in _device_class_instance here, so the destructor was not triggered. Python then allocated a new DeviceClass instance and continued here. This assignment finally dropped the reference count of the first DeviceClass instance (already deleted) to 0, and boost triggered its destructor. Because the new instance had been allocated at exactly the same memory location, this didn't segfault; instead it destroyed the DeviceClass instance that Python had just allocated. From this point on, every operation on class_list[i] was working on a dangling pointer.

We allocate the DeviceClass instance in Python, and when CppTango deletes it there is no simple way to inform Python about this. The proposed solution is to route the delete class_list[i] calls in CppTango through a wrapper that can be overridden in PyTango. The override then skips the deletion if the object was allocated in the Python layer. The proposed wrapper uses a pointer to function.
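To make the idea concrete, here is a minimal sketch of such a pointer-to-function hook. It is not the actual patch: DeviceClass below is a stand-in struct, and the names python_owned, class_deleter, default_delete_class, binding_delete_class and delete_all_classes are all hypothetical. In particular, how the binding recognises a Python-allocated object (here a simple flag) is an assumption.

```cpp
#include <cassert>
#include <vector>

// Stand-in for Tango::DeviceClass; NOT the real class.
struct DeviceClass {
    bool python_owned = false;  // assumed marker, set by the binding layer
    static int live;            // live-instance counter, for demonstration only
    DeviceClass() { ++live; }
    ~DeviceClass() { --live; }
};
int DeviceClass::live = 0;

// Default behaviour: the C++ core owns the object and deletes it.
void default_delete_class(DeviceClass* cl) { delete cl; }

// The hook: a plain pointer to function that the binding layer may replace.
using ClassDeleter = void (*)(DeviceClass*);
ClassDeleter class_deleter = default_delete_class;

// What the binding layer might install: skip objects whose lifetime is
// managed by Python's reference counting, delete everything else.
void binding_delete_class(DeviceClass* cl) {
    if (cl != nullptr && cl->python_owned) {
        return;  // Python will destroy it when its refcount reaches zero
    }
    delete cl;
}

// Core-side cleanup: every former "delete class_list[i]" goes through the hook.
void delete_all_classes(std::vector<DeviceClass*>& class_list) {
    assert(class_deleter != nullptr);
    for (DeviceClass* cl : class_list) {
        class_deleter(cl);
    }
    class_list.clear();
}
```

With class_deleter set to binding_delete_class, the core can clear its list during a restart while the Python-allocated instance stays alive until Python releases it, so the object is destroyed exactly once.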

If this solution is too ugly, we can investigate other possibilities:

  • Using intrusive_ptr instead of shared_ptr to let boost manage the memory. That is said to allow custom deallocation mechanisms. We could check whether the object was already deleted and skip the deletion in that case.
  • Maybe it's possible to reimplement DServer to work on shared_ptr instead of raw pointers, so that we could pass the shared pointer managed by boost to the Cpp layer.
  • (not working) I tried creating a virtual deleter method in DServer that I would override in the boost wrapper, but it didn't work, because DServer is allocated in cpp here, so it will not know about any overrides.
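
The shared_ptr alternative from the list above could look roughly like the sketch below. It uses std::shared_ptr for brevity, on the assumption that boost::shared_ptr would behave the same way here. DServer and DeviceClass are stand-in structs, and add_class and delete_classes are hypothetical names, not CppTango API.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Stand-in for Tango::DeviceClass; NOT the real class.
struct DeviceClass {
    static int live;  // live-instance counter, for demonstration only
    DeviceClass() { ++live; }
    ~DeviceClass() { --live; }
};
int DeviceClass::live = 0;

// Stand-in for DServer holding shared ownership instead of raw pointers.
struct DServer {
    std::vector<std::shared_ptr<DeviceClass>> class_list;

    void add_class(std::shared_ptr<DeviceClass> cl) {
        assert(cl != nullptr);
        class_list.push_back(std::move(cl));
    }

    // Restart-time cleanup: dropping the references replaces "delete".
    // An object is destroyed only if nobody else still holds it.
    void delete_classes() { class_list.clear(); }
};
```

If the binding layer keeps its own shared_ptr to an instance, delete_classes() only releases the server's reference, and the destructor runs once the binding releases its reference as well. The cost is that every user of class_list would have to be adapted to the new element type.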
Edited by Mateusz C
