Fixes #2714: contractions fail with SEGV when computing non linear scalars such as autodiff

Reference issue

#2714

What does this implement/fix?

This MR fixes a bug in tensor contractions that use non-linear scalars or scalars that need initialization/finalization. The bug affects contractions running on DefaultDevice and ThreadPoolDevice, causing segmentation faults and memory leaks.

It turns out that the current contraction code allocates the operand blocks as one raw memory chunk and correctly deallocates that chunk after the computation. However, with exotic scalars such as Eigen::AutoDiffScalar, each element can itself be a graph with pointers to other memory locations, so it may need to be initialized before the contraction and finalized after it.

The raw-memory approach matters for performance: it provides aligned memory, which allows vectorization.

This MR fixes this issue by performing two additional steps:

  • initializing the scalars by explicitly calling their constructors
  • finalizing the scalars by explicitly invoking the destructor

Note that these two steps must not allocate or deallocate memory themselves. The device has already allocated the block, so we cannot allocate it again. Likewise, we cannot deallocate it ourselves, because the device deallocates it later using the correct alignment offset.

As a solution, the fix uses placement new and explicit destructor calls. These two actions are implemented as member functions of the TensorContractionKernel struct:

template <typename ResScalar, typename LhsScalar, typename RhsScalar,
    typename StorageIndex, typename OutputMapper, typename LhsMapper,
    typename RhsMapper>
struct TensorContractionKernel {
  // ...

  EIGEN_DEVICE_FUNC void initialize_block(BlockMemHandle block) {
    //...
  }

  template <typename Device>
  EIGEN_DEVICE_FUNC BlockMemHandle allocate(Device& d, LhsBlock* lhs_block, RhsBlock* rhs_block) {
    BlockMemHandle result = BlockMemAllocator::allocate(d, bm, bk, bn, lhs_block, rhs_block);
    initialize_block(result);
    return result;
  }

  template <typename Device>
  EIGEN_DEVICE_FUNC BlockMemHandle allocateSlices(
      Device& d, const StorageIndex num_lhs, const StorageIndex num_rhs,
      const StorageIndex num_slices, std::vector<LhsBlock>* lhs_blocks,
      std::vector<RhsBlock>* rhs_blocks) {

    BlockMemHandle result = BlockMemAllocator::allocateSlices(
        d, bm, bk, bn, num_lhs, num_rhs, num_slices, lhs_blocks, rhs_blocks);
    initialize_block(result);
    return result;
  }

  EIGEN_DEVICE_FUNC void finalize_block(BlockMemHandle block) {
    // ...
  }

  template <typename Device>
  EIGEN_DEVICE_FUNC void deallocate(Device& d, BlockMemHandle handle) {
    // Destroy the scalars first; the device then releases the raw memory.
    finalize_block(handle);
    BlockMemAllocator::deallocate(d, handle);
  }
};
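
The bodies of initialize_block and finalize_block are elided above. As a rough illustration of the underlying technique only (not the code in this MR), the sketch below constructs and destroys objects in memory that was allocated elsewhere. TrackedScalar, initialize_range, finalize_range and demo_live_count are hypothetical names, and std::malloc/std::free merely stand in for the device allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical scalar that owns heap memory, standing in for an
// autodiff-like type whose objects must be constructed and destroyed.
struct TrackedScalar {
  static int live;  // number of currently constructed objects
  double* value;
  TrackedScalar() : value(new double(0.0)) { ++live; }
  ~TrackedScalar() {
    delete value;
    --live;
  }
  TrackedScalar(const TrackedScalar&) = delete;
  TrackedScalar& operator=(const TrackedScalar&) = delete;
};
int TrackedScalar::live = 0;

// Construct n objects in memory that was already allocated elsewhere.
// Placement new runs the constructor without allocating anything.
template <typename T>
void initialize_range(T* ptr, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) ::new (static_cast<void*>(ptr + i)) T();
}

// Destroy n objects without releasing the underlying memory.
template <typename T>
void finalize_range(T* ptr, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) ptr[i].~T();
}

// Returns the number of live objects while the block is initialized.
int demo_live_count(std::size_t n) {
  // Stand-in for the device allocation (assumption: malloc alignment suffices here).
  void* raw = std::malloc(n * sizeof(TrackedScalar));
  assert(raw != nullptr);
  TrackedScalar* block = static_cast<TrackedScalar*>(raw);

  initialize_range(block, n);
  const int during = TrackedScalar::live;
  *block[0].value = 1.5;  // safe only because the object was constructed

  finalize_range(block, n);  // frees what each scalar owns
  std::free(raw);            // stand-in for the device deallocation
  return during;
}
```

Skipping initialize_range would leave block[0].value pointing at garbage, which is the kind of access that produces a segmentation fault. Skipping finalize_range would leak the memory each scalar owns.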

Tests

This MR includes two new tests: test_scalar_initialization (in cxx11_tensor_contraction.cpp) and test_multithread_contraction_with_scalar_initialization (in cxx11_tensor_thread_pool.cpp). Both tests use a simple scalar type, InitializableScalar.

Additional information

  1. Contractions of plain scalars such as float or double are not affected by the initialization/finalization methods, because the fix checks NumTraits<LhsScalar>::RequireInitialization before running them.

  2. Both before and after this MR, contractions of exotic scalars do not compile on GPU devices.
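
To illustrate item 1, here is a minimal sketch of trait-gated initialization, assuming a compile-time flag in the spirit of NumTraits<LhsScalar>::RequireInitialization. RequiresInit, BlockInit, CountingScalar and count_constructions are hypothetical names, and the trait shown is a stand-in that does not reproduce how Eigen defines its flag.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <type_traits>

// Hypothetical stand-in for a "requires initialization" flag.
template <typename T>
struct RequiresInit {
  static constexpr bool value =
      !std::is_trivially_default_constructible<T>::value ||
      !std::is_trivially_destructible<T>::value;
};

// Hypothetical scalar that counts how often its constructor runs.
struct CountingScalar {
  static int constructed;
  double v;
  CountingScalar() : v(0.0) { ++constructed; }
};
int CountingScalar::constructed = 0;

// Primary template: the scalar needs no initialization, so do nothing.
template <typename T, bool Needs = RequiresInit<T>::value>
struct BlockInit {
  static void run(T*, std::size_t) {}
};

// Specialization: construct every element in place.
template <typename T>
struct BlockInit<T, true> {
  static void run(T* ptr, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) ::new (static_cast<void*>(ptr + i)) T();
  }
};

// Initializes one block of each kind and reports the constructor calls.
int count_constructions() {
  alignas(CountingScalar) unsigned char a[4 * sizeof(CountingScalar)];
  alignas(double) unsigned char b[4 * sizeof(double)];
  CountingScalar::constructed = 0;
  BlockInit<CountingScalar>::run(reinterpret_cast<CountingScalar*>(a), 4);
  BlockInit<double>::run(reinterpret_cast<double*>(b), 4);  // compiles to a no-op
  return CountingScalar::constructed;
}
```

Because the choice is made at compile time, the double path contains no loop at all, which is why plain scalars should pay no cost for the extra steps.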

Edited by Luiz doleron
