Major changes

Changed merge strategy from gather to pull and push
- Pull happens synchronously as soon as the rank has finished processing local tasks and stealing.
  - Pull fetches the tasks from the rank's neighbours and merges them on reception to avoid memory bloat.
- Push can be synchronous or asynchronous depending on the compile-time flag (ASYNC_); push requests are handled by the helper thread.
  - In async mode, push adopts a fire-and-forget strategy: whether the future is resolved is determined by whether the next_ queue is empty. This is implemented using atomic_flags.
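
For reviewers, the fire-and-forget push described above can be pictured roughly as follows. This is a minimal sketch, not the code in this MR: only `next_` is taken from the description, while `AsyncPusher`, `busy_`, `resolved()`, `deliver()` and `demo_push()` are hypothetical names, and delivery is simulated by appending to a vector instead of sending to a neighbour.

```cpp
#include <atomic>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Illustrative only: every name except next_ is hypothetical.
class AsyncPusher {
public:
    AsyncPusher() : helper_([this] { helper_loop(); }) {}

    ~AsyncPusher() {
        stop_.store(true);
        helper_.join();  // the helper drains next_ before it exits
    }

    // Fire and forget: queue the request and return immediately.
    void push(int task) {
        std::lock_guard<std::mutex> guard(mutex_);
        next_.push_back(task);
        busy_.test_and_set();  // mark "requests outstanding"
    }

    // "Future resolution": true once the helper has found next_ empty.
    bool resolved() {
        std::lock_guard<std::mutex> guard(mutex_);
        const bool was_busy = busy_.test_and_set();
        if (!was_busy) busy_.clear();  // undo the set made by this probe
        return !was_busy;
    }

    std::size_t delivered_count() {
        std::lock_guard<std::mutex> guard(mutex_);
        return delivered_.size();
    }

private:
    void helper_loop() {
        for (;;) {
            int task = 0;
            bool have_task = false;
            {
                std::lock_guard<std::mutex> guard(mutex_);
                if (!next_.empty()) {
                    task = next_.front();
                    next_.pop_front();
                    have_task = true;
                } else {
                    busy_.clear();  // queue drained: outstanding pushes are done
                    if (stop_.load()) return;
                }
            }
            if (have_task) deliver(task);
            else std::this_thread::yield();
        }
    }

    // Stand-in for the real send to a neighbouring rank.
    void deliver(int task) {
        std::lock_guard<std::mutex> guard(mutex_);
        delivered_.push_back(task);
    }

    std::mutex mutex_;
    std::deque<int> next_;
    std::vector<int> delivered_;
    std::atomic_flag busy_ = ATOMIC_FLAG_INIT;
    std::atomic<bool> stop_{false};
    std::thread helper_;  // declared last so it starts after the other members exist
};

// Pushes n tasks without waiting, then polls the flag until the queue is drained.
std::size_t demo_push(int n) {
    AsyncPusher pusher;
    for (int i = 0; i < n; ++i) pusher.push(i);
    while (!pusher.resolved()) std::this_thread::yield();
    return pusher.delivered_count();
}
```

The sketch uses only `test_and_set()` and `clear()` so that it does not depend on C++20's `atomic_flag::test()`. The flag is cleared only when the helper finds `next_` empty, which happens after the last dequeued request has been delivered.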
Added a hypercube partitioner and a simple hashing partitioner to the BNSL problem.
Non-unique tasks are now reduced asynchronously over a B-tree by the helper thread.
In bnsl_state.hpp operator == has been updated to check for equality of active_task. This guarantees the correct accumulation of active tasks in the root at the end of each superstep.
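
As a rough illustration of the operator== change, the sketch below compares two states on (score, active_tasks) and ignores tid. The struct name `bnsl_state_sketch` and the field types are assumptions made for this example; the actual definition in bnsl_state.hpp may differ.

```cpp
#include <cstdint>
#include <tuple>

// Hypothetical, simplified stand-in for the state in bnsl_state.hpp.
struct bnsl_state_sketch {
    std::uint64_t tid;           // task identifier (no longer part of equality)
    double score;                // score of the task
    std::uint64_t active_tasks;  // assumed here to be stored as a bitmask
};

// Equality on (score, active_tasks): two states with the same tid but
// different active tasks are no longer treated as duplicates.
inline bool operator==(const bnsl_state_sketch& a, const bnsl_state_sketch& b) {
    return std::tie(a.score, a.active_tasks) == std::tie(b.score, b.active_tasks);
}

inline bool operator!=(const bnsl_state_sketch& a, const bnsl_state_sketch& b) {
    return !(a == b);
}
```

With equality defined this way, a reduction that discards equal states keeps entries whose active tasks differ, which is consistent with the accumulation behaviour described above.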

Minor changes

`vranks_` is initialized in the constructor of the executor; it contains the ranks of all neighbours.
Changed lock/unlock to lock_guard where applicable
Avoided locking in m_receive_message_head__ for request_type values that don't update the tokens
In impl.hpp kept a single template version of add_to(Container& S, const T& t) so that it accepts any container type
In CMake, added -fsanitize=leak to the profile flags
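
One way a single add_to template can serve any container type is sketched below. Only the signature `add_to(Container& S, const T& t)` comes from this MR; the body is an assumption and relies on the fact that both sequence and associative standard containers accept `insert(position, value)`.

```cpp
#include <list>
#include <set>
#include <vector>

// Sketch of a container-agnostic add_to; the body is assumed, not copied from impl.hpp.
template <typename Container, typename T>
void add_to(Container& S, const T& t) {
    // For sequence containers this appends; for associative containers the
    // iterator is only a hint and duplicates are handled by the container.
    S.insert(S.end(), t);
}
```

With this form, `add_to(v, 1)` appends to a `std::vector<int>`, while calling `add_to(s, 3)` twice on a `std::set<int>` leaves a single element.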

In bit_utils.hpp fixed memory bloating issue
In bnsl_state.hpp operator == fixed from comparing tid to (score, active_tasks)
Moved the call to identity on gst_ to the start of the superstep

Edited Mar 24, 2022 by Zainul Abideen Sayed