Feature distributed merge and bnsl hypercube partitioner
Major changes
- Changed merge strategy from `gather` to pull and push
  - Pull happens synchronously as soon as the rank has finished processing local tasks and stealing
    - Pull gets the tasks from its neighbours and merges them on reception to avoid memory bloating
  - Push can happen sync/async depending on the compile-time flag (`ASYNC_`); push requests are handled by the helper thread.
    - Push in `async` adopts a fire-and-forget strategy; a `future`'s resolution is determined by whether the `next_` queue is empty. This is implemented using `atomic_flags`
- Added a hypercube partitioner and a simple hashing partitioner to the BNSL problem
- Non-unique tasks are now reduced asynchronously over a b-tree by the helper thread.
- In `bnsl_state.hpp`, `operator==` has been updated to check for equality of active_task. This guarantees the correct accumulation of active tasks in the `root` at the end of each superstep.
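As a rough illustration of the fire-and-forget push described in the first item above, here is a minimal sketch. The class `PushChannel`, its methods, and the `int` task type are hypothetical and are not taken from this branch; only the `next_` name comes from the description. The sketch uses `std::atomic<bool>` instead of `std::atomic_flag`, because `std::atomic_flag` has no non-modifying read before C++20.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <queue>
#include <vector>

// Hypothetical stand-in for the executor's outgoing buffer; not the PR's class.
class PushChannel {
public:
    // Fire and forget: enqueue the task and return without waiting for the send.
    void push_async(int task) {
        std::lock_guard<std::mutex> lock(mutex_);
        next_.push(task);
        drained_.store(false, std::memory_order_release);  // work is pending
    }

    // What a helper thread would call: move every queued task into `sent`,
    // then publish that the queue is empty.
    void drain(std::vector<int>& sent) {
        std::lock_guard<std::mutex> lock(mutex_);
        while (!next_.empty()) {
            sent.push_back(next_.front());
            next_.pop();
        }
        assert(next_.empty());
        drained_.store(true, std::memory_order_release);
    }

    // A waiter can poll this without taking the mutex to decide whether
    // its earlier pushes have been handed off.
    bool is_drained() const {
        return drained_.load(std::memory_order_acquire);
    }

private:
    std::mutex mutex_;
    std::queue<int> next_;             // pending outgoing tasks
    std::atomic<bool> drained_{true};  // true while next_ is empty
};
```

In this sketch the producer never blocks on the send itself; completion is observed only through `is_drained()`, which mirrors the idea of resolving on an empty `next_` queue.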
Minor changes
- `vranks_` is initialized in the constructor of the executor; it contains the ranks of all neighbours.
- Changed lock/unlock to lock_guard where applicable
- Avoid locking in `m_receive_message_head__` for `request_type` values that don't update the tokens
- In `impl.hpp`, kept one template version of `add_to(Container& S, const T& t)` so that it can accept any container type
- In CMake, added `-fsanitize=leak` to the profile flags
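For readers unfamiliar with the single-template approach mentioned for `add_to`, the following is a sketch of one way such a function can work. Only the signature `add_to(Container& S, const T& t)` comes from the description; the body and the usage function `add_to_usage` are assumptions, and the actual code in `impl.hpp` may differ.

```cpp
#include <cassert>
#include <set>
#include <vector>

// One template for sequence and set-like containers alike: the hinted
// insert(pos, value) overload exists on std::vector, std::list, std::deque,
// std::set and std::unordered_set, so no per-container overloads are needed.
template <typename Container, typename T>
void add_to(Container& S, const T& t) {
    S.insert(S.end(), t);
}

// Usage: a vector keeps duplicates, a set collapses them.
inline void add_to_usage() {
    std::vector<int> v;
    add_to(v, 3);
    add_to(v, 3);
    assert(v.size() == 2);

    std::set<int> s;
    add_to(s, 3);
    add_to(s, 3);
    assert(s.size() == 1);
}
```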
Bug fixes
- In `bit_utils.hpp`, fixed a memory bloating issue
- In `bnsl_state.hpp`, `operator==` fixed from comparing tid to comparing (score, active_tasks)
- Moved the call to `identity` on `gst_` to the start of the superstep
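The `operator==` fix can be pictured with the sketch below. The struct `bnsl_state_sketch`, its field types, and the usage function are hypothetical; only the field names `tid`, `score`, and `active_tasks` and the idea of comparing on (score, active_tasks) come from the description.

```cpp
#include <cassert>
#include <cstdint>
#include <tuple>

// Hypothetical state layout; the real type lives in bnsl_state.hpp.
struct bnsl_state_sketch {
    std::uint64_t tid;           // task id, no longer part of equality
    double score;
    std::uint64_t active_tasks;
};

// Equality on (score, active_tasks) only, so two states with different
// tids but the same payload compare equal.
inline bool operator==(const bnsl_state_sketch& a, const bnsl_state_sketch& b) {
    return std::tie(a.score, a.active_tasks) == std::tie(b.score, b.active_tasks);
}

inline void state_equality_usage() {
    bnsl_state_sketch a{1, 0.5, 3};
    bnsl_state_sketch b{2, 0.5, 3};  // different tid, same payload
    bnsl_state_sketch c{1, 0.5, 4};  // same tid, different active_tasks
    assert(a == b);
    assert(!(a == c));
}
```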
Edited by Zainul Abideen Sayed