Ownership handoffs are not moving all data
Created by: szarsti
We have noticed that when rotating a box in a cluster (removing a box and adding a new one in its place) we end up with much lower disk usage.
After investigating the contents of the data folder on a box that had just been rotated, I can confirm that many of the mstore files that are supposed to be there are missing, even though ddb-admin cluster status says that the node has reached its expected ring participation level.
I ran grep on the error log to see if there was anything relevant, and found quite a few messages like these:
/var/log/ddb/error.log.4:2017-05-12 17:19:01.765 [error] <0.30286.919>@riak_core_handoff_sender:start_fold:301 ownership transfer of metric_vnode from 'dalmatinerdb@172.16.10.21' 890602560248518965780370444936484965102833893376 to 'dalmatinerdb@172.16.10.12' 890602560248518965780370444936484965102833893376 failed because of error:{badrecord,ho_acc} [{riak_core_handoff_sender,start_fold,5,[{file,"/data/jenkins/workspace/dalmatinerdb/_build/default/lib/riak_core/src/riak_core_handoff_sender.erl"},{line,223}]}]
and these:
/var/log/ddb/error.log.3:2017-05-15 14:35:26.626 [error] <0.25835.1575>@riak_core_handoff_sender:start_fold:301 hinted transfer of metric_vnode from 'dalmatinerdb@172.16.10.21' 879184578706871286731904157180889004011771920384 to 'dalmatinerdb@172.16.10.16' 879184578706871286731904157180889004011771920384 failed because of exit:{shutdown,wrong_node} [{riak_core_handoff_sender,start_fold,5,[{file,"/data/jenkins/workspace/dalmatinerdb/_build/default/lib/riak_core/src/riak_core_handoff_sender.erl"},{line,142}]}
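To get a feel for how widespread these failures are, the grep output can be tallied by transfer type and failure reason. The following is only a rough sketch: the function name `summarize` and the regular expression are my own, and the pattern assumes every failed handoff is logged in the same shape as the two messages above.

```python
import re
import sys
from collections import Counter

# Pattern inferred from the two sample messages only; other riak_core
# handoff failures may be formatted differently and will be skipped.
HANDOFF_FAILURE = re.compile(
    r"(?P<kind>\w+) transfer of (?P<vnode>\S+) "
    r"from '(?P<src>[^']+)' (?P<src_idx>\d+) "
    r"to '(?P<dst>[^']+)' (?P<dst_idx>\d+) "
    r"failed because of (?P<cls>\w+):(?P<reason>\{[^}]*\})"
)


def summarize(lines):
    """Count failed handoffs per (transfer kind, error class, reason)."""
    counts = Counter()
    for line in lines:
        match = HANDOFF_FAILURE.search(line)
        if match:
            key = (match["kind"], match["cls"], match["reason"])
            counts[key] += 1
    return counts


if __name__ == "__main__":
    # Usage: python handoff_summary.py /var/log/ddb/error.log*
    totals = Counter()
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            totals.update(summarize(handle))
    for (kind, cls, reason), count in totals.most_common():
        print(f"{count:6d}  {kind:10s} {cls}:{reason}")
```

If the ownership failures with {badrecord,ho_acc} account for most of the count, that would fit the missing mstore files, but the tally on its own does not prove which partitions lost data.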