witness_node program is crashing on active SON node (Mainnet)

Description:

I have an active SON on Mainnet. After setting up the server, installing/configuring Peerplays and a Bitcoin node, the witness_node program syncs all the blocks up to current. Then when my SON is scheduled, it tries to create "son down" proposals for all the other SONs. Most of the time these new proposals fail. Sometimes the proposals succeed in being created. The SON will operate for a few minutes and then get suddenly disconnected/crash (I can't exactly tell what's happening). Either way, the witness_node program ends.

The first problem I found was that my server time was off. I was getting a p2p disconnect because I was proposing blocks with a timestamp that was "in the future". I installed and configured NTP which solved that problem. But now it's crashing for some other reason.

Relevant info:

Server details

  • Amazon AWS EC2 instance (r6g.large)
    • 2 CPUs, 16 GB RAM, 150 GB storage, 10 Gigabit network
    • arm64 architecture
    • static public IP
  • Inbound ports for P2P, RPC, HTTP, HTTPS, and the Bitcoin node are open.
  • Outbound ports are open for all outbound traffic.

Software details

  • Peerplays version 1.5.13
    • built on the machine from source
  • Bitcoin node (Bitcoin Core) version 0.21.1
    • downloaded & extracted .tar.gz from bitcoin.org

Configuration details

  • Peerplays is set up as a seed node (opened p2p), API node (opened RPC), and SON node
  • Peerplays private key in config.ini is the SON account's active private key
  • SON's signing key is the SON account's active key
  • son-wallet is set up with a bitcoin pubkey and private key

Logs

Here's the end of my default.log when the program ends:

2021-08-18T02:24:42.400 th_a:invoke handle_block         handle_block ] Got block: #42704062 time: 2021-08-18T02:24:42 latency: 400 ms from: jburr-blocks  irreversible: 42704052 (-10)                 application.cpp:495
2021-08-18T02:24:42.901   th_a:SON Processing       son_processing ] Scheduled SON: 1.33.3 Now: 2021-08-18T02:24:43                     peerplays_sidechain_plugin.cpp:398
2021-08-18T02:24:45.381 th_a:invoke handle_block         handle_block ] Got block: #42704063 time: 2021-08-18T02:24:45 latency: 381 ms from: robert-hedler  irreversible: 42704053 (-10)                        application.cpp:495
2021-08-18T02:24:45.882   th_a:SON Processing       son_processing ] Scheduled SON: 1.33.4 Now: 2021-08-18T02:24:46                     peerplays_sidechain_plugin.cpp:398
2021-08-18T02:24:48.518 th_a:invoke handle_block         handle_block ] Got block: #42704064 time: 2021-08-18T02:24:48 latency: 518 ms from: eifos-witness  irreversible: 42704056 (-8)                 application.cpp:495
2021-08-18T02:24:49.019   th_a:SON Processing       son_processing ] Scheduled SON: 1.33.0 Now: 2021-08-18T02:24:49                     peerplays_sidechain_plugin.cpp:398
2021-08-18T02:24:54.401 th_a:invoke handle_block         handle_block ] Got block: #42704065 time: 2021-08-18T02:24:54 latency: 401 ms from: hiltos-witness  irreversible: 42704057 (-8)                        application.cpp:495
2021-08-18T02:24:54.902   th_a:SON Processing       son_processing ] Scheduled SON: 1.33.9 Now: 2021-08-18T02:24:55                     peerplays_sidechain_plugin.cpp:398
2021-08-18T02:24:57.452 th_a:invoke handle_block         handle_block ] Got block: #42704066 time: 2021-08-18T02:24:57 latency: 452 ms from: royal-flush  irreversible: 42704058 (-8)                   application.cpp:495
2021-08-18T02:24:57.952   th_a:SON Processing       son_processing ] Scheduled SON: 1.33.7 Now: 2021-08-18T02:24:58                     peerplays_sidechain_plugin.cpp:398
2021-08-18T02:24:57.953   th_a:SON Processing create_son_down_prop ] Sending son down proposal for 1.33.0 from 1.33.7                   peerplays_sidechain_plugin.cpp:558
2021-08-18T02:24:57.953   th_a:SON Processing create_son_down_prop ] Sending son down proposal for 1.33.1 from 1.33.7                   peerplays_sidechain_plugin.cpp:558
2021-08-18T02:24:57.954   th_a:SON Processing create_son_down_prop ] Sending son down proposal for 1.33.2 from 1.33.7                   peerplays_sidechain_plugin.cpp:558
2021-08-18T02:24:57.955   th_a:SON Processing create_son_down_prop ] Sending son down proposal for 1.33.3 from 1.33.7                   peerplays_sidechain_plugin.cpp:558
2021-08-18T02:24:57.955   th_a:SON Processing create_son_down_prop ] Sending son down proposal for 1.33.4 from 1.33.7                   peerplays_sidechain_plugin.cpp:558
2021-08-18T02:24:57.956   th_a:SON Processing create_son_down_prop ] Sending son down proposal for 1.33.9 from 1.33.7                   peerplays_sidechain_plugin.cpp:558

Here's the end of my p2p.log:

2021-08-18T02:24:39.208 p2p:message read_loop on_connection_accept ] I don't know if I'm firewalled.  Sending a firewall check message to peer 104.225.216.198:9779                     node.cpp:2121
2021-08-18T02:24:39.245   p2p:connect_to_task           connect_to ] fatal: error connecting to peer 173.249.23.108:9777: 0 exception: unspecified
Connection refused
    {"message":"Connection refused"}
    asio  asio.cpp:60 error_handler                     peer_connection.cpp:264
2021-08-18T02:24:39.247 p2p:delayed_peer_deletion_task              destroy ] Unexpected exception from peer_connection's accept_or_connect_task : {"code":0,"name":"exception","message":"unspecified","stack":[{"context":{"level":"error","file":"asio.cpp","line":60,"method":"error_handler","hostname":"","thread_name":"asio","timestamp":"2021-08-18T02:24:39"},"format":"${message} ","data":{"message":"Connection refused"}}]}                       peer_connection.cpp:167
2021-08-18T02:24:39.252   p2p:connect_to_task           connect_to ] fatal: error connecting to peer 109.111.245.114:9777: 0 exception: unspecified
No route to host
    {"message":"No route to host"}
    asio  asio.cpp:60 error_handler                     peer_connection.cpp:264
2021-08-18T02:24:39.252 p2p:delayed_peer_deletion_task              destroy ] Unexpected exception from peer_connection's accept_or_connect_task : {"code":0,"name":"exception","message":"unspecified","stack":[{"context":{"level":"error","file":"asio.cpp","line":60,"method":"error_handler","hostname":"","thread_name":"asio","timestamp":"2021-08-18T02:24:39"},"format":"${message} ","data":{"message":"No route to host"}}]}                 peer_connection.cpp:167
2021-08-18T02:24:39.573 p2p:message read_loop            read_loop ] disconnected 11 eof_exception: End Of File
End of file
    {"message":"End of file"}
    asio  asio.cpp:39 operator()

    {"len":16}
    p2p  stcp_socket.cpp:117 readsome                   message_oriented_connection.cpp:218
2021-08-18T02:24:39.609 p2p:message read_loop on_check_firewall_re ] Firewall check we initiated has returned with result: connection_successful, endpoint = 18.116.153.207:9777                        node.cpp:3774
2021-08-18T02:24:45.132 p2p:terminate_inactive_connections_loop terminate_inactive_c ] Forcibly disconnecting from handshaking peer 88.198.13.39:9777 due to inactivity of at least 5 seconds                   node.cpp:1345
2021-08-18T02:24:45.132 p2p:terminate_inactive_connections_loop terminate_inactive_c ] Peer's negotiating status: connecting, bytes sent: 0, bytes received: 0                  node.cpp:1347
2021-08-18T02:24:45.132 p2p:terminate_inactive_connections_loop terminate_inactive_c ] Forcibly disconnecting from handshaking peer 172.17.0.2:39319 due to inactivity of at least 5 seconds                    node.cpp:1345
2021-08-18T02:24:45.132 p2p:terminate_inactive_connections_loop terminate_inactive_c ] Peer's negotiating status: connecting, bytes sent: 0, bytes received: 0                  node.cpp:1347
2021-08-18T02:24:45.133   p2p:connect_to_task           connect_to ] fatal: error connecting to peer 88.198.13.39:9777: 0 exception: unspecified
Operation canceled
    {"message":"Operation canceled"}
    asio  asio.cpp:60 error_handler                     peer_connection.cpp:264
2021-08-18T02:24:45.133 p2p:delayed_peer_deletion_task              destroy ] Unexpected exception from peer_connection's accept_or_connect_task : {"code":0,"name":"exception","message":"unspecified","stack":[{"context":{"level":"error","file":"asio.cpp","line":60,"method":"error_handler","hostname":"","thread_name":"asio","timestamp":"2021-08-18T02:24:45"},"format":"${message} ","data":{"message":"Operation canceled"}}]}                       peer_connection.cpp:167
2021-08-18T02:24:45.133   p2p:connect_to_task           connect_to ] fatal: error connecting to peer 172.17.0.2:39319: 0 exception: unspecified
Operation canceled
    {"message":"Operation canceled"}
    asio  asio.cpp:60 error_handler                     peer_connection.cpp:264
2021-08-18T02:24:45.133 p2p:delayed_peer_deletion_task              destroy ] Unexpected exception from peer_connection's accept_or_connect_task : {"code":0,"name":"exception","message":"unspecified","stack":[{"context":{"level":"error","file":"asio.cpp","line":60,"method":"error_handler","hostname":"","thread_name":"asio","timestamp":"2021-08-18T02:24:45"},"format":"${message} ","data":{"message":"Operation canceled"}}]}                       peer_connection.cpp:167
2021-08-18T02:24:55.133 p2p:terminate_inactive_connections_loop terminate_inactive_c ] Forcibly disconnecting from handshaking peer 95.216.8.243:42765 due to inactivity of at least 5 seconds                  node.cpp:1345
2021-08-18T02:24:55.133 p2p:terminate_inactive_connections_loop terminate_inactive_c ] Peer's negotiating status: connecting, bytes sent: 0, bytes received: 0                  node.cpp:1347
2021-08-18T02:24:55.133 p2p:terminate_inactive_connections_loop terminate_inactive_c ] Forcibly disconnecting from handshaking peer 195.154.83.79:3338 due to inactivity of at least 5 seconds                  node.cpp:1345
2021-08-18T02:24:55.133 p2p:terminate_inactive_connections_loop terminate_inactive_c ] Peer's negotiating status: connecting, bytes sent: 0, bytes received: 0                  node.cpp:1347
2021-08-18T02:24:55.133   p2p:connect_to_task           connect_to ] fatal: error connecting to peer 95.216.8.243:42765: 0 exception: unspecified
Operation canceled
    {"message":"Operation canceled"}
    asio  asio.cpp:60 error_handler                     peer_connection.cpp:264
2021-08-18T02:24:55.133   p2p:connect_to_task           connect_to ] fatal: error connecting to peer 195.154.83.79:3338: 0 exception: unspecified
Operation canceled
    {"message":"Operation canceled"}
    asio  asio.cpp:60 error_handler                     peer_connection.cpp:264
2021-08-18T02:24:55.133 p2p:delayed_peer_deletion_task              destroy ] Unexpected exception from peer_connection's accept_or_connect_task : {"code":0,"name":"exception","message":"unspecified","stack":[{"context":{"level":"error","file":"asio.cpp","line":60,"method":"error_handler","hostname":"","thread_name":"asio","timestamp":"2021-08-18T02:24:55"},"format":"${message} ","data":{"message":"Operation canceled"}}]}                       peer_connection.cpp:167
2021-08-18T02:24:55.133 p2p:delayed_peer_deletion_task              destroy ] Unexpected exception from peer_connection's accept_or_connect_task : {"code":0,"name":"exception","message":"unspecified","stack":[{"context":{"level":"error","file":"asio.cpp","line":60,"method":"error_handler","hostname":"","thread_name":"asio","timestamp":"2021-08-18T02:24:55"},"format":"${message} ","data":{"message":"Operation canceled"}}]}                       peer_connection.cpp:167

There are no entries in the rpc.log.

Expected results:

The witness_node program should continue to run. I don't think the SON should be producing "son down" proposals for the other SONs unless they truly are down.

Actual results:

The witness_node program ends after attempting to create/update proposals.