Monitor provisioned machines on machine

We should continuously cross-check the state of docker-machine against the state of Digital Ocean.

In the past I have seen cases where more machines were provisioned on Digital Ocean than docker-machine accounted for locally.

This happened because of an API problem on Digital Ocean's side (a failure to create new machines, then a failure to remove them, as the machines were in a 422 created state).

We should write a script that would:

  1. take the runner token,
  2. take the runner's Digital Ocean API token,
  3. call the Digital Ocean API to list all machines,
  4. check whether the machines returned by the API still exist locally,
  5. remove the machines that are not in use.

We should probably run this script every hour on each of the managers, and try to create a report whenever it detects and removes such machines.
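For the report, a dry-run comparison could come first: list what would be removed before anything is deleted. This is only a sketch. The function name `orphaned_machines` and the two input files (one machine name per line) are hypothetical, and filling them from doctl and from the local machines directory is left to the caller.

```shell
#!/usr/bin/env bash
# Print names that appear in the remote list but not in the local list.
# $1: file with one remote (Digital Ocean) machine name per line.
# $2: file with one local (docker-machine) machine name per line.
orphaned_machines() {
	remote_list="$1"
	local_list="$2"
	# -x: whole-line match, -F: fixed strings, -f: patterns from file, -v: invert.
	# grep exits with 1 when nothing is orphaned; that is not an error here.
	sort -u "$remote_list" | grep -vxFf "$local_list" || [ "$?" -eq 1 ]
}
```

The output can be written to a log or mailed as the hourly report, and the same list can then drive the actual removal.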

We should probably do the same for all failing machines (machines without a DockerID assigned in /root/.docker/machines/machines/<machine-name>/config.json), which are currently removed by the /root/machines-operation.sh remove-failing script. If we ran that every hour, it would basically delete the problematic entries. In the case of remove-failing we should make sure to delete only entries that are more than 1 hour old, as a newer machine may still be in the process of being created by docker-machine.

To solve problem 1 we could use doctl with something like this:

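# Names of the machines that docker-machine knows about locally.
ls -1 /root/.docker/machines/machines > machines.txt

# Droplets of this runner without a local machine directory are orphaned.
while read -r DID DNAME; do
	# Whole-line, fixed-string match, so one name cannot match inside another.
	if grep -qxF "$DNAME" machines.txt; then
		continue
	fi

	echo "Removing $DNAME of $DID..."
	# --force skips the interactive confirmation prompt.
	doctl compute droplet delete --force "$DID" &
done < <(doctl compute droplet list --format ID,Name --no-header | grep "[[:space:]]runner-${runner_token_stripped_to_8_characters_from_config_toml}")
wait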

To solve problem 2, probably some script using find with -mmin +60 to look for files that were last modified more than 1 hour ago (-mtime counts in whole days, so it cannot express a 1 hour threshold).
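A minimal sketch of that check for problem 2 follows. The function name `list_stale_machines` is hypothetical, and the test for an "assigned" ID (the field followed by a non-zero number) is an assumption about how config.json stores it; adjust the field name and pattern to the real file. It only prints candidates and deletes nothing.

```shell
#!/usr/bin/env bash
# Print machine directories that are older than a threshold and have no
# populated ID field in their config.json.
# $1: machines directory, $2: minimum age in minutes (default 60),
# $3: ID field name (default DockerID; assumed to hold a non-zero number).
list_stale_machines() {
	machines_dir="$1"
	max_age_min="${2:-60}"
	id_field="${3:-DockerID}"

	# -mmin +N selects entries last modified more than N minutes ago, so
	# machines that docker-machine may still be creating are skipped.
	find "$machines_dir" -mindepth 1 -maxdepth 1 -type d -mmin "+${max_age_min}" |
	while IFS= read -r dir; do
		# A missing config.json or a zero ID counts as a failed machine.
		if ! grep -Eq "\"${id_field}\":[[:space:]]*[1-9]" "$dir/config.json" 2>/dev/null; then
			basename "$dir"
		fi
	done
}
```

For example, `list_stale_machines /root/.docker/machines/machines 60` would print the names to review or to pass on to the existing remove-failing step.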

@maratkalibek @tmaczukin What do you think?