tidy in the config phase causes useless service restarts
I've just seen errors that were caused by puppet running, removing old job files, and then restarting ganeti services because of that.
I had issued a command from the cluster master when that happened, so the command failed because of the service restart. Listing the instances right afterwards (done by the same script that issued the failed command) showed:
Instance                      Hypervisor OS                 Primary_node     Status         Memory
formation-puppet3.koumbit.net xen-pvm    debootstrap+buster vuvu.koumbit.net ERROR_nodedown ?
It's annoying that this is necessary, but apparently we still need to clean out files older than one month in /var/lib/ganeti/queue, otherwise the files just pile up in there ad vitam aeternam.
The easiest fix would be to create a new phase, maybe called cleanup, that would be ordered after the service.
...but actually the tidy resource is kind of annoying: on each puppet run you get an info-level message saying that tidy is cleaning things up even though it's not doing anything.
So I think I'd rather change how the old job files are cleaned out. We could switch to a cron job that finds old files and deletes them (and I'd have it log something to syslog whenever files were actually cleaned out).
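
To make the cron job idea concrete, here is a rough sketch of what such a script could look like. Nothing here is tested against a real cluster: the function name, the syslog tag, the 30-day default and the assumption that job files are named job-* are all mine, so adjust as needed. Matching only job-* is meant to keep the script away from the other bookkeeping files in the queue directory.

```shell
#!/bin/sh
# Sketch only: function name, syslog tag and defaults are illustrative assumptions.

# Command used to report deletions; defaults to syslog via logger(1).
LOG_CMD="${LOG_CMD:-logger -t ganeti-queue-cleanup}"

# clean_old_jobs DIR DAYS
# Delete job-* files under DIR last modified more than DAYS days ago,
# and log one message only if at least one file matched.
clean_old_jobs() {
    dir="$1"
    days="$2"
    # -print lists each matched file and -delete removes it, so wc counts the matches.
    # Note: -delete is supported by GNU and BSD find but is not in POSIX.
    count=$(find "$dir" -type f -name 'job-*' -mtime +"$days" -print -delete | wc -l | tr -d ' ')
    if [ "$count" -gt 0 ]; then
        # Intentionally unquoted so the command and its options are split into words.
        $LOG_CMD "removed $count job file(s) older than $days days from $dir"
    fi
}

# Run only when a directory is given on the command line, e.g. from cron:
#   <path to this script> /var/lib/ganeti/queue 30
if [ "$#" -ge 1 ]; then
    clean_old_jobs "$1" "${2:-30}"
fi
```

Since the script stays silent when nothing matched, we'd get rid of the per-run info message, and since it runs outside of puppet it can't trigger a service restart.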