<p>Oh hey, I&#8217;m back. Been a while. Today, I want to share with you how I&#8217;m using <strong><a href="https://en.wikipedia.org/wiki/Systemd">systemd</a></strong> to start my Clojure applications on <strong><a href="http://matthiasnehlsen.com">matthiasnehlsen.com</a></strong>, and keep them alive, in case anything should go wrong. These are the applications managed this way:</p>

<ul>
<li><strong><a href="http://birdwatch.matthiasnehlsen.com">BirdWatch</a></strong>, an application for tweet stream analysis, see on <strong><a href="https://github.com/matthiasn/BirdWatch">GitHub</a></strong></li>
<li><strong><a href="http://redux-style.matthiasnehlsen.com/">redux-counter example</a></strong>, a sample application for my Clojure <strong><a href="https://leanpub.com/building-a-system-in-clojure">book</a></strong></li>
<li><strong><a href="http://systems-toolbox.matthiasnehlsen.com/">trailing mouse pointer example</a></strong>, another sample application for the book</li>
<li><strong><a href="http://inspect.matthiasnehlsen.com/">inspect</a></strong>, a demo for my <strong><a href="https://github.com/matthiasn/inspect">inspect library</a></strong>. This will soon be replaced by a new version that makes sense of the messages passed around in <strong><a href="https://github.com/matthiasn/systems-toolbox">systems-toolbox</a></strong> applications.</li>
</ul>


<!-- more -->


<p>Also, I&#8217;m using systemd to start up <strong><a href="http://sse-chat.matthiasnehlsen.com/">sse-chat</a></strong>, a <strong>Scala</strong> demo application which you can also find on <strong><a href="https://github.com/matthiasn/sse-chat">GitHub</a></strong>. However, systemd only starts this application; it does not restart it when anything goes wrong.</p>

<p>The background for this post is that I recently ordered a new <strong><a href="http://ark.intel.com/products/codename/37572/Skylake#@All">Skylake Intel® Xeon® E3-1275 v5</a></strong> based server at <strong><a href="https://www.hetzner.de/en/">Hetzner</a></strong>, and I felt it was finally time to retire the manual process startup approach I had used before. Servers should be updated as often as possible, but who does that often enough when it means waiting 10-15 minutes for a reboot and then manually restarting the processes? Certainly not me. So instead, all process startup should be automatic. Initially, I considered using <strong><a href="https://www.docker.com/">Docker</a></strong>, but when it comes to checking that an application is alive, and restarting it if it isn&#8217;t, systemd has the better story to offer. Also, I wasted way too much time on a Docker environment in my last client project, so I&#8217;m a little cured of the snake oil.<sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup></p>

<p>So what I wanted was to restart the machine and have all services come up automatically. Also, I wanted to use the <strong>watchdog</strong> functionality: systemd expects the monitored application to send it a <strong>heartbeat</strong> message at regular intervals, and restarts the application if no heartbeat has arrived for, say, 20 seconds, or whatever timeout you define. You can read all about this mechanism in this <strong><a href="http://0pointer.de/blog/projects/watchdog.html">blog post</a></strong> by one of the original authors of systemd.</p>

<p>My applications had been running rock solid for months on end by the time I finally got around to updating and restarting the server. Still, from an operations perspective, it is certainly appealing to have a mechanism in place that listens for a heartbeat and restarts a process when the heartbeat does not arrive as expected. So I thought this might be a good opportunity to write a small library that takes care of emitting said heartbeat when an application is monitored by systemd. You can find this library on GitHub <strong><a href="https://github.com/matthiasn/systemd-watchdog">here</a></strong>.</p>

<p>This library also happens to be a sweet opportunity to write a minimal <strong><a href="https://github.com/matthiasn/systems-toolbox">systems-toolbox</a></strong> system, with a scheduler component that emits messages at a fixed interval, and a second component that calls systemd via <strong><a href="https://github.com/java-native-access/jna">JNA</a></strong> whenever such a message arrives.</p>

<p>This is the entire library:</p>

<figure class='code'> <div class="highlight"><table><tr><td class='code'><pre><code class='clojure'><span class='line'><span class="p">(</span><span class="kd">ns </span><span class="nv">matthiasn.systemd-watchdog.core</span>
</span><span class='line'>  <span class="p">(</span><span class="ss">:require</span> <span class="p">[</span><span class="nv">matthiasn.systems-toolbox.switchboard</span> <span class="ss">:as</span> <span class="nv">sb</span><span class="p">]</span>
</span><span class='line'>            <span class="p">[</span><span class="nv">matthiasn.systems-toolbox.scheduler</span> <span class="ss">:as</span> <span class="nv">sched</span><span class="p">])</span>
</span><span class='line'>  <span class="p">(</span><span class="ss">:import</span> <span class="p">[</span><span class="nv">info.faljse.SDNotify</span> <span class="nv">SDNotify</span><span class="p">]))</span>
</span><span class='line'>
</span><span class='line'><span class="p">(</span><span class="kd">defn </span><span class="nv">start-watchdog!</span>
</span><span class='line'>  <span class="s">&quot;Call systemd&#39;s watchdog every so many milliseconds.</span>
</span><span class='line'><span class="s">   Requires the NOTIFY_SOCKET environment variable to be set, otherwise does</span>
</span><span class='line'><span class="s">   nothing. Fires up a minimal systems-toolbox system with two components:</span>
</span><span class='line'><span class="s">    * a scheduler component</span>
</span><span class='line'><span class="s">    * a component notifying systemd.</span>
</span><span class='line'><span class="s">   Then, the scheduler will emit messages every so often, and upon receiving,</span>
</span><span class='line'><span class="s">   the notifying component will call the sendWatchdog function.</span>
</span><span class='line'><span class="s">   Takes the timeout in milliseconds.&quot;</span>
</span><span class='line'>  <span class="p">[</span><span class="nv">timeout</span><span class="p">]</span>
</span><span class='line'>  <span class="p">(</span><span class="nb">when </span><span class="p">(</span><span class="nb">get </span><span class="p">(</span><span class="nf">System/getenv</span><span class="p">)</span> <span class="s">&quot;NOTIFY_SOCKET&quot;</span><span class="p">)</span>
</span><span class='line'>    <span class="p">(</span><span class="nf">sb/send-mult-cmd</span>
</span><span class='line'>      <span class="p">(</span><span class="nf">sb/component</span> <span class="ss">:wd/switchboard</span><span class="p">)</span>
</span><span class='line'>      <span class="p">[[</span><span class="ss">:cmd/init-comp</span> <span class="p">(</span><span class="nf">sched/cmp-map</span> <span class="ss">:wd/scheduler-cmp</span><span class="p">)]</span>
</span><span class='line'>       <span class="p">[</span><span class="ss">:cmd/init-comp</span>
</span><span class='line'>        <span class="p">{</span><span class="ss">:cmp-id</span>      <span class="ss">:wd/notify-cmp</span>
</span><span class='line'>         <span class="ss">:handler-map</span> <span class="p">{</span><span class="ss">:wd/send</span> <span class="p">(</span><span class="k">fn </span><span class="p">[</span><span class="nv">_</span><span class="p">]</span> <span class="p">(</span><span class="nf">SDNotify/sendWatchdog</span><span class="p">))}}]</span>
</span><span class='line'>       <span class="p">[</span><span class="ss">:cmd/send</span> <span class="p">{</span><span class="ss">:to</span>  <span class="ss">:wd/scheduler-cmp</span>
</span><span class='line'>                   <span class="ss">:msg</span> <span class="p">[</span><span class="ss">:cmd/schedule-new</span>
</span><span class='line'>                         <span class="p">{</span><span class="ss">:timeout</span> <span class="nv">timeout</span>
</span><span class='line'>                          <span class="ss">:message</span> <span class="p">[</span><span class="ss">:wd/send</span><span class="p">]</span>
</span><span class='line'>                          <span class="ss">:repeat</span>  <span class="nv">true</span><span class="p">}]}]])))</span>
</span></code></pre></td></tr></table></div></figure>


<p>It fires up three things: a <strong>switchboard</strong>, which manages and wires systems; the <code>:wd/notify-cmp</code>, which calls <code>(SDNotify/sendWatchdog)</code> from the <strong><a href="https://github.com/faljse/SDNotify">SDNotify library</a></strong>; and a scheduler component, which emits <code>:wd/send</code> messages every <code>timeout</code> milliseconds. You can build much more complex applications with the <strong>systems-toolbox</strong>, e.g. <strong><a href="http://birdwatch.matthiasnehlsen.com">BirdWatch</a></strong>. The 14 lines above (plus docstring and imports), however, are about the minimum when some scheduling is desired.</p>
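<p>For comparison only, here is a rough sketch of the same "call something every so often" loop without the systems-toolbox, using the JDK&#8217;s <code>ScheduledExecutorService</code>. This is not how the library works; <code>start-heartbeat!</code> and <code>stop-heartbeat!</code> are hypothetical names, and the counter stands in for the real <code>(SDNotify/sendWatchdog)</code> call.</p>

```clojure
(import '[java.util.concurrent Executors ScheduledExecutorService TimeUnit])

(defn start-heartbeat!
  "Calls notify-fn immediately and then every interval-ms milliseconds on a
   single background thread. Returns the executor so it can be stopped later.
   Note: if notify-fn throws, the executor cancels all further runs."
  [notify-fn interval-ms]
  (let [^ScheduledExecutorService executor
        (Executors/newSingleThreadScheduledExecutor)]
    (.scheduleAtFixedRate executor
                          ^Runnable notify-fn
                          0
                          (long interval-ms)
                          TimeUnit/MILLISECONDS)
    executor))

(defn stop-heartbeat!
  "Stops the heartbeat thread started by start-heartbeat!."
  [^ScheduledExecutorService executor]
  (.shutdownNow executor))

;; Usage with a counter as a stand-in for the real notification call:
(def beats (atom 0))
(def executor (start-heartbeat! #(swap! beats inc) 50))
(Thread/sleep 200)
(stop-heartbeat! executor)
@beats ; several beats have been counted by now
```

<p>The executor thread is not a daemon thread, so a real application would have to stop it on shutdown. With the switchboard approach above, the scheduling and the notification stay separate components that only talk via messages.</p>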

<p>You can have a look at the mentioned examples if you&#8217;re interested in building systems with the systems-toolbox. In subsequent articles, I will introduce them in detail. For now, you can just use the library in your projects if you want to have your application monitored by systemd. It&#8217;s just a one-liner, as you can see for example in the <strong><a href="https://github.com/matthiasn/systems-toolbox/blob/master/examples/trailing-mouse-pointer/src/clj/example/core.clj#L41">trailing mouse pointer example</a></strong>:</p>

<figure class='code'> <div class="highlight"><table><tr><td class='code'><pre><code class='clojure'><span class='line'>  <span class="p">(</span><span class="nf">wd/start-watchdog!</span> <span class="mi">5000</span><span class="p">)</span>
</span></code></pre></td></tr></table></div></figure>


<p>This simple command calls systemd every 5 seconds, but only if the <code>NOTIFY_SOCKET</code> environment variable is set, which is normally only the case when systemd has started the application.</p>
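<p>To make that guard a bit more tangible, here is a small sketch of the same check written as a pure function over an environment map, which could be handy for logging at startup whether the watchdog will be active. <code>notify-socket</code> and <code>describe-watchdog</code> are hypothetical helpers and not part of the library; the socket path in the example is just an illustrative value.</p>

```clojure
(defn notify-socket
  "Returns the value of NOTIFY_SOCKET from the given environment map,
   or nil when it is not set."
  [env]
  (get env "NOTIFY_SOCKET"))

(defn describe-watchdog
  "Returns a human-readable line saying whether a heartbeat would be sent."
  [env interval-ms]
  (if-let [socket (notify-socket env)]
    (str "heartbeat every " interval-ms " ms to " socket)
    "NOTIFY_SOCKET not set, watchdog disabled"))

;; With an explicit map (the path is only an example value):
(describe-watchdog {"NOTIFY_SOCKET" "/run/systemd/notify"} 5000)

;; With an empty environment, as when started from a shell:
(describe-watchdog {} 5000)

;; With the real process environment:
(describe-watchdog (System/getenv) 5000)
```

<p>Passing the environment in as an argument keeps the check easy to try out in a REPL, where the application is typically not running under systemd.</p>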

<p>Here&#8217;s the service configuration:</p>

<figure class='code'> <div class="highlight"><table><tr><td class='code'><pre><code class='text'><span class='line'>[Unit]
</span><span class='line'>Description=systems-toolbox websocket latency visualization example
</span><span class='line'>
</span><span class='line'>[Service]
</span><span class='line'>Type=simple
</span><span class='line'>User=bw
</span><span class='line'>Group=bw
</span><span class='line'>Environment=PORT=8010
</span><span class='line'>Environment=HOST=0.0.0.0
</span><span class='line'>WorkingDirectory=/home/bw/run
</span><span class='line'>ExecStart=/usr/bin/java -jar /home/bw/bin/trailing-mouse-pointer.jar
</span><span class='line'>WatchdogSec=20s
</span><span class='line'>Restart=on-failure
</span><span class='line'>
</span><span class='line'># Give a reasonable amount of time for the server to start up/shut down
</span><span class='line'>TimeoutSec=300
</span><span class='line'>
</span><span class='line'>[Install]
</span><span class='line'>WantedBy=multi-user.target
</span></code></pre></td></tr></table></div></figure>
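<p>A note on the numbers: <code>WatchdogSec=20s</code> in the unit above pairs with the 5000 ms heartbeat from the one-liner, so roughly four heartbeats fit into each watchdog window. systemd also exports the configured timeout to the service in the <code>WATCHDOG_USEC</code> environment variable (in microseconds), and its documentation suggests sending the heartbeat at half that interval. The following is only a sketch of how the interval could be derived instead of hard-coded; <code>heartbeat-interval-ms</code> is a hypothetical helper, not part of the library.</p>

```clojure
(defn heartbeat-interval-ms
  "Takes the value of WATCHDOG_USEC (a string of microseconds, or nil when
   no watchdog is configured) and returns half of it in milliseconds,
   or nil when there is no value."
  [^String watchdog-usec]
  (when watchdog-usec
    ;; microseconds / 1000 = milliseconds, then halved: divide by 2000
    (quot (Long/parseLong watchdog-usec) 2000)))

;; WatchdogSec=20s corresponds to WATCHDOG_USEC=20000000
(heartbeat-interval-ms "20000000") ; 10000, i.e. every 10 seconds

;; With the real environment; nil when the variable is unset
(heartbeat-interval-ms (System/getenv "WATCHDOG_USEC"))

;; Possible use together with the library, assuming it is required as wd:
;; (when-let [ms (heartbeat-interval-ms (System/getenv "WATCHDOG_USEC"))]
;;   (wd/start-watchdog! ms))
```

<p>A fixed 5000 ms works just as well for a 20-second timeout; deriving the value only saves you from keeping the two numbers in sync by hand.</p>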


<p>You can find all the service configurations for my server in my <strong><a href="https://github.com/matthiasn/conf">conf</a></strong> project, together with some install scripts which allow me to set up a new server with little effort. I hope this helps you in your deployments. It certainly helps me with mine.</p>

<p>Would you like to know when there&#8217;s a new article? Subscribe to the <a href="http://eepurl.com/y0HWv" target="_blank"><strong>newsletter</strong></a> and I&#8217;ll let you know.</p>

<p>Cheers,
Matthias</p>
<div class="footnotes">
<hr/>
<ol>
<li id="fn:1">
<p>There, the problem was that silly Docker service that frequently hung, which, for whatever reason, required a <strong>REBOOT</strong> of the whole machine. As you can imagine, this was very annoying, as that, of course, meant ALL services would become unavailable until the machine was back up.<a href="#fnref:1" rev="footnote">&#8617;</a></p></li>
</ol>
</div>