devops-harbinger

Fledgling alerting system for servers + apps. No DSLs or configs, just JavaScript code.

I AM HARBINGER

If you need a simple alerting system for your servers and/or apps, this Node.js project might be of use to you. I wanted a simple alerting system for my Linode server that didn't require exotic databases or other heavy software (Graphite, Grafana, ELK, etc.). If I've overlooked a piece of lightweight software that might fit the bill, please let me know! Otherwise...

This application has some batteries included. However, it acknowledges that everyone's alerting needs are different, and that the best way to stay flexible is to avoid unnecessary assumptions and layers of complexity. Instead, it invites DevOps engineers to write code to solve their problems. This project is written in JavaScript (Node.js) and there is no DSL or config syntax to learn.

In general, here's what you need to do:

  1. Create a StatsD listener (see src/example-statsd.js)
  2. Configure your main StatsD daemon to forward messages to harbinger
  3. Write JavaScript to analyze the incoming messages using regex or other comparisons
  4. Use the throttle module and specify your desired alerting/output method (console, email, Slack, PagerDuty)
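
As a rough illustration of steps 1 and 3, here is a minimal sketch built only on Node's standard dgram module. It does not use harbinger's own modules: the names parseStatsdLine, shouldAlert and listen, the ".errors.count" rule and the limit of 100 are all assumptions made up for this example.

```javascript
const dgram = require('dgram');

// Parse one StatsD line such as "app.module.errors.count:761|c".
// Returns null when the line does not have the basic "name:value|type" shape.
function parseStatsdLine(line) {
    const match = /^([^:|]+):([+-]?\d+(?:\.\d+)?)\|([a-z]+)/.exec(line.trim());
    if (!match) {
        return null;
    }
    return { metric: match[1], value: Number(match[2]), type: match[3] };
}

// Hypothetical rule: alert when a metric ending in ".errors.count" exceeds a limit.
function shouldAlert(payload, limit) {
    return /\.errors\.count$/.test(payload.metric) && payload.value > limit;
}

// Hypothetical listener: binds a UDP socket and calls onAlert for matching metrics.
// It is defined but not started here; call listen(port, callback) to run it.
function listen(port, onAlert) {
    const socket = dgram.createSocket('udp4');
    socket.on('message', (buffer, rinfo) => {
        // A single UDP packet may carry several newline-separated metrics.
        for (const line of buffer.toString('utf8').split('\n')) {
            const payload = parseStatsdLine(line);
            if (payload && shouldAlert(payload, 100)) {
                onAlert(payload, rinfo.address);
            }
        }
    });
    socket.bind(port);
    return socket;
}
```

Keeping the parsing and the alert rule as plain functions, separate from the socket, makes them easy to unit test without any network traffic.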

Comes with the following batteries:

  • File watcher (so you can tail log files)
  • Syslog message parser
  • StatsD message parser
  • Ability to throttle messages in memory or using Redis (for when you need to scale horizontally)
  • In dev: alert if an accumulated value reaches M over N number of seconds
  • Other helpers
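
To make the in-memory throttling idea concrete, here is a minimal sketch. It is not the project's throttle module; createThrottle, its parameters and the injectable clock are assumptions for illustration only.

```javascript
// Hypothetical in-memory throttle: allows at most one alert per key per window.
// `now` is injectable so the behaviour can be tested without waiting.
function createThrottle(windowMs, now = Date.now) {
    const lastSent = new Map(); // key -> timestamp of the last allowed alert

    return function allow(key) {
        const current = now();
        const previous = lastSent.get(key);
        if (previous !== undefined && current - previous < windowMs) {
            return false; // still inside the quiet window for this key
        }
        lastSent.set(key, current);
        return true;
    };
}
```

A caller would check allow('some-alert-key') before sending and skip the output step when it returns false. A Redis-backed variant could keep the same per-key state in Redis so that several instances share it; that is not shown here.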

If you have any other ideas, please let me know: alan.szlosek@gmail.com

CONTRIBUTING

Contributions greatly appreciated! See CONTRIBUTING.md for more information.

USAGE

See src/example-statsd.js. The example showcases:

  • Listening for statsd metrics
  • Throttling repeat messages
  • Alerting via email

OVERVIEW

Recommended setup

  1. Get messages into harbinger somehow. We have examples for StatsD, Syslog and file tailing.
  2. Use lib/message.js to create a message object with a payload.
  3. Write delivery logic to pick which messages you want to alert on, and how.
  4. Use the lib modules to group, throttle, etc. Or write your own custom threshold calculations.
  5. Use the output modules to send messages to the desired destinations.

Message object example:

{
    // Payload can be whatever you want it to be,
    // whatever makes sense for the type of data you're processing.
    payload: {
        metric: 'app.module.errors.count',
        value: 761,
        type: 'counter'
    },
    // This is more like "service": statsd, syslog, etc
    source: 'statsd',
    // StatsD server that forwarded the metric to our harbinger app
    // Can usually glean this from the TCP/UDP connection info
    remoteAddress: '192.168.1.1',
    // We may want to track localAddress too, if we're listening on more than 1 interface
    localAddress: localAddress,
    // We may have multiple listeners for the same service (statsd) on different ports
    localPort: localPort,
    // Unix timestamp in milliseconds when harbinger received the message
    receivedMilliseconds: Date.now()
}
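
For illustration, an object of this shape could be assembled by a small helper like the one below. This is a sketch only: createMessage and its connection argument are hypothetical and are not the actual API of lib/message.js.

```javascript
// Hypothetical helper that builds a message object in the shape shown above.
// `connection` carries whatever address details the listener can provide.
function createMessage(source, payload, connection = {}) {
    return {
        payload: payload,
        source: source,
        remoteAddress: connection.remoteAddress,
        localAddress: connection.localAddress,
        localPort: connection.localPort,
        // Unix timestamp in milliseconds at the time the message is created
        receivedMilliseconds: Date.now()
    };
}
```

A StatsD listener might call it as createMessage('statsd', { metric: 'app.module.errors.count', value: 761, type: 'counter' }, { remoteAddress: '192.168.1.1' }), leaving any unknown address fields undefined.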

TESTS

Run npm test to run unit tests. A local Redis server is required for the throttling tests.

IN PROGRESS

  • Ditch Throttle for a Digest module. If a developer asks for five-minute digests, they would trigger on fixed boundaries: 0:00, 0:05, etc.
  • Rework the Timeseries module to use Redis scripts (in Lua) to calculate statistics
  • Once that's done, create an example that alerts when the error rate reaches a threshold. The error rate is calculated from the request count and the error count.
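
Two of the planned calculations are simple enough to sketch in plain JavaScript. None of this exists in the project yet: the function names are hypothetical, the error rate is assumed to be the error count divided by the request count, and the digest boundaries are assumed to be aligned to multiples of the interval since the Unix epoch.

```javascript
// Timestamp (ms) of the next digest boundary strictly after nowMs.
// With a five-minute interval the boundaries fall on :00, :05, :10, and so on.
function nextDigestBoundary(nowMs, intervalMs) {
    return (Math.floor(nowMs / intervalMs) + 1) * intervalMs;
}

// Error rate as a fraction between 0 and 1; 0 when there were no requests.
function errorRate(requestCount, errorCount) {
    if (requestCount <= 0) {
        return 0;
    }
    return errorCount / requestCount;
}

// True when the error rate has reached the given threshold (e.g. 0.05 for 5%).
function errorRateReached(requestCount, errorCount, threshold) {
    return errorRate(requestCount, errorCount) >= threshold;
}
```

A digest timer could then be scheduled with setTimeout(flush, nextDigestBoundary(Date.now(), intervalMs) - Date.now()), where flush is whatever function sends the accumulated digest.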

TODO

  • More decision-making modules (e.g. alert if the standard deviation changes by M over N seconds). Need to evaluate the features of other systems
  • Refine terminology
  • More work on scaling horizontally. The timeouts used by the throttling functionality are particularly tricky: if multiple instances of harbinger process the same input streams (in load-balanced fashion), only one node should be setting and responding to timeouts.

LICENSE

MIT License. See LICENSE.txt for more information.