Slackline V2
Slackline is a tool that the infrastructure team uses to generate useful Alertmanager notifications in Slack. The project can be found here: https://gitlab.com/gitlab-com/gl-infra/slackline.
<img src="/uploads/96f3c45d19673cfd32c9d29535974f26/image.png" width="350"/>
History
=============
The initial version of Slackline was developed when:
1. GitLab did not run a production GKE cluster
1. Google Cloud Functions was in beta and only supported the Node.js runtime
These two constraints pushed the implementation towards a serverless function written in JavaScript (Node.js) and running in Google Cloud Functions.
Problems with this approach
============================
JavaScript is not widely used in the infrastructure team, and SREs are not familiar with the stack, which makes it hard for them to contribute to the project.
Proposal
=============
For the next iteration of Slackline, I propose the following changes:
1. Replace the Node.js/GCF implementation with a standard Go service
1. Replace GCF as the runtime with the existing Kubernetes cluster we run in production (ops)
1. Replace the Elasticsearch persistence (chosen for ease of access over anything else) with the Consul KV store, whose atomic updates can be used for deduplication
This approach will bring several benefits:
1. Improve the ability of SREs within the team to contribute towards the project
1. Bring the project in line with other projects within the infrastructure team
1. Improve reliability, particularly around duplicate notifications