
Implement SLIs and saturation points for Zoekt backend service

Background

The zoekt service does not currently have any SLIs defined. The readiness review refers to the Global Search stage group dashboard instead.

We do measure SLIs on the client side, and the Zoekt Info dashboard includes server-side metrics, albeit not in standard SLI form.

Problem

The SLI framework offers several benefits that we are currently missing out on:

  • Standard dashboards with a consistent look and feel, which makes diagnosis easier.
  • Alerting based on server-side measurements.
  • Capacity planning based on saturation metrics.

Proposal

Move the metrics from the Zoekt Info dashboard into SLIs (apdex, request rate, error rate) and saturation points (disk_maximum_capacity, cpu, etc.) on the zoekt service.
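For reference, the three SLI signals are usually ratios of counter increases over an evaluation window. The sketch below is illustrative only, assuming hypothetical per-window counts (`total`, `errors`, `satisfied`); it is not how the SLI framework computes them. It also uses the simplified apdex form (share of requests finishing under a latency threshold) rather than the classic formula with a tolerating band.

```python
from dataclasses import dataclass


@dataclass
class WindowCounts:
    """Counter increases over one evaluation window (hypothetical inputs)."""
    total: int       # all search requests in the window
    errors: int      # requests that failed
    satisfied: int   # requests that finished under the latency threshold
    window_s: float  # window length in seconds, e.g. 300 for 5m


def sli_summary(c: WindowCounts) -> dict:
    """Derive request rate, error ratio and a simplified apdex from one window."""
    if c.window_s <= 0:
        raise ValueError("window must be positive")
    if c.total == 0:
        # No traffic: the rate is zero and the ratios are undefined.
        return {"rps": 0.0, "error_ratio": None, "apdex": None}
    return {
        "rps": c.total / c.window_s,           # request rate (requests per second)
        "error_ratio": c.errors / c.total,     # error rate as a fraction of requests
        "apdex": c.satisfied / c.total,        # fraction of requests under the threshold
    }


print(sli_summary(WindowCounts(total=1500, errors=3, satisfied=1470, window_s=300)))
# → {'rps': 5.0, 'error_ratio': 0.002, 'apdex': 0.98}
```

Expressing all three as ratios over the same window is what lets them share standard dashboards and alert thresholds.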

Remove the old dashboard once the move is complete.

(*) denotes a metric we may want to add.

| (existing) Zoekt info | (new) zoekt: Overview | Notes |
|---|---|---|
| Search Duration 98th Percentile | Service Level Indicators | zoekt-webserver search duration |
| Search Rate 5m | Service Level Indicators | zoekt-webserver search rate |
| Error Rate 5m | Service Level Indicators | zoekt-webserver error rate |
| | Service Level Indicators | (NEW) Global Search SLI apdex + error for search_type=zoekt |
| CPU Usage | Saturation Details | |
| CPU Throttling | Saturation Details | |
| Memory Map Usage | Saturation Details | |
| | Saturation Details | Container memory usage (*) |
| | Saturation Details | Container process resident memory bytes (*) |
| | Saturation Details | go memstats heap in use (bytes) (*) |
| Persistent Volume Disk Utilization | Saturation Details | |
| I/O Reads | Saturation Details | |
| I/O Writes | Saturation Details | |
| Task processing queue length | Saturation Details | |
| | Saturation Details | (NEW) Memory Usage |
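Each saturation point above boils down to a utilization ratio between 0 and 1 that is compared against a soft (warning) and a hard (alerting) threshold. A minimal sketch, assuming hypothetical thresholds of 0.85 and 0.90; the real values for the zoekt service would need to be chosen when the saturation points are defined.

```python
from dataclasses import dataclass


@dataclass
class SaturationPoint:
    """A resource tracked as a 0..1 utilization ratio (illustrative only)."""
    name: str
    soft_slo: float  # hypothetical warning threshold
    hard_slo: float  # hypothetical alerting threshold

    def evaluate(self, used: float, capacity: float) -> tuple[float, str]:
        """Return the clamped utilization ratio and which threshold it crosses."""
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        # Clamp to 0..1 so over-reporting never shows more than 100% utilization.
        ratio = min(max(used / capacity, 0.0), 1.0)
        if ratio >= self.hard_slo:
            status = "hard"
        elif ratio >= self.soft_slo:
            status = "soft"
        else:
            status = "ok"
        return ratio, status


# Thresholds are assumptions for illustration, not agreed values.
disk = SaturationPoint("disk_maximum_capacity", soft_slo=0.85, hard_slo=0.90)
print(disk.evaluate(used=440, capacity=500))  # → (0.88, 'soft')
```

Normalizing every resource (disk, CPU, memory, queue length against a chosen limit) to the same ratio is what makes capacity planning across saturation points comparable.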

Edited by Terri Chu