
Add `crunch` tool to compute useful statistics

Matthias Käppler requested to merge crunch-tool into master

Given a sequence of numeric values, computes percentiles as well as min, max, and mean and outputs results in JSON.

I often find myself in need of this when I, for example, run `pidstat -t` against a process and collect data over time. So far I have copied that data into a Google Sheet to get min, max, percentiles, etc.

With this tool, you can do this on the command line and get the result in JSON:

[15:20:25] work/team-tools::crunch-tool ✔ crunch/crunch.rb -h
Crunches LF-separated numeric input and outputs statistics in JSON.

Usage:

  $crunch/crunch.rb < /path/to/data/file
  $crunch/crunch.rb (read from STDIN; end with CTRL+D)

where input data contains one number per line.

For instance, here is how it would process memory samples from a pidstat log:

# column 8 contains the process RSS
cat pidstat.log | awk '{print $8}' | crunch/crunch.rb | jq
{
  "data": [
    0,
    0,
    0,
    769656,
    769656,
    769656,
    769656,
    ...   
    797804,
    797804,
    797804,
    797804,
    797804,
    797804,
    828328
  ],
  "percentiles": {
    "p50": 795356,
    "p75": 797708,
    "p90": 797708,
    "p95": 797708,
    "p99": 797804
  },
  "min": 0,
  "max": 828328,
  "avg": 790032
}
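
For reviewers who want a feel for the computation without opening the diff, here is a minimal sketch of how such statistics could be derived. It is not the code in `crunch/crunch.rb`: the helper names `percentile` and `crunch` are hypothetical, and the nearest-rank percentile method is an assumption, since the description does not say which definition the tool uses. The mean is left unrounded here, whereas the sample output above shows an integer `avg`.

```ruby
require 'json'

# Hypothetical helper, not taken from crunch.rb.
# Nearest-rank percentile (assumed method): the smallest sample such that
# at least `pct` percent of all samples are less than or equal to it.
def percentile(sorted, pct)
  rank = (pct * sorted.size / 100.0).ceil
  sorted[[rank, 1].max - 1]
end

# Hypothetical entry point: takes an enumerable of lines, one number per line.
def crunch(lines)
  # Skip blank lines; keep integers as integers, fall back to floats.
  # Float() raises ArgumentError on non-numeric input.
  data = lines.map(&:strip).reject(&:empty?).map do |line|
    Integer(line, 10, exception: false) || Float(line)
  end
  raise ArgumentError, 'no input data' if data.empty?

  sorted = data.sort
  {
    data: sorted,
    percentiles: [50, 75, 90, 95, 99].to_h { |p| ["p#{p}", percentile(sorted, p)] },
    min: sorted.first,
    max: sorted.last,
    avg: data.sum.fdiv(data.size) # unrounded mean; rounding is left to the caller
  }
end
```

Under these assumptions, `puts JSON.generate(crunch($stdin.each_line))` would turn the sketch into a filter that can sit in the same pipeline as above.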
