Add `crunch` tool to compute useful statistics
Given a sequence of numeric values, computes percentiles as well as min, max, and mean and outputs results in JSON.
I often find myself in need of this when I, e.g., run `pidstat -t` on a process and collect data over time. Until now I would copy that data into a Google Sheet to get min, max, percentiles, etc.
With this tool, you can do this on the command line and get the result in JSON:
[15:20:25] work/team-tools::crunch-tool ✔ crunch/crunch.rb -h
Crunches LF-separated numeric input and outputs statistics in JSON.
Usage:
$ crunch/crunch.rb < /path/to/data/file
$ crunch/crunch.rb (read from STDIN; end with CTRL+D)
where input data contains one number per line.
For instance, here is how it would process memory samples from a pidstat log:
# column 8 contains the process RSS
cat pidstat.log | awk '{print $8}' | crunch/crunch.rb | jq
{
  "data": [
    0,
    0,
    0,
    769656,
    769656,
    769656,
    769656,
    ...
    797804,
    797804,
    797804,
    797804,
    797804,
    797804,
    828328
  ],
  "percentiles": {
    "p50": 795356,
    "p75": 797708,
    "p90": 797708,
    "p95": 797708,
    "p99": 797804
  },
  "min": 0,
  "max": 828328,
  "avg": 790032
}
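
For reviewers who want a feel for the computation, here is a minimal sketch of how statistics like the ones above could be derived in Ruby. It is not the code in this MR: the helper names `percentile` and `crunch` are hypothetical, the nearest-rank percentile method is an assumption, and values are parsed as floats, so the numbers print as floats rather than the integers shown above.

```ruby
require 'json'

# Hypothetical helper: nearest-rank percentile over an already sorted array.
# The real script may use a different percentile definition.
def percentile(sorted, pct)
  return nil if sorted.empty?

  rank = (pct / 100.0 * sorted.length).ceil
  sorted[[rank, 1].max - 1]
end

# Hypothetical helper: turns LF-separated numeric lines into a stats hash.
def crunch(lines)
  data = lines.map(&:strip).reject(&:empty?).map { |line| Float(line) }.sort
  return { 'data' => [] } if data.empty?

  {
    'data' => data,
    'percentiles' => [50, 75, 90, 95, 99].to_h { |p| ["p#{p}", percentile(data, p)] },
    'min' => data.first,
    'max' => data.last,
    'avg' => data.sum / data.length
  }
end

# Example with inline samples instead of STDIN.
puts JSON.pretty_generate(crunch(%w[10 20 30 40]))
```

Sorting once up front keeps min, max, and every percentile lookup a simple index access, which seems a reasonable trade-off for the small sample files this tool is aimed at.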