CloudWatch API - do we need it?
Goal
The set of artifacts currently delivered at the end of each experiment does not answer the following questions:
- What was the resource consumption (CPU, memory, IO, etc.)?
- What was the main bottleneck (CPU / IO / network / etc.)?
- When was each resource's consumption higher, and when was it lower? (historical data)
To mitigate this, we need to add graphs. Roadmap:
- CPU is one of the major metrics. However, we need info about each vCPU/core; this is very important for detecting bottlenecks.
  - find a way to collect it / discuss / decide
  - implement
- Memory is one of the major metrics. We need more than just "how much was consumed".
  - define how to get a much deeper look at memory (/proc/meminfo? vm.dirty_*? See Тюрин_pgconf19.pdf. What else?)
  - implement
- IO is one of the major metrics.
  - At bare minimum we need: read/write IOPS, read/write throughput. What should we use: iotop? Discuss, decide. (Additionally: iostat? Queue size, %util?)
  - implement
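As input for the "discuss / decide" items in the roadmap, here is a minimal sketch of one possible collection approach. It assumes Linux hosts where `/proc/stat`, `/proc/meminfo` and `/proc/diskstats` are readable. All function names are hypothetical and not part of Nancy. The parsers take file contents as text, so each file would be read twice, some seconds apart, to compute per-core CPU usage and IO rates.

```python
"""Sketch: per-core CPU, memory and disk IO figures derived from /proc text."""


def parse_cpu_times(stat_text):
    """Return {core: (busy_jiffies, total_jiffies)} for each 'cpuN' line."""
    cores = {}
    for line in stat_text.splitlines():
        fields = line.split()
        # Per-core lines are 'cpu0', 'cpu1', ...; the aggregate line is 'cpu'.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        values = [int(v) for v in fields[1:]]
        if len(values) < 4:
            continue
        # Order: user nice system idle iowait irq softirq steal [guest ...].
        # Guest time is already counted in user/nice, so sum the first 8 only.
        total = sum(values[:8])
        idle = values[3] + (values[4] if len(values) > 4 else 0)
        cores[fields[0]] = (total - idle, total)
    return cores


def cpu_utilization(before_text, after_text):
    """Busy fraction (0..1) per core between two /proc/stat snapshots."""
    before = parse_cpu_times(before_text)
    after = parse_cpu_times(after_text)
    usage = {}
    for name, (busy1, total1) in after.items():
        busy0, total0 = before.get(name, (0, 0))
        elapsed = total1 - total0
        usage[name] = (busy1 - busy0) / elapsed if elapsed > 0 else 0.0
    return usage


def parse_meminfo(meminfo_text):
    """Return {field: value} from /proc/meminfo (kB for most fields;
    HugePages_* fields are page counts)."""
    info = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def parse_diskstats(diskstats_text):
    """Return cumulative per-device counters from /proc/diskstats."""
    stats = {}
    for line in diskstats_text.splitlines():
        f = line.split()
        if len(f) < 14:
            continue
        stats[f[2]] = {
            "reads": int(f[3]),          # reads completed
            "read_sectors": int(f[5]),   # 512-byte sectors read
            "writes": int(f[7]),         # writes completed
            "write_sectors": int(f[9]),  # 512-byte sectors written
            "io_ms": int(f[12]),         # milliseconds spent doing IO
        }
    return stats


def io_rates(before_text, after_text, interval_s):
    """IOPS, throughput and utilization per device over interval_s seconds."""
    before = parse_diskstats(before_text)
    after = parse_diskstats(after_text)
    rates = {}
    for dev, now in after.items():
        prev = before.get(dev)
        if prev is None:
            continue
        d = {key: now[key] - prev[key] for key in now}
        rates[dev] = {
            "read_iops": d["reads"] / interval_s,
            "write_iops": d["writes"] / interval_s,
            "read_bytes_per_s": d["read_sectors"] * 512 / interval_s,
            "write_bytes_per_s": d["write_sectors"] * 512 / interval_s,
            "util": d["io_ms"] / (interval_s * 1000),  # 0..1, like iostat %util / 100
        }
    return rates
```

Sampling these files at a fixed interval during an experiment and storing the results would give the historical data needed for graphs. Whether this is preferable to tools such as iostat or iotop is still open for discussion.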
TODO / How to implement
For remote experiments on AWS, CloudWatch (https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/Welcome.html) might be helpful. Check its API: can we use it, and do we need it?
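To make the API check concrete, here is a minimal sketch of fetching one EC2 metric through the CloudWatch `GetMetricStatistics` call. It assumes the third-party `boto3` library and configured AWS credentials. The helper names, the default time window and the default period are illustrative assumptions, not an existing Nancy interface.

```python
from datetime import datetime, timedelta, timezone


def build_metric_request(instance_id, metric_name, end_time,
                         minutes=60, period_s=300):
    """Build keyword arguments for GetMetricStatistics for one EC2 metric.

    The default period_s=300 is an assumption for this sketch.
    """
    return {
        "Namespace": "AWS/EC2",
        "MetricName": metric_name,  # e.g. "CPUUtilization", "DiskReadOps"
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "StartTime": end_time - timedelta(minutes=minutes),
        "EndTime": end_time,
        "Period": period_s,
        "Statistics": ["Average", "Maximum"],
    }


def fetch_metric(instance_id, metric_name, region):
    """Fetch datapoints for the last hour, sorted by timestamp."""
    import boto3  # third-party; imported lazily so the builder works without it

    client = boto3.client("cloudwatch", region_name=region)
    params = build_metric_request(
        instance_id, metric_name, datetime.now(timezone.utc)
    )
    response = client.get_metric_statistics(**params)
    return sorted(response["Datapoints"], key=lambda p: p["Timestamp"])
```

Points to verify during the analysis: to our knowledge, the default `AWS/EC2` namespace reports `CPUUtilization` for the whole instance rather than per vCPU, and it has no memory metric unless the CloudWatch agent is installed on the instance. If that holds, CloudWatch alone would not cover the per-core CPU and detailed memory items in the roadmap.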
Acceptance criteria
We have an analysis of pros and cons. As Nancy developers, we can answer the question: do we need it in Nancy?