Add metrics about *.db files in GME
Context
We are seeing large spikes in memory use due to what appears to be unusually large sample sets being ingested.
More context on this issue: gitlab-metrics-exporter#25 (closed)
Proposal
To understand under which circumstances the problem appears and also to reproduce it locally, we need more data.
We want to add metrics to GME that report the sizes of the metrics files (*.db) it processes.
With these, we can correlate file sizes with GME memory spikes and check whether there is a clear connection.
They would also show how extreme the cases can get in prod.
Implementation details
Discussion: Slack
Idea: Create an entirely new mmap-specific probe, named mmap_info, mmap_metadata, or similar.
All it would do is stat the file system, using the same config we read for the mmap probe.
-: It wouldn't be able to introspect the data. If we collected stats directly in the mmap probe instead, we could get additional insights, such as the actual number of samples processed, which could be useful too.
+: The cleanest and simplest way to do it. Best for iteration.
It could include the following stats:
- the number of *.db files on disk (we could add labels to distinguish by type, aggregation, etc.)
- the size of each file (again labeled by the actual path/filename)
Both could be simple gauges.
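A minimal Go sketch of what such a probe could do, using only the standard library so it can be tried locally against a directory of *.db files. The metric names (mmap_info_db_files, mmap_info_db_file_size_bytes), the "file" label, and the helper names are placeholders for illustration, not existing GME identifiers. A real probe would take the directory from the mmap probe's config and register gauges with the exporter instead of printing text.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sort"
	"strings"
)

// dbFileStat holds what a stat call tells us about one metrics file.
type dbFileStat struct {
	Path string
	Size int64 // size in bytes
}

// statDBFiles stats every *.db file directly inside dir.
// It reads file metadata only and never opens the files.
func statDBFiles(dir string) ([]dbFileStat, error) {
	paths, err := filepath.Glob(filepath.Join(dir, "*.db"))
	if err != nil {
		return nil, err
	}
	stats := make([]dbFileStat, 0, len(paths))
	for _, p := range paths {
		info, err := os.Stat(p)
		if err != nil {
			// A file may disappear between Glob and Stat; skip it.
			continue
		}
		if info.IsDir() {
			continue
		}
		stats = append(stats, dbFileStat{Path: p, Size: info.Size()})
	}
	// Sort by path so the output is stable between runs.
	sort.Slice(stats, func(i, j int) bool { return stats[i].Path < stats[j].Path })
	return stats, nil
}

// renderGauges formats the stats as gauge lines in Prometheus text style:
// one gauge for the file count, and one size gauge per file, labeled by
// file name. %q quoting is sufficient for simple file names.
func renderGauges(stats []dbFileStat) string {
	var b strings.Builder
	fmt.Fprintf(&b, "mmap_info_db_files %d\n", len(stats))
	for _, s := range stats {
		fmt.Fprintf(&b, "mmap_info_db_file_size_bytes{file=%q} %d\n",
			filepath.Base(s.Path), s.Size)
	}
	return b.String()
}

func main() {
	// Directory to inspect; defaults to the current directory.
	dir := "."
	if len(os.Args) > 1 {
		dir = os.Args[1]
	}
	stats, err := statDBFiles(dir)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Print(renderGauges(stats))
}
```

Labeling the size gauge per file keeps the sketch close to the idea above, but it creates one series per file. If the number of files turns out to be large, aggregating by type or reporting only the total and maximum size may be a better fit.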
Other
Potential follow-up: add similar metrics to Ruby Exporter, if we need to compare.