feat: instrument all model requests

This reimplements some of the metrics that are currently in the `TextGenModelInstrumentator` from `instrumentators/base.py`, but without coupling them to code-suggestions. We're supposed to use this instrumentator for all calls to models, external or otherwise, to get metrics for them.

This adds 2 new metrics for inferences:

- `model_inferences_total`: A counter that gets incremented for every inference.
- `inference_request_duration_seconds`: A histogram observing the duration of an inference.

Both of these have 2 new labels:

- `error`: `yes` or `no`. This will allow us to measure error rates, and exclude failing inferences from the apdex for services.
- `streaming`: `yes` or `no`. This will allow us to measure the apdex for streaming inferences differently.
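
As a rough illustration of how the two metrics and their labels fit together, here is a minimal sketch. It is not the code from this change: `InferenceMetrics` and `watch` are hypothetical names, and plain in-memory containers stand in for whatever metrics client the project actually uses, so the example runs without dependencies.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class InferenceMetrics:
    """Hypothetical in-memory stand-in for a real metrics client."""

    def __init__(self):
        # (error, streaming) -> count, mirroring model_inferences_total
        self.model_inferences_total = defaultdict(int)
        # (error, streaming) -> list of observed durations, mirroring
        # inference_request_duration_seconds
        self.inference_request_duration_seconds = defaultdict(list)

    @contextmanager
    def watch(self, streaming=False):
        """Record one inference, labelled with error and streaming."""
        start = time.perf_counter()
        error = "no"
        try:
            yield
        except Exception:
            error = "yes"
            raise
        finally:
            labels = (error, "yes" if streaming else "no")
            self.model_inferences_total[labels] += 1
            self.inference_request_duration_seconds[labels].append(
                time.perf_counter() - start
            )


metrics = InferenceMetrics()

with metrics.watch(streaming=False):
    pass  # a successful, non-streaming model call would go here

try:
    with metrics.watch(streaming=True):
        raise RuntimeError("simulated model failure")
except RuntimeError:
    pass  # the failure is still counted, with error="yes"
```

Because `error` is a label and not a separate metric, an error rate can be derived as the share of `error="yes"` counts, and an apdex calculation can restrict itself to the `error="no"` durations.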
This also implements the new error callback for streaming implementations (for now only Anthropic).
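
One plausible reason a callback is needed for streaming: the call returns before the response is fully consumed, so a failure can surface partway through iteration. The sketch below shows that idea under stated assumptions. `StreamWatcher`, `register_error`, `finish`, and `instrumented_stream` are hypothetical names chosen for illustration, not the identifiers used in this change or in the Anthropic client.

```python
import time


class StreamWatcher:
    """Hypothetical watcher that records the outcome when a stream ends."""

    def __init__(self):
        self.start = time.perf_counter()
        self.error = "no"
        self.recorded = None

    def register_error(self):
        # Error callback: invoked by the streaming code when a chunk fails.
        self.error = "yes"

    def finish(self):
        # Called once, after the stream completes or fails.
        self.recorded = {
            "error": self.error,
            "streaming": "yes",
            "duration": time.perf_counter() - self.start,
        }


def instrumented_stream(chunks, watcher):
    """Yield chunks, reporting mid-stream failures through the callback."""
    try:
        for chunk in chunks:
            yield chunk
    except Exception:
        watcher.register_error()
        raise
    finally:
        watcher.finish()


def failing_chunks():
    # Simulated model stream that fails after the first chunk.
    yield "Hello"
    raise ConnectionError("simulated mid-stream failure")


watcher = StreamWatcher()
received = []
try:
    for chunk in instrumented_stream(failing_chunks(), watcher):
        received.append(chunk)
except ConnectionError:
    pass  # the caller still sees the error; the watcher has recorded it
```

In this sketch the duration is only recorded once the stream ends, so a streamed inference that fails midway is labelled `error="yes"` even though its first chunk was delivered successfully.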
For #441 (closed)