Instrument calls to models
This adds an instrumentator that can be called around requests to different model engines. The first metric implemented here is a gauge counting the number of requests in flight.
For regular calls, the caller just needs to wrap the inference inside a `ModelRequestInstrumentator.watch` call.
The instrumentator also supports streaming calls to models: in this case, the caller is responsible for calling `finish()` after the response is completely consumed.
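To illustrate the two call patterns, here is a minimal sketch, not the code in this merge request. Only the names `ModelRequestInstrumentator`, `watch`, and `finish()` come from the description above. The `stream` flag, the `WatchContainer` helper, the `InFlightGauge` stand-in (used in place of a real metrics client), and the engine name are assumptions made for the example.

```python
import threading
from contextlib import contextmanager


class InFlightGauge:
    """In-memory stand-in for a metrics gauge (hypothetical, for illustration)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._values = {}

    def inc(self, engine):
        with self._lock:
            self._values[engine] = self._values.get(engine, 0) + 1

    def dec(self, engine):
        with self._lock:
            self._values[engine] = self._values.get(engine, 0) - 1

    def value(self, engine):
        with self._lock:
            return self._values.get(engine, 0)


INFLIGHT = InFlightGauge()


class ModelRequestInstrumentator:
    class WatchContainer:
        """Tracks one request; finish() is idempotent (assumed behaviour)."""

        def __init__(self, engine):
            self.engine = engine
            self._finished = False

        def start(self):
            INFLIGHT.inc(self.engine)

        def finish(self):
            if not self._finished:
                self._finished = True
                INFLIGHT.dec(self.engine)

    def __init__(self, engine):
        self.engine = engine

    @contextmanager
    def watch(self, stream=False):
        container = self.WatchContainer(self.engine)
        container.start()
        try:
            yield container
        except BaseException:
            # A failed request is no longer in flight, streaming or not.
            container.finish()
            raise
        else:
            # For streaming calls the caller must call finish() itself.
            if not stream:
                container.finish()


instrumentator = ModelRequestInstrumentator("example-engine")

# Regular call: the gauge drops back as soon as the block exits.
with instrumentator.watch():
    result = "model response"  # placeholder for the inference call

# Streaming call: the request stays in flight until finish() is called.
with instrumentator.watch(stream=True) as watcher:
    chunks = iter(["chunk-1", "chunk-2"])  # placeholder for a streamed response
for _ in chunks:
    pass
watcher.finish()
```

In this sketch the streaming request is still counted as in flight after the `with` block exits, which is why the explicit `finish()` call after consuming the response matters.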
For gitlab-com/runbooks#143 (closed)
Also the groundwork for #371 (closed)
Edited by Bob Van Landuyt