Instrument calls to models

Bob Van Landuyt requested to merge bvl/track-concurrent-requests into main

This adds an instrumentator that can be called around requests to different model engines. The first metric implemented here is a gauge counting the number of requests in flight.

For regular calls, the caller only needs to wrap the inference in a ModelRequestInstrumentator.watch call.

The instrumentator also supports streaming calls to models. In this case, the caller is responsible for calling finish() after the response has been completely consumed.
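To illustrate the two calling patterns, here is a minimal sketch, not the code in this merge request. Only the names ModelRequestInstrumentator, watch, and finish() come from the description. The InFlightGauge stand-in, the WatchHandle class, and the stream parameter are assumptions made for illustration, and a real implementation would likely use a metrics library gauge instead.

```python
import threading
from contextlib import contextmanager


class InFlightGauge:
    """Tiny thread-safe stand-in for a metrics gauge (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.value = 0

    def inc(self):
        with self._lock:
            self.value += 1

    def dec(self):
        with self._lock:
            self.value -= 1


class WatchHandle:
    """Hypothetical handle yielded by watch(); finish() decrements at most once."""

    def __init__(self, gauge):
        self._gauge = gauge
        self._finished = False

    def finish(self):
        if not self._finished:
            self._finished = True
            self._gauge.dec()


class ModelRequestInstrumentator:
    def __init__(self, gauge):
        self._gauge = gauge

    @contextmanager
    def watch(self, stream=False):
        # Assumption: `stream` tells the instrumentator to leave the request
        # counted as in flight until the caller calls finish() explicitly.
        self._gauge.inc()
        handle = WatchHandle(self._gauge)
        try:
            yield handle
        except BaseException:
            # A failed request is no longer in flight.
            handle.finish()
            raise
        else:
            if not stream:
                handle.finish()


gauge = InFlightGauge()
instrumentator = ModelRequestInstrumentator(gauge)

# Regular call: the gauge is decremented when the block exits.
with instrumentator.watch():
    in_flight_during_call = gauge.value
in_flight_after_call = gauge.value


def fake_stream():
    # Placeholder for a streamed model response.
    yield "chunk-1"
    yield "chunk-2"


# Streaming call: the block exits before the response is consumed,
# so the caller finishes the watch once the stream is exhausted.
with instrumentator.watch(stream=True) as handle:
    response = fake_stream()
in_flight_while_streaming = gauge.value
chunks = list(response)
handle.finish()
in_flight_after_stream = gauge.value
```

Making finish() idempotent in this sketch means a streaming caller can safely call it from both a normal completion path and an error path without pushing the gauge below zero.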

For gitlab-com/runbooks#143 (closed)

Also the groundwork for #371 (closed)

Edited by Bob Van Landuyt