Refactor metrics for instance context var usage

Cal Pratt requested to merge cpratt34/implicit-instance-metrics into master

Tag updates

The instance name is now fetched from the context var and, if available, applied as an "instanceName" tag. We no longer apply the instance name as a prefix to the metric name. If the instance name is set but empty, it is reported as "unnamed".
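
As a rough illustration of the tagging rule above, here is a minimal sketch. The names instance_name_var and instance_tags are hypothetical and are not taken from this merge request; only the "instanceName" tag key and the "unnamed" fallback come from the description.

```python
from contextvars import ContextVar
from typing import Dict, Optional

# Hypothetical context var; the real one in the codebase may be named differently.
instance_name_var: ContextVar[Optional[str]] = ContextVar("instance_name", default=None)


def instance_tags() -> Dict[str, str]:
    """Build the tags implied by the current context (illustrative helper)."""
    name = instance_name_var.get()
    if name is None:
        # No instance in context: no instanceName tag is applied.
        return {}
    # Set but empty: report the instance as "unnamed".
    return {"instanceName": name or "unnamed"}
```

With this shape, callers never pass the instance name explicitly; whatever sets the context var for a request determines the tag.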

For service endpoints, the instance and the service definitions now post identical metrics payloads. The instance timer metrics have been deleted in favor of the service level metrics.

The server-level metrics reporting has been updated to simplify the bot counts and scheduler metrics. Previously we applied some redundant tags to these values but never actually reported them, because tags were disabled in the default configurations. The information in these tags was already captured in the metric name, so they have been removed. _state_monitoring_worker is now much simpler.

Required config updates

Users who relied on the separate per-instance metric names now need to enable tags and update their dashboards. These configuration changes must be rolled out together with the server deployment by setting the tag format, e.g. StatsDTagFormat.INFLUX_STATSD / "influx-statsd".
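
For dashboard authors, it may help to see what a tagged metric looks like on the wire. The sketch below assumes the common Influx-style StatsD convention, where tags are appended to the metric name as comma-separated key=value pairs. The function format_counter and the metric name are hypothetical and are not part of this change.

```python
from typing import Dict, Optional


def format_counter(name: str, value: int, tags: Optional[Dict[str, str]] = None) -> str:
    """Render a counter line in Influx-style StatsD format (illustrative only)."""
    # Tags follow the metric name as ",key=value" pairs, sorted for stable output.
    tag_part = "".join(f",{key}={val}" for key, val in sorted((tags or {}).items()))
    return f"{name}{tag_part}:{value}|c"


# The instance is carried by a tag instead of a metric-name prefix.
line = format_counter("requests", 1, {"instanceName": "main"})
```

Under that assumption, dashboards that previously matched a per-instance metric name would instead filter a shared metric name by its instanceName tag.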

Metrics utils

The DurationMetric, Counter, ExceptionCounter, and Distribution classes have all been replaced, along with their helper methods such as generator_method_exception_counter. Where possible, they have been replaced by simpler decorators or context managers which can handle both regular function and generator function signatures.

DurationMetric is replaced by the @timed(name) wrapper and a corresponding context manager with timer(name).
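
One way a timing decorator can cover both regular and generator functions is sketched below. This is not the implementation in this merge request: the recorded list and publish_duration are stand-ins for the real publishing path, and only the names timed and timer come from the description.

```python
import functools
import inspect
import time
from contextlib import contextmanager

recorded = []  # stand-in sink so the sketch is self-contained


def publish_duration(name, seconds):
    """Hypothetical publisher; the real code would send this to the metrics backend."""
    recorded.append((name, seconds))


@contextmanager
def timer(name):
    start = time.perf_counter()
    try:
        yield
    finally:
        # Publish even if the timed block raised.
        publish_duration(name, time.perf_counter() - start)


def timed(name):
    def decorator(func):
        if inspect.isgeneratorfunction(func):
            @functools.wraps(func)
            def gen_wrapper(*args, **kwargs):
                # The timer spans the whole iteration, not just generator creation.
                with timer(name):
                    yield from func(*args, **kwargs)
            return gen_wrapper

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with timer(name):
                return func(*args, **kwargs)
        return wrapper
    return decorator
```

Branching on inspect.isgeneratorfunction matters here: without it, wrapping a generator function would time only the creation of the generator object, which returns almost immediately.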

Counter and Distribution have been deleted. Call the publish_counter_metric and publish_distribution_metric functions directly instead.

ExceptionCounter is replaced by the @error_count(name) wrapper, and the context manager has been removed as it was only used by test code.
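
A decorator of this kind could look roughly like the sketch below. It is an assumption about the shape, not the code from this merge request: error_tally stands in for the real counter publishing, and only the name error_count comes from the description.

```python
import functools
import inspect
from collections import Counter as Tally

error_tally = Tally()  # stand-in for publishing a counter metric


def error_count(name):
    def decorator(func):
        if inspect.isgeneratorfunction(func):
            @functools.wraps(func)
            def gen_wrapper(*args, **kwargs):
                try:
                    yield from func(*args, **kwargs)
                except Exception:
                    # Count errors raised at any point during iteration, then re-raise.
                    error_tally[name] += 1
                    raise
            return gen_wrapper

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception:
                error_tally[name] += 1
                raise
        return wrapper
    return decorator
```

Catching Exception rather than BaseException means that closing a generator early (GeneratorExit) is not counted as an error in this sketch.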

Edited by Cal Pratt