Add error information to GraphQL logs and metrics

Bob Van Landuyt requested to merge bvl/track-graphql-errors into master

chore: refactor graphql instrumentation tracer

The GraphQL ruby tracers don't work as we expected them to. An #execute_query trace does not necessarily execute the query. It prepares the query for execution.

The execution itself happens in the context of a multiplex, even if there is only one query. This ensures that data that can be shared between the queries is loaded once, which avoids hitting the same resources multiple times.

This changes the logging and instrumentation of those queries to share the duration of the total execution. This ensures that we have relevant duration information in the logs.

In practice, I don't think our frontends multiplex queries.

This is also the point at which the query is fully executed, which means we can inspect the result to gather any errors that happened during the execution of the query and expose that information in metrics and logs.

This merges the 3 tracers that are supposed to provide information into a single one that collects all of the information. This ensures that we're always comparing apples to apples when we talk about durations: the duration in the logs is also the duration we've used for the apdex metric.
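As a rough illustration of the idea (not the code in this merge request), a single tracer can time the multiplex once and attach that same duration to every query in it. `SketchQuery`, `SketchMultiplex` and `SketchTracer` are hypothetical stand-ins; only the `trace(key, data) { ... }` shape and the `"execute_multiplex"` key are modelled on the graphql-ruby tracing hooks.

```ruby
# Hypothetical stand-ins for illustration; not the real graphql-ruby or GitLab classes.
SketchQuery = Struct.new(:operation_name, :errors)
SketchMultiplex = Struct.new(:queries)

class SketchTracer
  attr_reader :log_entries

  def initialize
    @log_entries = []
  end

  # Same shape as a graphql-ruby style tracer hook: trace(key, data) { ... }.
  def trace(key, data)
    # Only the multiplex trace wraps the actual execution, so only time that one.
    return yield unless key == "execute_multiplex"

    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    result = yield
    duration_s = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

    # Every query in the multiplex shares the total duration, so logs and
    # metrics are derived from the same number.
    data[:multiplex].queries.each do |query|
      @log_entries << {
        operation_name: query.operation_name,
        duration_s: duration_s,
        errors: query.errors
      }
    end

    result
  end
end
```

In this sketch other trace keys, such as `"execute_query"`, pass straight through without recording anything.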

For #345263 (closed)

feat: take GQL-query success into account for instrumentation

This includes the GraphQL error messages in the logs if there were any.

It also prevents recording an apdex for failed queries, as the duration of a failed query says little about performance.
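A minimal sketch of that rule, assuming a hypothetical `SketchApdex` recorder with a made-up `target_s` threshold (neither is taken from this merge request): a query only counts towards the apdex when its result carries no errors.

```ruby
# Hypothetical apdex recorder for illustration only.
class SketchApdex
  attr_reader :total, :satisfied

  def initialize(target_s:)
    @target_s = target_s
    @total = 0
    @satisfied = 0
  end

  # Returns true when the measurement was recorded, false when it was skipped.
  def record(duration_s:, errors:)
    # Failed queries are left out of the apdex entirely.
    return false unless errors.empty?

    @total += 1
    @satisfied += 1 if duration_s <= @target_s
    true
  end
end
```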

For #345263 (closed)

feat: add an error rate SLI for graphql queries

This adds a counter for all GraphQL queries. It increments the ops rate for all queries executed and it increments the error rate in case there was an exception or if the result contained errors.

This means that invalid queries sent to us will also result in an error. For this reason, we need to make sure that we only include queries that we know of in our SLIs. We can distinguish these in metrics using the endpoint_id label. We'll only populate that label for queries from our own application. All other queries will have graphql:unknown as the endpoint_id.
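The labelling could be sketched roughly as follows. How known queries are actually recognised is not described here, so the lookup by operation name, the `SketchErrorRate` class and the example name `getProject` are all assumptions made for illustration; only the `graphql:unknown` fallback comes from the description above.

```ruby
require "set"

# Hypothetical error-rate counter for illustration only.
class SketchErrorRate
  UNKNOWN_ENDPOINT_ID = "graphql:unknown"

  def initialize(known_operation_names)
    @known = Set.new(known_operation_names)
    @counts = Hash.new(0)
  end

  # Assumption: a query is "known" when its operation name is in the set.
  def endpoint_id_for(operation_name)
    @known.include?(operation_name) ? "graphql:#{operation_name}" : UNKNOWN_ENDPOINT_ID
  end

  # Every query bumps the ops counter; exceptions and results that contain
  # errors also bump the error counter.
  def increment(operation_name:, error:)
    endpoint_id = endpoint_id_for(operation_name)
    @counts[[endpoint_id, :ops]] += 1
    @counts[[endpoint_id, :errors]] += 1 if error
  end

  def count(endpoint_id, kind)
    @counts[[endpoint_id, kind]]
  end
end
```

An SLI built on such counters would then leave out the `graphql:unknown` series, so that invalid queries from third parties do not count against the error budget.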

For #345263 (closed)

Edited by Bob Van Landuyt