Add jobAnalytics GraphQL API to fetch data on our jobs
Description
In #453956 (closed) we describe a adding a new panel with job metrics.
The panel should be fed to with information from our ClickHouse data source via our GraphQL API.
Motivation
Our ClickHouse analytics data source can provide powerful
We can use the dimension and metric concept to allow API users to build different queries that can be presented in different visualizations.
Technical proposal
Our jobs analytics would be based on the data in the ci_finished_builds ClickHouse table, our API should be able to query this table's dimensions and metrics like this:
SELECT
-- dimensions
name as jobName,
stage,
-- metrics
count(*) as count,
avg(duration) as avgDuration,
quantile(0.95)(duration) as p95Duration,
countIf(status = 'failed') as failedCount,
countIf(status = 'success') as successCount
FROM ci_finished_builds
WHERE name like 'job%'
GROUP BY name, stage
ORDER BY avgDuration desc
LIMIT 10;
We can create an schema that gets used to generate a query like this to fetch the job metrics:
const input = {
dimensions: [
{ column: "name", as: "jobName" },
{ column: "stage" },
],
metrics: [
{ function: "count" },
{ function: "avg", column: "duration", as: "medianDuration" },
{ function: "quantile(0.95)", column: "duration", as: "p95Duration" },
{
function: "countIf",
condition: [
{ column: "status", operator: "=", value: "failed" },
],
as: "failedCount"
},
{
function: "countIf",
condition: [
{ column: "status", operator: "=", value: "success" },
],
as: "successCount"
}
],
filter: [
{ column: "jobName", operator: "LIKE", value: "job%" }
],
orderBy: [
[{
column: 'medianDuration',
desc: true
}, {
column: 'jobName',
desc: true
}]
],
limit: 10,
offset: 0
};
GraphQL schema (draft)
Our schema would require users to define at least one metric, some dimensions and an optional number of filters. The schema for each looks like this:
Analytics Query
input AnalyticsQueryInput {
dimensions: [AnalyticsDimensionInput]
metrics: [AnalyticsMetricInput!] # at least one metric is needed, like `count`
filter: [AnalyticsFilterInput]
orderBy: [AnalyticsOrderByInput]
limit: Int
offset: Int
}
query getJobAnalytics(
$fullPath: ID!
$fromTime: Time!
$toTime: Time!
$source: CiPipelineSources
$branch: String
$query: AnalyticsQuery # here is where the magic happens!
) {
project(fullPath: $fullPath) {
id
pipelineAnalytics(fromTime: $fromTime, toTime: $toTime, source: $source, ref: $branch, $query: query) {
values # ordered array representation of the rows
errors
}
}
}
Dimension input
input AnalyticsDimensionInput {
column: String! # e.g. the job's "name", "stage"
as: String # optional
}
Example of a dimension:
{ column: "name" } // maps to SELECT "name" ... GROUP BY "name"
{ column: "name", as: "jobName" } // maps to SELECT "name" as "jobName" ... GROUP BY "jobName"
Metric input
input AnalyticsMetricInput {
function: String! # an aggregate function e.g. `avg`, `quantile(0.5)`, `countIf`
column: String # e.g. duration, none represents `*`
filter: FiltersInput # to pass to combinators e.g. { column: `status`, operator: `=`, value: 'failed' }
as: String # avgDuration
}
Example of a few metrics:
{ function: `count` as: `count` }
// maps to: SELECT ... count(*) as "count" FROM ...
{ function: `avg`, column: `duration`, as: `avg` }
// maps to: SELECT ... avg(duration) as "avg" FROM ...
{ function: `countIf`, filter: { column: `status`, operator: `=`, value: 'failed' }, as: `failedCount` }
// maps to: SELECT ... countIf(status = 'failed') as "failedCount" FROM ...
Filters input
input AnalyticsFiltersInput {
column: String, # e.g. name
operator: String, # e.g. =, LIKE
value: [String, Number], # comparator
}
Example of a filter:
{ column: "jobName", operator: "=", value: "job" }
// maps to: SELECT ... WHERE jobName = "job"
{ column: "jobName", operator: "LIKE", value: "job%" }
// maps to: SELECT ... WHERE jobName = "job%"
Order By
input AnalyticsOrderByInput {
column: String, # e.g. name (the job name)
asc: Boolean, # true by default, send false for desc
}
Example of an orderBy (we can pass multiple columns).
{ column: "avgDuration", desc: true }
Out of scope
-
Time series: #453956 (closed) does not yet consider which act as a special kind of grouping over a time dimension.
- We expect this API can be extended to include time series with different time periods.
- Other operators/filters We are only considering operators needed to implement the query
