chore(metrics): Improve code structuring for grouping metrics by attributes

Ankit Bhatnagar requested to merge abhatnagar/subquery-counters-aggregation into main

Related #2561

In the current implementation of grouping/aggregating metrics by attributes, aggregation is performed per metric fingerprint, and the results are then grouped by each requested attribute. This is incorrect: the aggregation should be performed over only the requested attributes, not over the entire label set (which is essentially what the fingerprint represents).

This MR improves the logic around how this aggregation is performed while making it more maintainable and readable. Logically, metrics search for counters is performed as follows:

  • Step 1: Compute all fingerprints that match the given query.
  • Step 2: Query all metrics that correspond to the fingerprints found in step 1. While doing this, also apply aggregate function merges when querying rolled-up tables.
  • Step 3: Compute the aggregate over the group-by attributes as necessary.
  • Step 4: Compute query results to be passed back to the client by constructing the needed time-series.
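
Step 3 is the part this MR corrects. A minimal sketch of the idea, assuming a simplified `Series` type and helper names (`groupKey`, `avgByAttrs`) that are hypothetical and not taken from the actual code: series are bucketed by a key built from only the requested group-by attributes, so series that differ in some other label (here an illustrative `host` label) fall into the same bucket.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Series is a hypothetical stand-in for one fingerprinted series
// at a single timestamp; the real types in the MR may differ.
type Series struct {
	Attributes map[string]string
	Value      float64
}

// groupKey builds a key from only the requested group-by attributes,
// ignoring every other label on the series.
func groupKey(attrs map[string]string, groupBy []string) string {
	keys := append([]string(nil), groupBy...)
	sort.Strings(keys) // stable key regardless of request order
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, k+"="+attrs[k])
	}
	return strings.Join(parts, ",")
}

// avgByAttrs averages values across all series that share the same
// group-by attribute values.
func avgByAttrs(series []Series, groupBy []string) map[string]float64 {
	sums := map[string]float64{}
	counts := map[string]int{}
	for _, s := range series {
		k := groupKey(s.Attributes, groupBy)
		sums[k] += s.Value
		counts[k]++
	}
	out := make(map[string]float64, len(sums))
	for k, sum := range sums {
		out[k] = sum / float64(counts[k])
	}
	return out
}

func main() {
	// "host" is an illustrative extra label, not from the MR.
	series := []Series{
		{map[string]string{"cpu": "1", "state": "idle", "host": "a"}, 10},
		{map[string]string{"cpu": "1", "state": "idle", "host": "b"}, 20},
		{map[string]string{"cpu": "2", "state": "idle", "host": "a"}, 60},
	}
	fmt.Println(avgByAttrs(series, []string{"state"}))
	fmt.Println(avgByAttrs(series, []string{"cpu", "state"}))
}
```

Grouping by `state` alone puts all three series in one bucket, while grouping by `cpu,state` yields two buckets. The actual query presumably does this in SQL against the metrics tables rather than in memory.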

This structures query-construction much better and allows for later improvements, e.g. accounting for aggregation temporality in the case of sums. While at it:

  • Corresponding logic within data_generator.go is also updated to aggregate over filtered attributes.
  • Minor cleanup: FilteredAttributes is now represented as a map[string]string instead of prometheus.LabelSet.
  • A couple of additional tests to ensure aggregation over filtered attributes is correct.
  • Adds the ability to format metric values in our API response. This is particularly important for limiting API response sizes while keeping tests repeatable when comparing API responses.
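
As an illustration of the value formatting, here is a minimal sketch that renders a float as a fixed-precision string. The helper name `formatValue` is hypothetical, and the precision of 6 is only inferred from the sample responses in this description, not from the code.

```go
package main

import (
	"fmt"
	"strconv"
)

// formatValue renders v with a fixed number of decimal places.
// Hypothetical helper; the MR's actual formatter may differ.
func formatValue(v float64, precision int) string {
	return strconv.FormatFloat(v, 'f', precision, 64)
}

func main() {
	fmt.Println(formatValue(0, 6))
	fmt.Println(formatValue(1.5, 2))
}
```

A fixed precision bounds the length of each value string and makes responses byte-for-byte comparable across test runs.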

Tested on a dev cluster with:
➜  ~ curl --silent "http://localhost:8082/v3/query/51792562/metrics/search?mname=system.cpu.time&mtype=Sum&groupby_fn=avg&groupby_attrs=cpu,state&period=30d" | jq .
{
  "start_ts": 1712577957945111000,
  "end_ts": 1715169957945111000,
  "results": [
    {
      "name": "system.cpu.time",
      "description": "System CPU time",
      "unit": "seconds",
      "type": "Sum",
      "attributes": {
        "cpu": "1",
        "state": "idle"
      },
      "values": [
        [
          "1714435200000000000",
          "990532.192143"
        ],
        [
          "1714521600000000000",
          "1030103.116936"
        ],
        [
          "1714694400000000000",
          "1163154.940288"
        ],
        [
          "1714780800000000000",
          "1199597.649915"
        ],
        [
          "1714867200000000000",
          "1239257.117935"
        ],
        [
          "1714953600000000000",
          "1302186.051294"
        ]
      ]
    },
    {
      "name": "system.cpu.time",
      "description": "System CPU time",
      "unit": "seconds",
      "type": "Sum",
      "attributes": {
        "cpu": "1",
        "state": "irq"
      },
      "values": [
        [
          "1714435200000000000",
          "0.000000"
        ],
        [
          "1714521600000000000",
          "0.000000"
        ],
        [
          "1714694400000000000",
          "0.000000"
        ],
        [
          "1714780800000000000",
          "0.000000"
        ],
        [
          "1714867200000000000",
          "0.000000"
        ],
        [
          "1714953600000000000",
          "0.000000"
        ]
      ]
    },
    {
      "name": "system.cpu.time",
      "description": "System CPU time",
      "unit": "seconds",
      "type": "Sum",
      "attributes": {
        "cpu": "1",
        "state": "system"
      },
      "values": [
        [
          "1714435200000000000",
          "116027.795714"
        ],
        [
          "1714521600000000000",
          "121157.215097"
        ],
        [
          "1714694400000000000",
          "124653.032302"
        ],
        [
          "1714780800000000000",
          "125323.892293"
        ],
        [
          "1714867200000000000",
          "125904.157648"
        ],
        [
          "1714953600000000000",
          "126192.603647"
        ]
      ]
    },
    {
      "name": "system.cpu.time",
      "description": "System CPU time",
      "unit": "seconds",
      "type": "Sum",
      "attributes": {
        "cpu": "1",
        "state": "user"
      },
      "values": [
        [
          "1714435200000000000",
          "272466.274048"
        ],
        [
          "1714521600000000000",
          "283061.857354"
        ],
        [
          "1714694400000000000",
          "289284.988962"
        ],
        [
          "1714780800000000000",
          "290158.725245"
        ],
        [
          "1714867200000000000",
          "292895.895380"
        ],
        [
          "1714953600000000000",
          "294876.247765"
        ]
      ]
    },
    {
      "name": "system.cpu.time",
      "description": "System CPU time",
      "unit": "seconds",
      "type": "Sum",
      "attributes": {
        "cpu": "2",
        "state": "idle"
      },
      "values": [
        [
          "1714435200000000000",
          "1000618.687619"
        ],
        [
          "1714521600000000000",
          "1040222.510975"
        ],
        [
          "1714694400000000000",
          "1173383.716999"
        ],
        [
          "1714780800000000000",
          "1209906.328439"
        ],
        [
          "1714867200000000000",
          "1250756.906429"
        ],
        [
          "1714953600000000000",
          "1314480.398235"
        ]
      ]
    },
    {
      "name": "system.cpu.time",
      "description": "System CPU time",
      "unit": "seconds",
      "type": "Sum",
      "attributes": {
        "cpu": "2",
        "state": "irq"
      },
      "values": [
        [
          "1714435200000000000",
          "0.000000"
        ],
        [
          "1714521600000000000",
          "0.000000"
        ],
        [
          "1714694400000000000",
          "0.000000"
        ],
        [
          "1714780800000000000",
          "0.000000"
        ],
        [
          "1714867200000000000",
          "0.000000"
        ],
        [
          "1714953600000000000",
          "0.000000"
        ]
      ]
    },
    {
      "name": "system.cpu.time",
      "description": "System CPU time",
      "unit": "seconds",
      "type": "Sum",
      "attributes": {
        "cpu": "2",
        "state": "system"
      },
      "values": [
        [
          "1714435200000000000",
          "116964.019048"
        ],
        [
          "1714521600000000000",
          "122108.409471"
        ],
        [
          "1714694400000000000",
          "125641.003309"
        ],
        [
          "1714780800000000000",
          "126325.609861"
        ],
        [
          "1714867200000000000",
          "126927.249550"
        ],
        [
          "1714953600000000000",
          "127229.978471"
        ]
      ]
    },
    {
      "name": "system.cpu.time",
      "description": "System CPU time",
      "unit": "seconds",
      "type": "Sum",
      "attributes": {
        "cpu": "2",
        "state": "user"
      },
      "values": [
        [
          "1714435200000000000",
          "263680.845714"
        ],
        [
          "1714521600000000000",
          "274405.071281"
        ],
        [
          "1714694400000000000",
          "280765.345714"
        ],
        [
          "1714780800000000000",
          "281643.103527"
        ],
        [
          "1714867200000000000",
          "283248.966436"
        ],
        [
          "1714953600000000000",
          "284456.905294"
        ]
      ]
    },
    {
      "name": "system.cpu.time",
      "description": "System CPU time",
      "unit": "seconds",
      "type": "Sum",
      "attributes": {
        "cpu": "3",
        "state": "idle"
      },
      "values": [
        [
          "1714435200000000000",
          "993418.003571"
        ],
        [
          "1714521600000000000",
          "1033027.927131"
        ],
        [
          "1714694400000000000",
          "1166292.409558"
        ],
        [
          "1714780800000000000",
          "1202603.502426"
        ],
        [
          "1714867200000000000",
          "1232365.674643"
        ],
        [
          "1714953600000000000",
          "1288791.481765"
        ]
      ]
    },
    {
      "name": "system.cpu.time",
      "description": "System CPU time",
      "unit": "seconds",
      "type": "Sum",
      "attributes": {
        "cpu": "3",
        "state": "irq"
      },
      "values": [
        [
          "1714435200000000000",
          "0.000000"
        ],
        [
          "1714521600000000000",
          "0.000000"
        ],
        [
          "1714694400000000000",
          "0.000000"
        ],
        [
          "1714780800000000000",
          "0.000000"
        ],
        [
          "1714867200000000000",
          "0.000000"
        ],
        [
          "1714953600000000000",
          "0.000000"
        ]
      ]
    },
    {
      "name": "system.cpu.time",
      "description": "System CPU time",
      "unit": "seconds",
      "type": "Sum",
      "attributes": {
        "cpu": "3",
        "state": "system"
      },
      "values": [
        [
          "1714435200000000000",
          "113511.133810"
        ],
        [
          "1714521600000000000",
          "118554.819443"
        ],
        [
          "1714694400000000000",
          "122064.441274"
        ],
        [
          "1714780800000000000",
          "122739.516534"
        ],
        [
          "1714867200000000000",
          "123249.509099"
        ],
        [
          "1714953600000000000",
          "123489.204000"
        ]
      ]
    },
    {
      "name": "system.cpu.time",
      "description": "System CPU time",
      "unit": "seconds",
      "type": "Sum",
      "attributes": {
        "cpu": "3",
        "state": "user"
      },
      "values": [
        [
          "1714435200000000000",
          "275660.162381"
        ],
        [
          "1714521600000000000",
          "286515.758440"
        ],
        [
          "1714694400000000000",
          "292826.287040"
        ],
        [
          "1714780800000000000",
          "293942.358584"
        ],
        [
          "1714867200000000000",
          "306820.402632"
        ],
        [
          "1714953600000000000",
          "315457.044824"
        ]
      ]
    },
    {
      "name": "system.cpu.time",
      "description": "System CPU time",
      "unit": "seconds",
      "type": "Sum",
      "attributes": {
        "cpu": "4",
        "state": "idle"
      },
      "values": [
        [
          "1714435200000000000",
          "995521.239524"
        ],
        [
          "1714521600000000000",
          "1035181.513900"
        ],
        [
          "1714694400000000000",
          "1168539.410555"
        ],
        [
          "1714780800000000000",
          "1205093.971337"
        ],
        [
          "1714867200000000000",
          "1243198.503502"
        ],
        [
          "1714953600000000000",
          "1303736.556000"
        ]
      ]
    },
    {
      "name": "system.cpu.time",
      "description": "System CPU time",
      "unit": "seconds",
      "type": "Sum",
      "attributes": {
        "cpu": "4",
        "state": "irq"
      },
      "values": [
        [
          "1714435200000000000",
          "0.000000"
        ],
        [
          "1714521600000000000",
          "0.000000"
        ],
        [
          "1714694400000000000",
          "0.000000"
        ],
        [
          "1714780800000000000",
          "0.000000"
        ],
        [
          "1714867200000000000",
          "0.000000"
        ],
        [
          "1714953600000000000",
          "0.000000"
        ]
      ]
    },
    {
      "name": "system.cpu.time",
      "description": "System CPU time",
      "unit": "seconds",
      "type": "Sum",
      "attributes": {
        "cpu": "4",
        "state": "system"
      },
      "values": [
        [
          "1714435200000000000",
          "107385.601429"
        ],
        [
          "1714521600000000000",
          "112215.164457"
        ],
        [
          "1714694400000000000",
          "115576.810092"
        ],
        [
          "1714780800000000000",
          "116225.453418"
        ],
        [
          "1714867200000000000",
          "116777.158346"
        ],
        [
          "1714953600000000000",
          "117039.086588"
        ]
      ]
    },
    {
      "name": "system.cpu.time",
      "description": "System CPU time",
      "unit": "seconds",
      "type": "Sum",
      "attributes": {
        "cpu": "4",
        "state": "user"
      },
      "values": [
        [
          "1714435200000000000",
          "279377.901667"
        ],
        [
          "1714521600000000000",
          "290398.776017"
        ],
        [
          "1714694400000000000",
          "296756.589723"
        ],
        [
          "1714780800000000000",
          "297646.125953"
        ],
        [
          "1714867200000000000",
          "302078.691483"
        ],
        [
          "1714953600000000000",
          "306549.152941"
        ]
      ]
    },
    {
      "name": "system.cpu.time",
      "description": "System CPU time",
      "unit": "seconds",
      "type": "Sum",
      "attributes": {
        "cpu": "5",
        "state": "idle"
      },
      "values": [
        [
          "1714435200000000000",
          "999575.280952"
        ],
        [
          "1714521600000000000",
          "1039418.174123"
        ],
        [
          "1714694400000000000",
          "1172755.136608"
        ],
        [
          "1714780800000000000",
          "1209071.479776"
        ],
        [
          "1714867200000000000",
          "1244411.397399"
        ],
        [
          "1714953600000000000",
          "1304949.757647"
        ]
      ]
    },
    {
      "name": "system.cpu.time",
      "description": "System CPU time",
      "unit": "seconds",
      "type": "Sum",
      "attributes": {
        "cpu": "5",
        "state": "irq"
      },
      "values": [
        [
          "1714435200000000000",
          "0.000000"
        ],
        [
          "1714521600000000000",
          "0.000000"
        ],
        [
          "1714694400000000000",
          "0.000000"
        ],
        [
          "1714780800000000000",
          "0.000000"
        ],
        [
          "1714867200000000000",
          "0.000000"
        ],
        [
          "1714953600000000000",
          "0.000000"
        ]
      ]
    },
    {
      "name": "system.cpu.time",
      "description": "System CPU time",
      "unit": "seconds",
      "type": "Sum",
      "attributes": {
        "cpu": "5",
        "state": "system"
      },
      "values": [
        [
          "1714435200000000000",
          "106769.615238"
        ],
        [
          "1714521600000000000",
          "111568.569248"
        ],
        [
          "1714694400000000000",
          "114901.530935"
        ],
        [
          "1714780800000000000",
          "115531.172541"
        ],
        [
          "1714867200000000000",
          "116053.761188"
        ],
        [
          "1714953600000000000",
          "116319.758118"
        ]
      ]
    },
    {
      "name": "system.cpu.time",
      "description": "System CPU time",
      "unit": "seconds",
      "type": "Sum",
      "attributes": {
        "cpu": "5",
        "state": "user"
      },
      "values": [
        [
          "1714435200000000000",
          "275803.360238"
        ],
        [
          "1714521600000000000",
          "286668.736128"
        ],
        [
          "1714694400000000000",
          "293068.482395"
        ],
        [
          "1714780800000000000",
          "294214.924392"
        ],
        [
          "1714867200000000000",
          "301463.062120"
        ],
        [
          "1714953600000000000",
          "305929.503059"
        ]
      ]
    }
  ]
}
➜  ~ curl --silent "http://localhost:8082/v3/query/51792562/metrics/search?mname=system.cpu.time&mtype=Sum&groupby_fn=avg&groupby_attrs=state&period=30d" | jq .
{
  "start_ts": 1712577962350400000,
  "end_ts": 1715169962350400000,
  "results": [
    {
      "name": "system.cpu.time",
      "description": "System CPU time",
      "unit": "seconds",
      "type": "Sum",
      "attributes": {
        "state": "idle"
      },
      "values": [
        [
          "1714435200000000000",
          "995933.080762"
        ],
        [
          "1714521600000000000",
          "1035590.648613"
        ],
        [
          "1714694400000000000",
          "1168825.122802"
        ],
        [
          "1714780800000000000",
          "1205254.586379"
        ],
        [
          "1714867200000000000",
          "1241997.919981"
        ],
        [
          "1714953600000000000",
          "1302828.848988"
        ]
      ]
    },
    {
      "name": "system.cpu.time",
      "description": "System CPU time",
      "unit": "seconds",
      "type": "Sum",
      "attributes": {
        "state": "irq"
      },
      "values": [
        [
          "1714435200000000000",
          "0.000000"
        ],
        [
          "1714521600000000000",
          "0.000000"
        ],
        [
          "1714694400000000000",
          "0.000000"
        ],
        [
          "1714780800000000000",
          "0.000000"
        ],
        [
          "1714867200000000000",
          "0.000000"
        ],
        [
          "1714953600000000000",
          "0.000000"
        ]
      ]
    },
    {
      "name": "system.cpu.time",
      "description": "System CPU time",
      "unit": "seconds",
      "type": "Sum",
      "attributes": {
        "state": "system"
      },
      "values": [
        [
          "1714435200000000000",
          "112131.633048"
        ],
        [
          "1714521600000000000",
          "117120.835543"
        ],
        [
          "1714694400000000000",
          "120567.363583"
        ],
        [
          "1714780800000000000",
          "121229.128929"
        ],
        [
          "1714867200000000000",
          "121782.367166"
        ],
        [
          "1714953600000000000",
          "122054.126165"
        ]
      ]
    },
    {
      "name": "system.cpu.time",
      "description": "System CPU time",
      "unit": "seconds",
      "type": "Sum",
      "attributes": {
        "state": "user"
      },
      "values": [
        [
          "1714435200000000000",
          "273397.708810"
        ],
        [
          "1714521600000000000",
          "284210.039844"
        ],
        [
          "1714694400000000000",
          "290540.338767"
        ],
        [
          "1714780800000000000",
          "291521.047540"
        ],
        [
          "1714867200000000000",
          "297301.403610"
        ],
        [
          "1714953600000000000",
          "301453.770776"
        ]
      ]
    }
  ]
}
Edited by Ankit Bhatnagar