Telemetry: Fix `app_server_type` attribute

Matthias Käppler requested to merge 219114-fix-app-server-type into master

What does this MR do?

For Usage Ping, we have been collecting `app_server_type` for a while now, but that data is always wrong: it is evaluated based on the client runtime, which is always a Sidekiq worker rather than a Rails app server.

To reliably know which app server (Puma or Unicorn) is running on which node in any given Omnibus installation, we need to push this logic out of the application runtime and down to Prometheus. Fortunately, most of this data already exists: both Unicorn and Puma export metrics to Prometheus, so the mere presence of these metrics tells us which server is running.
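As a rough illustration of the "presence implies server" idea, here is a minimal Python sketch. It is not the recording rule from the Omnibus MR; the function name and the `puma_` / `unicorn_` metric-name prefixes are assumptions made for the example, and the real exported metric names may differ.

```python
from typing import Iterable, Optional

# Hypothetical metric-name prefixes; the real exported names may differ.
SERVER_PREFIXES = {"puma": "puma_", "unicorn": "unicorn_"}


def infer_app_server(metric_names: Iterable[str]) -> Optional[str]:
    """Return the app server whose metrics are present, or None if unclear."""
    names = list(metric_names)
    found = {
        server
        for server, prefix in SERVER_PREFIXES.items()
        if any(name.startswith(prefix) for name in names)
    }
    # Exactly one match is the only unambiguous case.
    return found.pop() if len(found) == 1 else None
```

Returning `None` when both or neither family of metrics is present keeps the sketch from guessing, which matters because a wrong value is exactly the problem being fixed here.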

The metric in question is added as a recording rule in this MR: omnibus-gitlab!4374 (merged)

Here, we make the client-side changes: an additional query for the newly recorded metric, which carries a server label indicating which app server (puma or unicorn) is running. Via the instance and job labels we can then associate this value with a node and submit it alongside the existing data in the topology Usage Ping.
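The client-side association could look roughly like the Python sketch below. It assumes the standard Prometheus HTTP API response shape for an instant query and matches samples to nodes by the host part of the instance label only; the job label is ignored for brevity. The helper names are hypothetical, and the actual GitLab implementation is not shown here.

```python
from typing import Dict, List


def strip_port(instance: str) -> str:
    """Reduce 'host:port' to 'host' so all services on a node share one key."""
    return instance.rsplit(":", 1)[0]


def servers_by_node(response: dict) -> Dict[str, str]:
    """Map node host -> app server from a Prometheus instant-query response."""
    if response.get("status") != "success":
        return {}
    mapping: Dict[str, str] = {}
    for sample in response.get("data", {}).get("result", []):
        labels = sample.get("metric", {})
        if "instance" in labels and "server" in labels:
            mapping[strip_port(labels["instance"])] = labels["server"]
    return mapping


def annotate_web_services(
    services_by_host: Dict[str, List[dict]], servers: Dict[str, str]
) -> None:
    """Add a 'server' entry to each 'web' service whose host has a known server."""
    for host, services in services_by_host.items():
        server = servers.get(host)
        if server is None:
            continue
        for service in services:
            if service.get("name") == "web":
                service["server"] = server
```

Nodes for which the query returns no sample are left untouched, so a missing metric leaves the field absent instead of filling in a guess.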

Example

I pulled this from an Omnibus container:

"topology": {
  "application_requests_per_hour": 266,
  "nodes": [
    {
      "node_memory_total_bytes": 33269903360,
      "node_cpus": 16,
      "node_services": [
        {
          "name": "web",
          "process_count": 16,
          "process_memory_rss": 732653824,
          "process_memory_uss": 110505792,
          "process_memory_pss": 148698496,
          "server": "puma"
        },
        {
          "name": "sidekiq",
          "process_count": 3,
          "process_memory_rss": 734683591,
          "process_memory_uss": 716128711,
          "process_memory_pss": 718348174
        },
        {
          "name": "node-exporter",
          "process_count": 1,
          "process_memory_rss": 15460352
        },
        {
          "name": "redis",
          "process_count": 1,
          "process_memory_rss": 13308928
        },
        {
          "name": "postgres",
          "process_count": 1,
          "process_memory_rss": 16097280
        },
        {
          "name": "workhorse",
          "process_count": 1,
          "process_memory_rss": 31762432
        },
        {
          "name": "gitaly",
          "process_count": 1,
          "process_memory_rss": 32832512
        }
      ]
    }
  ],
  "duration_s": 0.032179404002818046,
  "failures": []
}

The new element here is the "server": "puma" entry for web. Every web node will have this entry now.
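For anyone consuming the payload, a small check along these lines could confirm that every web service carries the new entry. This is a sketch against the example structure above; the function name is hypothetical.

```python
from typing import List


def nodes_missing_web_server(topology: dict) -> List[int]:
    """Return indices of nodes that have a 'web' service without a 'server' entry."""
    missing = []
    for index, node in enumerate(topology.get("nodes", [])):
        for service in node.get("node_services", []):
            if service.get("name") == "web" and "server" not in service:
                missing.append(index)
                break  # one report per node is enough
    return missing
```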

Edited by Matthias Käppler