Include non-Ruby processes in Topology usage data

Matthias Käppler requested to merge 218546-more-topology-service-data into master

What does this MR do?

This is a follow-up to:

In that MR we scoped service-level data to Ruby services only, since at the time it was not yet clear how to obtain this information for non-Ruby services.

This MR does two things:

  1. It adds most, but not all, of the non-Ruby components customers can run to the topology usage ping. The same kind of data is exported (process_count and process_memory_rss), but it is less complete because fewer metrics are available for those services. Some services don't export any metrics at all, so with those we're flying blind.
  2. It maps job names, which were previously just symbolized and underscored, to a set of well-defined service names. This ensures that we can maintain a stable schema even when job names change at the source. Unmapped services are ignored, so that we do not accidentally include non-GitLab services once we extend this feature to external Prometheus servers, which could be scraping who knows what.
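
The mapping described in point 2 could be sketched roughly as follows. This is only an illustration, not the code in this MR: the constant name, the helper methods, and the Prometheus job names on the left-hand side of the hash are all assumptions made for the example.

```ruby
# Hypothetical mapping from Prometheus job names to stable service names.
# The job names (hash keys) are assumptions for illustration only.
JOB_TO_SERVICE_NAME = {
  'gitlab-rails'     => 'web',
  'gitlab-sidekiq'   => 'sidekiq',
  'gitlab-workhorse' => 'workhorse',
  'redis'            => 'redis',
  'postgres'         => 'postgres',
  'gitaly'           => 'gitaly',
  'prometheus'       => 'prometheus',
  'node'             => 'node-exporter'
}.freeze

# Returns the stable service name for a job, or nil if the job is unmapped.
def service_name_for(job)
  JOB_TO_SERVICE_NAME[job.to_s]
end

# Replaces the :job key of each sample with a stable :name and drops
# samples whose job is not in the mapping.
# `samples` is an array of hashes such as { job: 'redis', process_count: 1 }.
def map_services(samples)
  samples.each_with_object([]) do |sample, services|
    name = service_name_for(sample[:job])
    next unless name # ignore anything we do not recognise

    services << sample.reject { |key, _| key == :job }.merge(name: name)
  end
end

samples = [
  { job: 'redis', process_count: 1 },
  { job: 'some-external-app', process_count: 3 }
]
p map_services(samples) # the unmapped 'some-external-app' sample is dropped
```

Using an explicit allow-list rather than a transformation of the job name means that a renamed job at the source silently drops out of the payload instead of changing the schema, which is the trade-off described above.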

Services that should now be captured:

  • Gitaly
  • Redis
  • Postgres
  • Prometheus
  • node-exporter

Services that are not included, either because they do not currently export metrics to Prometheus or because they are difficult to include:

  • Consul
  • PGBouncer (but support is on the way!)
  • NFS servers
  • Load balancers
  • Nginx
  • Grafana
  • alertmanager
  • logrotate
  • redis-exporter
  • postgres-exporter
  • gitlab-exporter
  • sshd

NOTE that, as with the original MR, all of this only applies to single-node installations for now, since we cannot yet locate an external Prometheus node. This will change at some point in the future though, so it can never hurt to look at this through the "future looking glass" 🔭

Refs #218546 (closed)

Example

Pulled from the Usage Ping preview payload generated by registry.gitlab.com/gitlab-org/build/omnibus-gitlab-mirror/gitlab-ee:c0c45395c73eb5b595db389a7a0137cd0a043d24:

"topology": {
    "nodes": [
      {
        "node_memory_total_bytes": 33269903360,
        "node_cpus": 16,
        "node_services": [
          {
            "name": "web",
            "process_count": 16,
            "process_memory_pss": 195114368,
            "process_memory_rss": 780203776,
            "process_memory_uss": 155836416
          },
          {
            "name": "node-exporter",
            "process_count": 1,
            "process_memory_rss": 18259968
          },
          {
            "name": "postgres",
            "process_count": 1,
            "process_memory_rss": 18976768
          },
          {
            "name": "workhorse",
            "process_count": 1,
            "process_memory_rss": 36425728
          },
          {
            "name": "gitaly",
            "process_count": 1,
            "process_memory_rss": 37654528
          },
          {
            "name": "redis",
            "process_count": 1,
            "process_memory_rss": 19324928
          },
          {
            "name": "sidekiq",
            "process_count": 1,
            "process_memory_pss": 705674240,
            "process_memory_uss": 702689280,
            "process_memory_rss": 720261120
          }
        ]
      }
    ],
    "duration_s": 0.021399167999334168
  }
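
To make the difference in completeness visible, a consumer of the payload could check which memory metrics each service reports. The snippet below is a minimal sketch under that assumption: the helper name and the `MEMORY_KEYS` constant are hypothetical, and the inline payload is a trimmed copy of the example above with only two services.

```ruby
require 'json'

# Trimmed copy of the example payload (two services only).
payload = JSON.parse(<<~PAYLOAD)
  {
    "topology": {
      "nodes": [
        {
          "node_services": [
            { "name": "web", "process_count": 16,
              "process_memory_pss": 195114368,
              "process_memory_rss": 780203776,
              "process_memory_uss": 155836416 },
            { "name": "redis", "process_count": 1,
              "process_memory_rss": 19324928 }
          ]
        }
      ]
    }
  }
PAYLOAD

# Memory metrics that may appear on a service entry.
MEMORY_KEYS = %w[process_memory_rss process_memory_pss process_memory_uss].freeze

# Returns { service name => memory metrics present } for a single node.
def memory_metrics_by_service(node)
  node.fetch('node_services').to_h do |service|
    [service.fetch('name'), MEMORY_KEYS.select { |key| service.key?(key) }]
  end
end

coverage = memory_metrics_by_service(payload.dig('topology', 'nodes', 0))
coverage.each { |name, keys| puts "#{name}: #{keys.join(', ')}" }
```

In this trimmed payload the Ruby service (web) reports RSS, PSS and USS, while the non-Ruby service (redis) reports RSS only.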
