GitLab sshd should include data transfer bytes in logs

GitLab sshd should include a written_bytes field in ssh logs, written when the SSH connection has completed.

Having this in place would make measurement of SSH egress much easier than at present.

This would be similar to Workhorse's written field (which is great but unfortunately does not include a unit of measure, lowering its self-documentability a little).

With the Workhorse field, we're able to slice our logs by project, namespace, etc. Adding this field to GitLab sshd would let us do similar analytics for SSH traffic.

Summary Log Line

Ideally, a single summary log line would provide all the required details of a request. We do this on most of our services, including Rails, Workhorse and Sidekiq. This log line is written after the call is complete and contains meta information about the user, the request, etc.

Generally, these messages have a simple message, such as msg: "access" and lots of additional fields which we can group by and aggregate on.

Currently, the duration_s field, signifying the duration of the connection, is written in the connection: handleRequests: done log message: https://log.gprd.gitlab.net/goto/19ac9670-f314-11ed-a017-0d32180b1390. Repurposing this log entry as the "access" entry probably makes sense.

We could do this by renaming it to msg: "access" to bring it in line with other services such as Workhorse.

Additionally, adding a few more fields to this log message would make it much more usable from an analytics point of view:

  1. meta.project: "gitlab-org/gitlab" - writing out the project path allows us to slice queries by project. This is currently published as json.gl_project_path on other log messages, but not on connection: handleRequests: done. Moving it to meta.project would bring it in line with other services such as Rails.
  2. meta.root_namespace: "gitlab-org" - likewise, extracting the first element of the project path as root_namespace would allow us to analyse SSH egress per namespace.
  3. meta.username: "andrewn" - the username is published on other messages, but not on the final completion message.

Further Reading

  1. Data Transfer Blueprint: gitlab!110417 (closed)
  2. Create a monthly summary of users that are exceeding our TOS may degrade availability or increase spend: https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/23608
  3. Investigate Cloudflare usage increase: https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/23538
  4. Data Transfer Limits for GitLab.com: https://gitlab.com/groups/gitlab-com/-/epics/1664

Implementation

cc @sean_carroll @joshlambert

Edited by Ash McKenzie