Skip to content

Improve kubernetes executor attach strategy command execution and handling by using a new read-logs command in the helper image

Georgi N. Georgiev requested to merge k8s_attach_improvements into master

What does this MR do?

We already have a few issues regarding the attach strategy which are either very hard and finicky to resolve with the current approach or straight-up impossible. This MR aims to improve the strategy so that these and future issues are either more easily resolvable or depend on us, rather than external systems such as docker and kubernetes which in turn depend on each other in unforeseen ways.

I went through a few different solutions and this seems the most iterative for the moment, involving a minimal set of changes that knock off all the known issues at once.

I'll list the issues and how this approach solves them:

#25346 (closed) - This seemingly simple bug has no good solution with our current approach of piping all commands' output directly to stdout. In order to resolve it we need to know whether the last line ended with a new line, but we can't do that since we have no history of the output. If we just add an empty line before printing the command exit status we just end up with a bunch of empty lines since most commands end with a new line.

This is now solved by piping the output of the commands to a log file as well as stdout. This allows us to check the last character before printing the command exit status. This approach has the added benefit that we no longer print the command exit status json to stdout, so it won't be visible in kubectl logs keeping the logs clean.

#24928 (closed) - This problem relies on the way the stars and the docker/kubernetes versions are aligned. I wasn't happy at all with the way it was solved in !1986 (closed). It was finicky and super complex, but that was the best we could do given the approach we had taken.

In this MR this is easily solved by not relying on timestamps from kubernetes/docker. We simply read the lines and dedupe them based on their index. We do that by mounting a shared logs volume and writing all the logs to output.log there. We then use the new read-logs command in gitlab-runner-helper to read the log lines and prepend them with their index. We run the helper commands only in the helper container, so now, instead of attaching to the logs of two containers, we attach to one log file. These changes greatly simplify the log processor, since it now only keeps a line index to know whether a log is processed. And attaching only to one stream will simplify it further (that's not yet done since it works like this for the WIP anyways).

Also, all lines are automatically split at 16kb in the runner helper so we don't need to worry about that in the executor.

There's also #25377 (closed) which just shows how unreliable timestamps are and even if we fix the log processor to parse this specific timestamp format, who is to tell when it will break again.

Are there points in the code the reviewer needs to double check?

This is a working wip in the sense that I have tested it with running jobs in order to avoid delving too deep in case we figure this isn't a direction we want to go to.

Tests and code style were not a priority.

Does this MR meet the acceptance criteria?

  • Documentation created/updated
  • Added tests for this feature/bug
  • In case of conflicts with master - branch was rebased

What are the relevant issue numbers?

Closes #25716 (closed), #25377 (closed), #25346 (closed), #24928 (closed)

Edited by Georgi N. Georgiev

Merge request reports