Container Scanning CI job log output is occasionally truncated

The table output in the following job log is truncated, while the output of another job, triggered shortly afterwards with no additional code changes, is not.

Relevant Slack discussion: https://gitlab.slack.com/archives/C0SFP840G/p1569428697178200

According to the above Slack conversation with @steveazz:

Looking at https://log.gitlab.net/goto/bf63754b1168a0b4dd976f777cdc73bb all I see is 200 error codes

I mean https://log.gitlab.net/goto/09763e6b5ff1f2e8bf2a64c1a05c719a 202 is the status code we want

Looking at the job sections they seem to be also correct https://gitlab.com/gitlab-org/security-products/tests/container-scanning/-/jobs/303887197/raw

Also looking at that raw trace we see the following:

Unapproved Medium CVE-2018-0732 openssl 1.1.0f-3+deb9u2   During key agreement in a TLS handshake using a DH(E) based
                                                          ciphersuite a malicious server can send a very large prime
                                                          value to the client. This will cause the client to spend an
                                                          unreasonably long period of time generating a key for this
section_end:1569425687:build_script

Notice how the section build_script is ended correctly.

That said I think this is some issue with the container scanning because:

  1. There are no errors with sending traces to GitLab
  2. The trace sections, specifically the build_script starts and ends correctly.

So it looks like the issue is within the container scanning tool itself. I have a feeling this is caused by the way we set the ENTRYPOINT of our Dockerfile to execute /container-scanner/start.sh. This causes our shell script to run as PID 1, and PID 1 is often a source of signal-handling problems, for example when a TERM signal is sent to it, as described in "Introducing dumb-init, an init system for Docker containers". It seems that the convert process is being terminated too early, before it has had a chance to flush all of its output.
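To make the PID 1 concern concrete, here is a minimal sketch of how a shell entrypoint could forward TERM to a child process and wait for it to exit before returning. The `run_and_wait` helper is hypothetical and is not taken from the actual start.sh. It only illustrates the pattern and has not been confirmed to fix the truncation.

```shell
#!/bin/sh
# Hypothetical helper, not the actual /container-scanner/start.sh.

# run_and_wait CMD [ARGS...]
# Runs CMD in the background, forwards TERM/INT to it, and returns only
# after it has exited, so its output is written before the shell quits.
run_and_wait() {
  "$@" &
  child=$!

  # A process running as PID 1 does not get the default TERM action,
  # so install a handler that passes the signal on to the child.
  trap 'kill -TERM "$child" 2>/dev/null' TERM INT

  # wait returns early (status > 128) when a trapped signal arrives,
  # so keep waiting for as long as the child is still alive.
  wait "$child"
  status=$?
  while kill -0 "$child" 2>/dev/null; do
    wait "$child"
    status=$?
  done

  trap - TERM INT
  return "$status"
}
```

A script would wrap its last long-running command in `run_and_wait`, so that the shell exits with the child's status only once the child has finished writing.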

What's interesting about this issue is that the gl-container-scanning-report.json file always seems to be generated as expected, so the convert process does appear to complete successfully. It is only the table output in the CI job log that is sometimes truncated.

A first approach to solving this would be to have our ENTRYPOINT execute supervisord, and to make supervisord fully responsible for launching both the clair server process and the start.sh script. This would satisfy the init process requirement described in "Introducing dumb-init, an init system for Docker containers", and I think it might solve the issue.
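As a rough illustration of the supervisord idea, the sketch below writes a minimal config that runs both processes in the foreground and sends their output to the container's stdout and stderr. The function name, the program names and the `/path/to/clair` command are placeholders, since the real clair invocation is not shown in this issue.

```shell
#!/bin/sh
# Hypothetical sketch; program names and the clair command are placeholders.

# write_supervisord_conf PATH
# Writes a minimal supervisord config to PATH.
write_supervisord_conf() {
  cat > "$1" <<'EOF'
[supervisord]
nodaemon=true

[program:clair]
; placeholder command, the real clair invocation is not shown here
command=/path/to/clair
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
stderr_logfile=/dev/stderr
stderr_logfile_maxbytes=0

[program:scanner]
command=/container-scanner/start.sh
autorestart=false
startsecs=0
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
stderr_logfile=/dev/stderr
stderr_logfile_maxbytes=0
EOF
}
```

Note that supervisord keeps running after the scanner program exits, so a real setup would also need a way to stop the container once start.sh finishes. That part is left out of this sketch.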

cc @gonzoyumo @NicoleSchwartz
