Java 8 builds suddenly started to fail, with no change on our side
Hello guys.
Not really sure what this issue is, so I am going to explain everything.
We develop an application in Java 8, with Spring Boot 2.0.3.RELEASE, and GitLab as our Git repository. We run our builds on our own GitLab runners (we were running out of pipeline quota).
The project is built with Maven, and the tests are executed by the maven-surefire-plugin, version 2.22.1.
It all started on 30 October.
Out of nowhere (on 29 October everything was fine), the builds started to fail. All of them.
The detailed error output is:
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 03:14 min
[INFO] Finished at: 2018-10-30T15:22:51Z
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.21.0:test (default-test) on project core: There are test failures.
[ERROR]
[ERROR] Please refer to /builds/prodigy-it-solutions/speedwell-core/target/surefire-reports for the individual test results.
[ERROR] Please refer to dump files (if any exist) [date]-jvmRun[N].dump, [date].dumpstream and [date]-jvmRun[N].dumpstream.
[ERROR] The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /builds/prodigy-it-solutions/speedwell-core && /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -jar /builds/prodigy-it-solutions/speedwell-core/target/surefire/surefirebooter323798655060650381.jar /builds/prodigy-it-solutions/speedwell-core/target/surefire 2018-10-30T15-22-50_129-jvmRun1 surefire7803824562161294355tmp surefire_07872595865329101781tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 1
[ERROR] org.apache.maven.surefire.booter.SurefireBooterForkException: The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /builds/prodigy-it-solutions/speedwell-core && /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -jar /builds/prodigy-it-solutions/speedwell-core/target/surefire/surefirebooter323798655060650381.jar /builds/prodigy-it-solutions/speedwell-core/target/surefire 2018-10-30T15-22-50_129-jvmRun1 surefire7803824562161294355tmp surefire_07872595865329101781tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 1
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.fork(ForkStarter.java:671)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.fork(ForkStarter.java:533)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:278)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:244)
[ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1149)
[ERROR] at
{...}
This was a complete surprise. We checked the builds locally and they passed on 5 different computers, so, to isolate the problem to GitLab, we restarted several builds that had passed 2 weeks, 1 month, and 2 months before 30 October. (I was thinking the number of tests was the issue, lol.)
All the builds failed with the same error.
I did a little bit of digging, and it appears that maven-surefire-plugin forks a new JVM in which it runs the tests (pretty cool, didn't know that, even though I have been using this tech stack for the past 5 years).
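From what I understand, that forked JVM gets its options (heap size and so on) from the plugin's argLine setting, so something like the following sketch should cap how much memory the test JVM tries to grab. The -Xmx value here is just an illustration, not what we actually have in our pom.xml:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <version>2.22.1</version>
  <configuration>
    <!-- JVM options passed to the forked test JVM; an explicit heap cap
         (illustrative value) keeps it inside the container's memory limit -->
    <argLine>-Xmx512m</argLine>
  </configuration>
</plugin>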
It became pretty clear that the problem was that the Docker containers where the builds are executed suddenly got much less memory, or something like that. Maybe a new release happened.
To make sure that our own GitLab runners were not the issue, we disabled them and used the shared runners. Same problem.
The fix we currently have is to set the forkCount flag of maven-surefire-plugin to 0. I am not sure whether this still creates a new JVM or not; the plugin docs are not specific enough.
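For reference, the workaround is just this inside the surefire <configuration> block shown above (as far as I can tell from the docs, 0 means no process is forked and the tests run inside the Maven JVM itself, but please correct me if that is wrong):

<configuration>
  <!-- 0 = do not fork; tests appear to run in the same JVM as Maven -->
  <forkCount>0</forkCount>
</configuration>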
Trying to fix this without the "workaround" of setting forkCount to 0, I found this:
It's an issue which limits the Docker runners' memory and number of cores.
I changed the config in our GitLab runners' config files to allow 512 MB of memory (sounds reasonable).
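Concretely, the change looked roughly like this in the runner's config.toml (a sketch: the runner name is made up, the url/token lines are omitted, and the memory option under [runners.docker] is the part I touched):

[[runners]]
  name = "our-docker-runner"   # illustrative name
  executor = "docker"
  [runners.docker]
    image = "alpine"
    # allow job containers up to 512 MB of memory
    memory = "512m"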
The thing is, that did not solve the problem. Worse, the builds started to take almost 10 times as long. Before this modification, a build would fail in about 2 minutes; after the change, it took about 10 minutes before failing. And a full build, which normally passed in about 5 minutes, took more than 30 minutes and then timed out.
Here is the .gitlab-ci.yml file (the relevant part of it):
image: alpine

variables:
  # Configure maven
  MAVEN_OPTS: "-Dmaven.repo.local=.m2/repository"
  MAVEN_CLI_OPTS: "-s maven-settings/settings.xml --batch-mode"
  DOCKER_DRIVER: overlay2
  # The spring profile used to run integration tests
  SPRING_PROFILES_ACTIVE: gitlab-ci
  # Configure postgres service
  {...}

stages:
  - build
  {...}

build-application:
  image: maven:3.5-jdk-8
  stage: build
  cache:
    key: ${CI_COMMIT_SHA}
    paths:
      - .m2/repository
  services:
    - postgres:10.4
  script:
    - mvn $MAVEN_CLI_OPTS clean install -U
  only:
    - branches
    - tags
  artifacts:
    paths:
      - target/*.jar

{...}
Any kind of input on this is more than welcome.
I find it disturbing that GitLab changes stuff that breaks builds. And it is a pretty generic project, nothing out of the ordinary.
If there is anything missing that might be relevant to the situation, please let me know.