Jobs are randomly hanging
Summary
From time to time our test jobs hang and then fail with the message:
ERROR: Job failed: execution took longer than 1h0m0s seconds
Steps to reproduce
Our .gitlab-ci.yml:
test_app:
  stage: test
  image: registry.gitlab.com/drink-it/shop/docker:latest
  services:
    - docker:dind
  variables:
    DOCKER_DRIVER: overlay2
  tags:
    - docker
  script:
    - docker --version
    - docker-compose --version
    - docker login -u gitlab-ci-token -p $CI_BUILD_TOKEN $CI_REGISTRY
    - docker-compose up -d webserver
    - cp app/etc/local.xml.docker app/etc/local.xml
    - docker-compose exec -T webserver bash -c 'cd /var/www/htdocs/dev && composer install --no-interaction'
    - docker-compose exec -T webserver bash -c './shell/docker/check-mysql.sh'
    - docker-compose exec -T webserver bash -c 'set-base-url -c http://drink.loc/'
    - docker-compose exec -T webserver bash -c 'phantomjs --webdriver=4444 --ignore-ssl-errors=yes >/dev/null 2>&1 &'
    - docker-compose exec -T webserver bash -c './shell/docker/check-phantomjs.sh'
    - docker-compose exec -T webserver bash -c 'cd /var/www/htdocs/dev && ./vendor/bin/codecept run --steps'
  artifacts:
    when: on_failure
    paths:
      - $CI_PROJECT_DIR/dev/tests/_output/*.png
Our docker-compose.yml:
version: "3.1"
services:
  memcached:
    image: memcached:alpine
    container_name: drink-memcached
  redis:
    image: redis:alpine
    container_name: drink-redis
  mysql:
    image: mysql:5.6
    container_name: drink-mysql
    volumes:
      - ./dev/docker/.data/db:/var/lib/mysql
      - ./dev/build/db/dump.sql.gz:/docker-entrypoint-initdb.d/dump.sql.gz
  webserver:
    image: registry.gitlab.com/drink-it/shop:latest
    container_name: drink-webserver
    depends_on:
      - "mysql"
      - "redis"
      - "memcached"
    extra_hosts:
      - "drink.loc:127.0.0.1"
      - "business.drink.loc:127.0.0.1"
    volumes:
      - .:/var/www/htdocs
Actual behaviour
Sometimes the build hangs and then fails when the job timeout is reached.
At first we thought the problem was in the tests, but running the same steps on a local machine always succeeds. Moreover, when the job fails on the runner, it does not always hang at the same test step, which makes me think the cause could be a lack of capacity on the runner instance.
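One way to check the capacity hypothesis would be to print a resource snapshot into the job log around the test step. The following is only a sketch: `snapshot` is a hypothetical helper name, and it assumes a Linux job container where `/proc/meminfo` is readable and the `docker` CLI can reach the `docker:dind` service.

```shell
# Hypothetical helper: print a labelled resource snapshot to the job log.
snapshot() {
  echo "--- resource snapshot: $1 ---"
  # Memory figures as seen from the job container (Linux only).
  if [ -r /proc/meminfo ]; then
    grep -E '^(MemTotal|MemAvailable|SwapFree):' /proc/meminfo || true
  fi
  # Free disk space for the build directory.
  df -h . || true
  # One-shot CPU and memory usage per container, if docker is available.
  if command -v docker >/dev/null 2>&1; then
    docker stats --no-stream || true
  fi
  return 0
}
```

Calling `snapshot "before codecept"` right before the test step, and again from a background loop while the tests run, would show whether memory or disk is running out at the moment the job stalls.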
Expected behaviour
I expect the job not to hang, or at least to provide a more informative message about why it hung.
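Until the runner reports more detail, a per-step deadline can at least name the step that stalled instead of waiting for the job-level timeout. This is a minimal sketch, not a fix for the underlying hang: `run_with_deadline` is a hypothetical helper, and it assumes GNU coreutils `timeout`, which exits with status 124 when the deadline is hit. Other `timeout` implementations may use different syntax or exit codes.

```shell
# Hypothetical helper: run a command with a deadline in seconds and
# report which command exceeded it.
run_with_deadline() {
  limit="$1"
  shift
  status=0
  timeout "$limit" "$@" || status=$?
  # GNU timeout exits with 124 when it had to stop the command.
  if [ "$status" -eq 124 ]; then
    echo "ERROR: step exceeded ${limit}s: $*" >&2
  fi
  return "$status"
}
```

For example, the test step could be wrapped as `run_with_deadline 900 docker-compose exec -T webserver bash -c '...'`, where 900 seconds is an arbitrary illustrative limit.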
Relevant logs and/or screenshots
Here is the log of failed job:
Running with gitlab-runner 11.1.0-rc2 (83bc9589)
  on docker-auto-scale fa6cab46
Using Docker executor with image registry.gitlab.com/drink-it/shop/docker:latest ...
Starting service docker:dind ...
Pulling docker image docker:dind ...
Using docker image sha256:a340e9c87a9562c063519ce3c6a80e315932107e4a56a5cb17adb5dd5658ecd4 for docker:dind ...
Waiting for services to be up and running...
Pulling docker image registry.gitlab.com/drink-it/shop/docker:latest ...
Using docker image sha256:0c7e5b70bd1d97d058dc2961a11ce67738afc511ad888cc8525496e5c320885a for registry.gitlab.com/drink-it/shop/docker:latest ...
Running on runner-fa6cab46-project-4675100-concurrent-0 via runner-fa6cab46-srm-1531857937-e044967f...
Cloning repository...
Cloning into '/builds/drink-it/shop'...
Checking out 0394ce19 as 173-product-front-template-collision...
Skipping Git submodules setup
Checking cache for default-1...
Downloading cache.zip from http://runners-cache-3-internal.gitlab.com:444/runner/project/4675100/default-1
Successfully extracted cache
$ docker --version
Docker version 18.05.0-ce, build f150324
$ docker-compose --version
docker-compose version 1.21.2, build a133471
$ docker login -u gitlab-ci-token -p $CI_BUILD_TOKEN $CI_REGISTRY
WARNING! Using --password via the CLI is insecure. Use --password-stdin.
WARNING! Your password will be stored unencrypted in /root/.docker/config.json.
Configure a credential helper to remove this warning. See
https://docs.docker.com/engine/reference/commandline/login/#credentials-store
Login Succeeded
$ docker-compose up -d webserver
Creating network "shop_default" with the default driver
Pulling redis (redis:alpine)...
alpine: Pulling from library/redis
Digest: sha256:e57274dac037e5b0c7680717fcaaa0efeffb23430e54e839c50819c9d842a38c
Status: Downloaded newer image for redis:alpine
Pulling mysql (mysql:5.6)...
5.6: Pulling from library/mysql
Digest: sha256:29e32fba52c3e6708fdc8a7678287debe3554febced25ade8686a63d4409ceda
Status: Downloaded newer image for mysql:5.6
Pulling memcached (memcached:alpine)...
alpine: Pulling from library/memcached
Digest: sha256:9206241d87e1ff7101f836bc88ed07b40637a99bbf59788d3a44366dc306d0e8
Status: Downloaded newer image for memcached:alpine
Pulling webserver (registry.gitlab.com/drink-it/shop:latest)...
latest: Pulling from drink-it/shop
Digest: sha256:59b954bcc62f620128d1f5f3c25e454cd7bd292726c1f088722c3f6976ccc16c
Status: Downloaded newer image for registry.gitlab.com/drink-it/shop:latest
Creating drink-mysql ...
Creating drink-redis ...
Creating drink-memcached ...
Creating drink-mysql ... done
Creating drink-memcached ... done
Creating drink-redis ... done
Creating drink-webserver ...
Creating drink-webserver ... done
$ cp app/etc/local.xml.docker app/etc/local.xml
$ docker-compose exec -T webserver bash -c 'cd /var/www/htdocs/dev && composer install --no-interaction'
Do not run Composer as root/super user! See https://getcomposer.org/root for details
Loading composer repositories with package information
Installing dependencies (including require-dev) from lock file
Nothing to install or update
Generating autoload files
$ docker-compose exec -T webserver bash -c './shell/docker/check-mysql.sh'
ERROR 2003 (HY000): Can't connect to MySQL server on 'mysql' (111)
database is accessible now.
$ docker-compose exec -T webserver bash -c 'set-base-url -c http://drink.loc/'
Base URL set to http://drink.loc/
$ docker-compose exec -T webserver bash -c 'phantomjs --webdriver=4444 --ignore-ssl-errors=yes >/dev/null 2>&1 &'
$ docker-compose exec -T webserver bash -c './shell/docker/check-phantomjs.sh'
PhantomJS is not yet on port :4444
PhantomJS is on port :4444
$ docker-compose exec -T webserver bash -c 'cd /var/www/htdocs/dev && ./vendor/bin/codecept run --steps'
Codeception PHP Testing Framework v2.3.6
Powered by PHPUnit 6.4.4 by Sebastian Bergmann and contributors.
Acceptance Tests (4) ---------------------------------------
001_HomePageCept: See Home Page
Signature: 001_HomePageCept
Test: tests/acceptance/001_HomePageCept.php
Scenario --
 I am on page "/"
 I see element ".cms-index-index"
 PASSED
002_CustomerRegistrationCept: Register New Customer
Signature: 002_CustomerRegistrationCept
Test: tests/acceptance/002_CustomerRegistrationCept.php
Scenario --
 I have fixtures ["b2c_customer"]
 I am on page "/"
 I see "Login / Register"
 I click "Login / Register"
 I see "Neues Kundenkonto anlegen"
 I see "Registrieren"
 I click "Registrieren"
 I see "Als Privatperson registrieren"
 I fill field "Geburtstag (Kein Verkauf an ...","17.07.2001"
 I click "Konto eröffnen"
 I see "Der Verkauf an Minderjährige ist verboten."
 I fill field "Geburtstag (Kein Verkauf an ...","17.07.2000"
 I click "Konto eröffnen"
 I wait 1
 I don't see "Der Verkauf an Minderjährige ist verboten."
 I fill field "Vorname","John"
 I fill field "Nachname","Doe"
 I fill field "E-Mail Adresse","john.doe+1531858152@exam..."
 I fill field "E-Mail bestätigen","john.doe+1531858152@e..."
 I fill field "Passwort","123123q"
 I fill field "Passwort bestätigen","123123q"
 I check option "//*[@id="form-validate"]/div[10]/div/di..."
 I wait for element "//input[contains(@class, 'valida...",30
 I see "Konto eröffnen"
 I click "Konto eröffnen"
Pulling docker image gitlab/gitlab-runner-helper:x86_64-83bc9589 ...
ERROR: Job failed: execution took longer than 1h0m0s seconds
Environment description
We are using GitLab shared runners.
Used GitLab Runner version
Running with gitlab-runner 11.1.0-rc2 (83bc9589) on docker-auto-scale fa6cab46