GitLab.com Support Tracker
Opened Jul 25, 2018 by Gabriel Le Breton (@gableroux)

Running many CI jobs at once fails with different errors on gitlab.com shared runners

Description of the problem

When many jobs run at once on the shared runners, some of them always fail with errors such as unexpected EOF, being unable to connect to the Docker daemon, exit code 1, no space left on device, or timeouts:

  • ERROR: Job failed (system failure): Cannot connect to the Docker daemon. Is 'docker daemon' running on this host? (executor_docker.go:1007:0s)
  • unexpected EOF  
    ERROR: Job failed: exit code 1
  • no space left on device
    ERROR: Job failed: exit code 1
  • ERROR: Job failed: exit code 137
  • ERROR: Job failed (system failure): error during connect: Get https://10.142.2.117:2376/v1.18/containers/a1bf1f3478b898774bf16c396c48eb1b853e70a2bab0e3afae04c7caf7da5215/json: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "root") (executor_docker.go:965:4s)
  • ERROR: Job failed (system failure): Error: No such container: 62a4f326396774ac2a6ff4331afa18d4da3ae253944b6d37272d14421edc41e0 (executor_docker.go:965:1s)

Sometimes there is no error in the log, but the job's status message says "There has been a runner system failure, please try again."

See the attached full-page screenshot of the pipeline.

The pipeline in question now passes because I clicked Retry, but the failed jobs are still listed here: https://gitlab.com/gableroux/unity3d/pipelines/26396580/builds (open the jobs page and scroll down to the failed jobs).

Retrying the failed jobs a couple of times makes them pass, so something probably goes wrong when many jobs run at once, at least on the free shared runners.

Which Group/Project (with full path) is experiencing the issue?

https://gitlab.com/gableroux/unity3d/

When does the issue happen?

Every time CI runs in this project. It started as soon as I began running 75 or more jobs at the same time.

Expected behaviour

I shouldn't have to retry these jobs manually; they should not fail in the first place.

Suggestions

  • I'd accept a lower limit on concurrent builds if it guaranteed that every job that does run won't fail.
  • Fix the root causes of most of the errors above.
  • Provide a way to retry jobs automatically n times? According to https://gitlab.com/gitlab-org/gitlab-ce/issues/3442 👍 this already seems possible with retry: <number> (default 0). I have not tried it yet.
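A minimal sketch of how the first and third suggestions could be approximated from the project side, assuming a GitLab version that supports `retry:max` and `retry:when`. The jobs are spread across sequential stages so that only one batch hits the shared runners at a time, and each job is retried automatically only on runner-side failures. The stage names, job names, `UNITY_VERSION` values and the `./ci/build.sh` script are hypothetical placeholders, not taken from the project.

```yaml
# Hypothetical .gitlab-ci.yml sketch: stage/job names, versions and ./ci/build.sh are placeholders.
stages:
  - build-batch-1   # jobs within one stage run in parallel
  - build-batch-2   # this stage starts only after build-batch-1 has finished

# Hidden job used as a YAML anchor so every build job shares the same settings.
.build_template: &build_template
  script:
    - ./ci/build.sh "$UNITY_VERSION"   # hypothetical build script
  retry:
    max: 2                       # GitLab accepts 0, 1 or 2 here
    when:                        # retry only on infrastructure-side failures
      - runner_system_failure
      - stuck_or_timeout_failure

build-version-a:
  <<: *build_template
  stage: build-batch-1
  variables:
    UNITY_VERSION: "version-a"

build-version-b:
  <<: *build_template
  stage: build-batch-2
  variables:
    UNITY_VERSION: "version-b"
```

This trades total pipeline time for fewer simultaneous jobs and does not fix the underlying errors. With the default `when: on_success`, a job that still fails after its retries prevents later stages from running, so independent batches may need `when: always`. Errors that end in `ERROR: Job failed: exit code 1` or `exit code 137` are likely classified as script failures rather than runner system failures, so they would only be retried if `script_failure` were added to `when`, which would also retry genuine build errors.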

Related issues

I have seen a few, but none that covers all of the errors I get.

  • gitlab-org/gitlab-runner#2667 (closed)

How to reproduce

  1. Fork https://gitlab.com/gableroux/unity3d/
  2. Wait for CI to run (this takes a while)
  3. After ~40 minutes, you should have a few failed jobs

Thanks

Just saying thanks: the project in question is quite resource-hungry, yet it runs for free, so thanks, GitLab. Feel free to contribute 🎉 ✌

Edited Jul 25, 2018 by Gabriel Le Breton

Reference: gitlab-com/support-forum#3710