Runner stuck on pending and after "Successfully extracted cache"
Issue from here
Adding it here since this is the appropriate project?
The runner keeps getting stuck on pending, even on a new standalone instance with no other jobs.
Note: using Docker executor with gitlab-runner 10.8.0.
It also gets stuck while running, right in the middle of a job, usually after "Successfully extracted cache".
Steps to reproduce
Nothing special: just keep running jobs. It happens quite frequently.
Output of checks
Results of GitLab environment info
System information
System: Ubuntu 18.04
Proxy: no
Current User: git
Using RVM: no
Ruby Version: 2.3.7p456
Gem Version: 2.6.14
Bundler Version: 1.13.7
Rake Version: 12.3.1
Redis Version: 3.2.11
Git Version: 2.16.3
Sidekiq Version: 5.0.5
Go Version: unknown

GitLab information
Version: 10.8.1-ee
Revision: 921025f5ffa
Directory: /opt/gitlab/embedded/service/gitlab-rails
DB Adapter: postgresql
DB Version: 9.6.8
URL: https://git.company.com
HTTP Clone URL: https://git.company.com/some-group/some-project.git
SSH Clone URL: firstname.lastname@example.org:some-group/some-project.git
Elasticsearch: no
Geo: no
Using LDAP: no
Using Omniauth: no

GitLab Shell
Version: 7.1.2
Repository storage paths:
- default: /var/opt/gitlab/git-data/repositories
Hooks: /opt/gitlab/embedded/service/gitlab-shell/hooks
Git: /opt/gitlab/embedded/bin/git
Results of GitLab application Check
Checking GitLab Shell ...

GitLab Shell version >= 7.1.2 ? ... OK (7.1.2)
Repo base directory exists?
default... yes
Repo storage directories are symlinks?
default... no
Repo paths owned by git:root, or git:git?
default... yes
Repo paths access is drwxrws---?
default... yes
hooks directories in repos are links: ...
4/8 ... ok
4/10 ... ok
4/12 ... ok
4/14 ... ok
Running /opt/gitlab/embedded/service/gitlab-shell/bin/check
Check GitLab API access: OK
Redis available via internal API: OK
Access to /var/opt/gitlab/.ssh/authorized_keys: OK
gitlab-shell self-check successful

Checking GitLab Shell ... Finished

Checking Sidekiq ...

Running? ... yes
Number of Sidekiq processes ... 1

Checking Sidekiq ... Finished

Reply by email is disabled in config/gitlab.yml

Checking LDAP ...

LDAP is disabled in config/gitlab.yml

Checking LDAP ... Finished

Checking GitLab ...

Git configured correctly? ... yes
Database config exists? ... yes
All migrations up? ... yes
Database contains orphaned GroupMembers? ... no
GitLab config exists? ... yes
GitLab config up to date? ... yes
Log directory writable? ... yes
Tmp directory writable? ... yes
Uploads directory exists? ... yes
Uploads directory has correct permissions? ... yes
Uploads directory tmp has correct permissions? ... yes
Init script exists? ... skipped (omnibus-gitlab has no init script)
Init script up-to-date? ... skipped (omnibus-gitlab has no init script)
Projects have namespace: ...
4/8 ... yes
4/10 ... yes
4/12 ... yes
4/14 ... yes
Redis version >= 2.8.0? ... yes
Ruby version >= 2.3.5 ? ... yes (2.3.7)
Git version >= 2.9.5 ? ... yes (2.16.3)
Git user has default SSH configuration? ... yes
Active users: ... 2
Elasticsearch version 5.1 - 5.5? ... skipped (elasticsearch is disabled)

Checking GitLab ... Finished
Not sure if this is related, but I had spent many frustrating hours trying to inject an SSH private key into the Docker executor, as instructed here. I had already done this successfully many times, but for some reason it wasn't working.
After sleeping on it, I looked at my last failed job (timeout after 100 minutes) and tried to run it again. It worked. The funny thing is that I had not changed my .gitlab-ci.yml file at all, and it had already failed many times with "GitLab: The project you were looking for could not be found." I'm not sure whether that is related to the stuck-on-pending issue.
image: google/dart:1.24.3

variables:
  REGISTRY: git.company.com:4567
  GIT_SUBMODULE_STRATEGY: recursive

cache:
  paths:
    - path/to/stuff/

before_script:
  - mkdir -p ~/.ssh
  - echo "$SSH_PRIVATE_KEY" | tr -d '\r' > ~/.ssh/id_rsa
  - chmod 700 ~/.ssh/id_rsa
  - eval "$(ssh-agent -s)"
  - ssh-add ~/.ssh/id_rsa
  - ssh-keyscan -H 'git.company.com' >> ~/.ssh/known_hosts

types:
  - analyze

test_main:
  type: analyze
  script:
    - cd main
    - pub get
    - dartanalyzer --fatal-hints --fatal-warnings web/main.dart
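For anyone trying to reproduce the key-injection part of this job outside of CI, the key-writing steps from before_script can be pulled into a small shell function and checked in isolation. This is only a sketch: `install_ssh_key` is a hypothetical name that is not part of the job above, and it assumes the key is held in `SSH_PRIVATE_KEY` as in the config. It uses mode 0600 for the key file, which is the conventional choice, and prints only a line count so the key never ends up in the job log.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original job): write the key held in
# $SSH_PRIVATE_KEY to <dir>/id_rsa, stripping carriage returns, with mode 0600.
install_ssh_key() {
    key_dir="${1:-$HOME/.ssh}"    # target directory; defaults to ~/.ssh
    key_file="$key_dir/id_rsa"

    if [ -z "${SSH_PRIVATE_KEY:-}" ]; then
        echo "SSH_PRIVATE_KEY is empty or unset" >&2
        return 1
    fi

    mkdir -p "$key_dir" || return 1
    chmod 700 "$key_dir" || return 1

    # Use a restrictive umask so a newly created file is never world-readable.
    ( umask 077 && printf '%s\n' "$SSH_PRIVATE_KEY" | tr -d '\r' > "$key_file" ) || return 1
    chmod 600 "$key_file" || return 1

    # Report the line count only, never the key material itself.
    echo "wrote $(wc -l < "$key_file" | tr -d ' ') lines to $key_file"
}
```

In a job this would be followed by the same `ssh-add ~/.ssh/id_rsa` call as above. An empty or unset variable makes the function fail early with a message, which is one quick way to rule out a missing secret variable.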
Log on fail:
Running with gitlab-runner 10.8.0 (079aad9e)
on docker 5fa73ab5
Using Docker executor with image google/dart:1.24.3 ...
Pulling docker image google/dart:1.24.3 ...
Using docker image sha256:ec4b124b54db920b6082d8b6d754b0329278c6deb798f9af1f5b1f7fda2c99de for google/dart:1.24.3 ...
Running on runner-5fa73ab5-project-12-concurrent-0 via gitlab...
Fetching changes...
HEAD is now at 06796d55 Remove old ssh key
From https://git.company.com/group/project
06796d55..abceee93 ci-mods -> origin/ci-mods
Checking out abceee93 as ci-mods...
Updating/initializing submodules recursively...
Synchronizing submodule url for 'other-project'
Entering 'other-project'
HEAD is now at e6ca36d My commit message
Checking cache for default-1...
Successfully extracted cache
$ which ssh-agent || ( apt-get update -y && apt-get install openssh-client -y )
/usr/bin/ssh-agent
$ eval $(ssh-agent -s)
Agent pid 13
$ echo "$SSH_PRIVATE_KEY" | tr -d '\r' | ssh-add - > /dev/null
Identity added: (stdin) (rsa w/o comment)
$ mkdir -p ~/.ssh
$ chmod 700 ~/.ssh
$ ssh-keyscan 'git.company.com' >> ~/.ssh/known_hosts
# git.company.com SSH-2.0-OpenSSH_7.6p1 Ubuntu-4
# git.company.com SSH-2.0-OpenSSH_7.6p1 Ubuntu-4
# git.company.com SSH-2.0-OpenSSH_7.6p1 Ubuntu-4
$ chmod 644 ~/.ssh/known_hosts
$ cd main
$ pub get
Resolving dependencies...
Git error. Command: git clone --mirror email@example.com:group/other-project /root/.pub-cache/git/cache/other-project-89f06eef00baf7ccb8eff6b5c0fcd5f84a517fb3
Cloning into bare repository '/root/.pub-cache/git/cache/other-project-89f06eef00baf7ccb8eff6b5c0fcd5f84a517fb3'...
GitLab: The project you were looking for could not be found.
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
ERROR: Job failed: exit code 1
To be clear, the project it was failing on is on the same standalone GitLab server.
This is pernicious behavior, not least because it violates the quote commonly attributed to Einstein:
Insanity: doing the same thing over and over again and expecting different results.
Update: I made a small change and the build is again stuck on "Successfully extracted cache".
Unfortunately, this is holding up our entire development.