(Mostly-)Reproducible CI instance crash when using KitchenCI testing on Debian 10 & openSUSE Leap 15.2
Summary
I attempted to get support on the forum first but haven't received a response after almost a week:
We're using GitLab CI for ~100 repos under our SaltStack-Formulas GitHub organisation. This has been working really well since the end of last year.
I've just hit a mostly-reproducible CI crash (Segmentation fault) when refactoring a couple of our formula repos. I cannot reproduce the crash in GitHub Actions or when testing locally. The crash appears to be of the same type in both repos. In both repos, I can only trigger it when testing on Debian 10 and openSUSE Leap 15.2. None of the other Debian and openSUSE instances have crashed once (not even openSUSE Leap 15.3), and none of the other Linux instances are affected. These changes have also been tested in CI and locally on Windows and FreeBSD without issue.
packages-formula
These are the main repo's pipelines:
- https://gitlab.com/saltstack-formulas/packages-formula/-/pipelines
  - While there have been some of the usual failures, this crash has never been encountered before.
I was testing out the refactor in my fork.
This was the first pipeline where I triggered the crash:
- https://gitlab.com/myii/packages-formula/-/pipelines/381398800
  - Debian 10 -- from line 5629.
  - openSUSE Leap 15.2 -- from line 1724.
I then retriggered the CI without pushing another commit and this time only the Debian 10 instance crashed:
I then reduced the instances to only include the likely crash candidates but this time the Leap 15.2 instance crashed:
I had already tested locally a number of times by this point and couldn't reproduce it. Since we've got an easy way to test the same setup in GitHub Actions, I ran that as well, to see if I could reproduce it. I couldn't, even with the subsequent attempts at narrowing down:
- https://github.com/myii/packages-formula/actions/runs/1300190519
- https://github.com/myii/packages-formula/actions/runs/1300234090 -- note, I re-ran this 3 times, as is shown under the title.
In the last pipeline I ran on GitLab, it seemed to be passing for a while but repeating the instances eventually reproduced it again:
php-formula
These are the main repo's pipelines:
- https://gitlab.com/saltstack-formulas/php-formula/-/pipelines
  - Again, the crash has never been encountered before.
I was testing out the refactor in my fork, and the same thing happened again.
First pipeline:
- https://gitlab.com/myii/php-formula/-/pipelines/381561725
  - openSUSE Leap 15.2 -- from line 1451.
Reducing the instances and rerunning:
- https://gitlab.com/myii/php-formula/-/pipelines/381565867/builds
  - Debian 10 hasn't failed at all.
  - openSUSE Leap 15.2 crashed 3 out of the 5 times here (4 out of 6 overall).
Again, all fine locally and then in GitHub Actions:
- https://github.com/myii/php-formula/actions/runs/1300560616
- https://github.com/myii/php-formula/actions/runs/1300581732 -- reran this 4 times without any failures.
Further information
- The actual gem that seems to be the source of the crash:
- The repo that produces the gem:
- The testing framework being used, which crashes when attempting to run kitchen verify:
Steps to reproduce
- Clone either of the repos; the bug is specifically in the ci/use-pillarstack branch:
- Run the pipeline for that branch.
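Because the crash is intermittent, a single run can pass. The repeated runs described in the summary could be automated with a small tally loop. This is only a sketch under assumptions: it assumes the instance is exercised with `kitchen verify <instance>`, that a crashed run prints the text "Segmentation fault", and that the loop is run inside a GitLab CI job (the crash has not reproduced locally). The function names are illustrative and not part of either repo.

```python
import subprocess
from typing import Callable, Dict, Tuple

# Assumption: a crashed run contains this text somewhere in its output.
CRASH_MARKER = "Segmentation fault"


def run_kitchen_verify(instance: str) -> Tuple[int, str]:
    """Run `kitchen verify` for one instance; return (exit code, combined output)."""
    proc = subprocess.run(
        ["kitchen", "verify", instance],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so the marker is found either way
        text=True,
    )
    return proc.returncode, proc.stdout


def tally_crashes(run: Callable[[], Tuple[int, str]], attempts: int) -> Dict[str, int]:
    """Call `run` repeatedly and count passes, crashes and other failures."""
    counts = {"passed": 0, "crashed": 0, "failed": 0}
    for _ in range(attempts):
        code, output = run()
        if CRASH_MARKER in output:
            counts["crashed"] += 1
        elif code == 0:
            counts["passed"] += 1
        else:
            counts["failed"] += 1
    return counts
```

For example, `tally_crashes(lambda: run_kitchen_verify("some-instance"), attempts=5)` would run one instance five times, where "some-instance" is a placeholder for a name taken from `kitchen list`. Separating crashes from ordinary failures keeps the usual test failures from inflating the crash count.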
Example Project
Both repos mentioned above.
What is the current bug behavior?
As mentioned in the summary above.
What is the expected correct behavior?
It should pass without crashing, as it does locally, using GitHub Actions and even for the other instances in those GitLab CI pipelines.
Relevant logs and/or screenshots
The links to the crashes are already given in the summary. These are the direct links to the raw logs:
- packages-formula:
- php-formula:
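To locate the crash in a downloaded raw log, a short helper can report the lines that mention it. This is a minimal sketch, assuming the log has been saved as text and that the crash is marked by the string "Segmentation fault"; the function name is illustrative, and the line numbers it reports may not match the numbering shown in the web job view.

```python
from typing import List


def find_marker_lines(log_text: str, marker: str = "Segmentation fault") -> List[int]:
    """Return the 1-based line numbers of every line containing `marker`."""
    return [
        number
        for number, line in enumerate(log_text.splitlines(), start=1)
        if marker in line
    ]
```

Given the text of a saved log, `find_marker_lines(text)` returns an empty list when the job did not crash, which makes it easy to compare several reruns of the same instance.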
Output of checks
This bug happens on GitLab.com.