(Mostly-)Reproducible CI instance crash when using KitchenCI testing on Debian 10 & openSUSE Leap 15.2


Summary

I attempted to get support on the forum first but haven't received a response in almost a week:


We're using GitLab CI for ~100 repos under our SaltStack-Formulas GitHub organisation. This has been working really well since the end of last year.

I've just hit a mostly-reproducible CI crash (Segmentation fault) while refactoring a couple of our formula repos. I cannot reproduce the crash in GitHub Actions or when testing locally. The crash appears to be of the same type in both repos, and in both I can only trigger it when testing on Debian 10 and openSUSE Leap 15.2. None of the other Debian and openSUSE instances have crashed even once (not even openSUSE Leap 15.3), and none of the other Linux instances are affected. These changes have also been tested in CI and locally on Windows and FreeBSD without issue.


packages-formula

These are the main repo's pipelines:

I was testing out the refactor in my fork.

This was the first pipeline where I triggered the crash:

I then retriggered the CI without pushing another commit and this time only the Debian 10 instance crashed:

I then reduced the instances to only include the likely crash candidates but this time the Leap 15.2 instance crashed:

I had already tested locally a number of times by this point and couldn't reproduce it. Since we've got an easy way to test the same setup in GitHub Actions, I ran that as well, to see if I could reproduce it. I couldn't, even with the subsequent attempts at narrowing down:

In the last pipeline I ran on GitLab, it seemed to be passing for a while but repeating the instances eventually reproduced it again:


php-formula

These are the main repo's pipelines:

I was testing out the refactor in my fork. The same situation occurred again.

First pipeline:

Reducing the instances and rerunning:

Again, all fine locally and then in GitHub Actions:


Further information


Steps to reproduce

  1. Clone either of the repos; the bug is specifically in the ci/use-pillarstack branch:
  2. Run the pipeline for that branch.
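
Because the crash is intermittent, a single run may pass. The following is a minimal Python sketch (not part of the formulas' tooling) for rerunning one instance until a segfault shows up. It assumes the crash either kills the command with SIGSEGV or prints "Segmentation fault" in its output; the helper names `looks_like_segfault` and `repeat_until_crash` are hypothetical.

```python
import signal
import subprocess
import sys

# Assumption: the crash is reported with this text somewhere in the output.
MARKER = "segmentation fault"


def looks_like_segfault(returncode, output):
    """Return True if the exit status or the output suggests a segfault."""
    # subprocess reports death-by-signal as a negative signal number;
    # a shell wrapping the process reports 128 + signal number instead.
    died_from_signal = returncode in (-signal.SIGSEGV, 128 + signal.SIGSEGV)
    return died_from_signal or MARKER in output.lower()


def repeat_until_crash(cmd, max_runs=10):
    """Run cmd up to max_runs times.

    Returns the 1-based number of the first run that looked like a
    segfault, or None if no run did.
    """
    for attempt in range(1, max_runs + 1):
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into the captured output
            text=True,
            errors="replace",  # tolerate non-UTF-8 bytes in CI output
        )
        if looks_like_segfault(result.returncode, result.stdout):
            return attempt
    return None
```

For example, `repeat_until_crash(["kitchen", "test", "<instance>"], max_runs=20)` would rerun a single Kitchen instance, where `<instance>` stands for one of the names printed by `kitchen list`. Locally this may never return a run number, since the crash has so far only appeared on GitLab CI.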

Example Project

Both repos mentioned above.

What is the current bug behavior?

As mentioned in the summary above.

What is the expected correct behavior?

It should pass without crashing, as it does locally, using GitHub Actions and even for the other instances in those GitLab CI pipelines.

Relevant logs and/or screenshots

The links to the crashes are already given in the summary. These are the direct links to the raw logs:

Output of checks

This bug happens on GitLab.com.
