How to debug GitLab CI failures that do not happen locally

Background:

The main problem is that the GitLab workflow is not 100% reproducible locally. I've tried to consolidate the job as much as possible into a single common one, but I still get failures on GitLab CI that do not happen locally.

Here's my scenario: I have a project that runs some BDD (Behat) tests. There are three main components:

- Behat (or whatever is running the tests): usually this is the GitLab executor or the local environment
- Selenium + browser + VNC: in my case, an official Docker image (selenium/standalone-chrome[-debug])
- The system under test, e.g. WordPress (plus supporting services such as MySQL)

The workflow looks like this:

1. Ensure any dependencies are met.
2. Start the services (docker-compose up*) and wait until they are all responsive (Selenium, WordPress, etc.).
3. Run the Behat tests.

* I didn't use the GitLab CI services: keyword, so that the setup is more reproducible locally.
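The "wait until they're all responsive" step is a common source of CI-only flakiness, so here is a minimal sketch of it. It is written in Python purely for illustration and is not the author's PHP runner; the function names, the timeout values and the idea of treating any HTTP answer as "up" are all assumptions.

```python
import http.client
import time
import urllib.error
import urllib.request


def is_responsive(url, timeout=2.0):
    """Return True if the URL answers with any HTTP status at all."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered (e.g. 404 or 500), so the service itself is up.
        return True
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, DNS failure, timeout, broken response: not ready.
        return False


def wait_for_services(urls, deadline_seconds=120.0, interval=1.0, probe=is_responsive):
    """Poll the URLs until all respond; return those still failing at the deadline."""
    pending = list(urls)
    give_up_at = time.monotonic() + deadline_seconds
    while pending:
        # Keep only the services that are still not answering.
        pending = [url for url in pending if not probe(url)]
        if not pending or time.monotonic() >= give_up_at:
            break
        time.sleep(interval)
    return pending
```

A runner could call wait_for_services(["http://selenium:4444/wd/hub/status", "http://wordpress/"]) and abort with a clear message if the returned list is not empty. Both host names are hypothetical and depend on the compose file. Printing the list of unresponsive services makes a "service never came up" failure easy to tell apart from a genuine test failure.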

Originally, this workflow was written as a GitLab job script, but I later moved it into PHP code because (1) it was too tied to GitLab and (2) it was getting too complicated for a shell script (e.g. "if commands x, y and z do not run, run commands a and b, otherwise return the status of command z", and so on). In the end it boils down to running php run-tests.php (with some config passed as env vars). This works for GitLab, local testing and any other CI solution.

The problem I've been encountering is that, on at least two occasions, GitLab runs blew up for very odd reasons:

- The page/browser crashes on a very simple test, with no indication why (the same test runs fine locally with the same Docker images and config).
- A URL consistently returns a 404 inside GitLab, but this never happens locally (as before, same config, same images).

What questions are you trying to answer?

- How do I debug this sort of issue better?
- How can I ensure the CI process can be reproduced exactly locally (or that it is vendor-agnostic)?

Are you looking to verify an existing hypothesis or uncover new issues you should be exploring?

What I've done so far is log more details and try random things such as checking memory usage. Running the workflow locally didn't produce the same results, even with gitlab-runner.
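One way to make this kind of ad hoc logging more systematic is to record a small environment snapshot in both places and diff the two. The sketch below is only an illustration in Python: the choice of fields, the variable prefixes and the function names are assumptions, and it will not catch every difference (kernel settings, shared memory size and network layout would need extra probes).

```python
import os
import platform

# Skip variables whose names suggest credentials, so the snapshot is safe to print.
SENSITIVE_MARKERS = ("TOKEN", "PASSWORD", "SECRET", "KEY")


def environment_snapshot(env=None, prefixes=("CI", "GITLAB_", "DOCKER_")):
    """Collect a few facts about the machine and the CI-related env vars."""
    env = os.environ if env is None else env
    selected = {
        name: value
        for name, value in sorted(env.items())
        if name.startswith(prefixes)
        and not any(marker in name for marker in SENSITIVE_MARKERS)
    }
    return {
        "platform": platform.platform(),
        "cpu_count": os.cpu_count(),
        "env": selected,
    }


def flatten(snapshot, prefix=""):
    """Turn nested dicts into a flat {"env.CI": "true"} style mapping."""
    flat = {}
    for key, value in snapshot.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat


def diff_snapshots(local, ci):
    """Return {key: (local_value, ci_value)} for every key that differs."""
    a, b = flatten(local), flatten(ci)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(a.keys() | b.keys())
        if a.get(key) != b.get(key)
    }
```

Dumping the snapshot as JSON at the start of every run, locally and on GitLab, gives two files that can be compared after a CI-only failure instead of guessing which setting differs.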

What is the backstory of this project and how does it impact the approach?

I've been stuck with these two problems for some two weeks. They're very frustrating and too specific to ask about online. They're a huge waste of time, and it seems to me the simplest solution would be if I were able to see the exact state of the builds when the problem happened, without having to manually log the sh!t out of every process (in every container).
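Short of getting a shell into the failed build, a partial substitute is to capture the state of every container at the moment the tests fail and keep it with the job. The following is a rough sketch in Python, not part of the original run-tests.php; the file names, the command list and the collect_diagnostics helper are assumptions, and it presumes the docker and docker-compose CLIs are on the PATH of whatever runs the tests.

```python
import subprocess
from pathlib import Path

# Output file name -> command whose output is saved under that name.
DIAGNOSTIC_COMMANDS = {
    "compose-ps.txt": ["docker-compose", "ps"],
    "compose-logs.txt": ["docker-compose", "logs", "--no-color", "--timestamps"],
    "docker-stats.txt": ["docker", "stats", "--no-stream"],
}


def run_command(argv):
    """Run a command and return its combined stdout/stderr as text."""
    try:
        result = subprocess.run(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            timeout=60,
        )
        return result.stdout
    except (OSError, subprocess.SubprocessError) as exc:
        # A missing binary or a timeout should not hide the original failure.
        return f"could not run {' '.join(argv)}: {exc}\n"


def collect_diagnostics(output_dir, commands=DIAGNOSTIC_COMMANDS, runner=run_command):
    """Write the output of each diagnostic command to its own file."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for filename, argv in commands.items():
        target = out / filename
        target.write_text(runner(argv))
        written.append(target)
    return written
```

The idea is to call collect_diagnostics("build/diagnostics") only when the test command exits with a non-zero status, before the containers are torn down. On GitLab the directory could then be kept as a job artifact (the artifacts keyword accepts when: on_failure), while other CI systems or a local run simply leave the files on disk.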

What do you already know about the areas you are exploring?

Some related links:

A general approach would greatly reduce these issues. Right now it takes a lot of knowledge of GitLab executor internals to write a script that works without depending on GitLab. Even then, you need to know whether a failure comes from GitLab or not. Running the jobs locally with gitlab-runner doesn't produce the same results either.

What does success look like at the end of the project?

Consistent successes/failures: I expect failing pipelines to be completely reproducible locally.

Edited Oct 05, 2018 by Christian Sciberras