GitLab Runner Memory Increase
What's wrong with the current CI
Our CI pipelines sometimes fail because the runners run out of memory.
What I've changed
The GitLab runners that launch on demand were using c5a.large instances. I've switched them to t3a.large instances with the Standard credit specification.
Here is the On-Demand pricing in US East (Ohio) at the time of writing (2021-12-10):
Instance name | On-Demand hourly rate | vCPU | Memory | CPU model |
---|---|---|---|---|
c5a.large | $0.077 | 2 | 4 GiB | 2nd generation AMD EPYC 7002, 3.3 GHz |
t3a.large | $0.0752 | 2 | 8 GiB | AMD EPYC 7000, 2.5 GHz |
As the table shows, t3a.large provides twice the memory at a slightly lower cost. T3 instances are burstable instances with two credit modes, Standard and Unlimited. The details are complicated to explain; if you are interested, you can read more here.
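To make the cost comparison concrete, here is a small Python sketch that computes the price of one GiB of memory per hour from the table above. The prices are the 2021-12-10 US East (Ohio) On-Demand rates quoted in the table; the `Instance` class and constant names are purely illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    name: str
    hourly_usd: float  # On-Demand rate, US East (Ohio), 2021-12-10
    vcpu: int
    memory_gib: float

    @property
    def usd_per_gib_hour(self) -> float:
        """Cost of one GiB of memory for one hour."""
        return self.hourly_usd / self.memory_gib


# Values copied from the pricing table above.
C5A_LARGE = Instance("c5a.large", 0.077, 2, 4)
T3A_LARGE = Instance("t3a.large", 0.0752, 2, 8)

if __name__ == "__main__":
    for inst in (C5A_LARGE, T3A_LARGE):
        print(f"{inst.name}: ${inst.usd_per_gib_hour:.5f} per GiB-hour")
    saving = C5A_LARGE.hourly_usd - T3A_LARGE.hourly_usd
    print(f"Saving per runner-hour: ${saving:.4f}")
```

With these numbers, memory costs about $0.019 per GiB-hour on c5a.large versus about $0.0094 on t3a.large, so roughly half as much, and each runner-hour is about $0.0018 cheaper.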
What does this affect
- Build jobs will have twice as much memory, so they should no longer fail by running out of memory.
- They might run slightly slower, due to the lower-clocked CPU.
- They will be a bit cheaper to run.
What I've tried
I tried adding swap space to the c5a.large instances to prevent out-of-memory errors, but I couldn't get it to work:
- Add swap space in .gitlab-ci.yml
  - This doesn't work; see https://gitlab.com/autowarefoundation/autoware.auto/AutowareAuto/-/jobs/1872871216#L456 for details.
  - The reason is that the Docker container doesn't support swap.
- Start the runner instance with swap instead?
  - The GitLab runners we use (which use AWS for scaling) run on the deprecated docker-machine.
  - docker-machine offers very little flexibility when launching new instances. Here are the options it allows; even finding these was hard, because its documentation is no longer maintained and is scattered all over the place :p
  - There is a large issue on GitLab about finding an alternative for this large-scale problem: gitlab-org/gitlab-runner#3877
Results
So let's see how this affects our pipeline.
This job was failing on c5a.large: https://gitlab.com/autowarefoundation/autoware.auto/AutowareAuto/-/jobs/1872454330#L7500
It failed with this error: cc: fatal error: Killed signal terminated program cc1
and 1 package failed: parking_planner (a package that normally builds fine).
Now I've switched to the t3a.large instance, which has twice the memory.
In this job, https://gitlab.com/autowarefoundation/autoware.auto/AutowareAuto/-/jobs/1873258031#L11280, I added a line that prints memory information every 3 seconds while Autoware.Auto is building.
There you can see that the build uses a lot of memory: at certain points free memory drops to ~118 MB (out of 7882 MB total). It was much worse with the previous 4 GiB of memory.
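The exact line added to the job isn't shown here, so the following is only a Python sketch of the same idea, assuming a Linux runner where /proc/meminfo is readable: it wraps a build command and prints total, free and available memory every 3 seconds until the build finishes. The script name memwatch.py and the function names are hypothetical.

```python
from __future__ import annotations

import subprocess
import sys
import threading


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: first numeric value}.

    Most fields are in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    values: dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            values[key.strip()] = int(parts[0])
    return values


def format_memory(info: dict[str, int]) -> str:
    """Summarise total/free/available memory in MB (kB // 1024, like free -m)."""
    total = info["MemTotal"] // 1024
    free = info["MemFree"] // 1024
    available = info.get("MemAvailable", 0) // 1024
    return f"total={total}MB free={free}MB available={available}MB"


def monitor_memory(stop: threading.Event, interval_s: float = 3.0,
                   path: str = "/proc/meminfo") -> None:
    """Print a memory summary every interval_s seconds until stop is set."""
    while not stop.is_set():
        with open(path) as f:
            print(format_memory(parse_meminfo(f.read())), flush=True)
        stop.wait(interval_s)  # returns early as soon as stop is set


if __name__ == "__main__":
    # Hypothetical usage: python3 memwatch.py <build command> [args...]
    if len(sys.argv) < 2:
        sys.exit("usage: memwatch.py <build command> [args...]")
    stop = threading.Event()
    watcher = threading.Thread(target=monitor_memory, args=(stop,), daemon=True)
    watcher.start()
    try:
        result = subprocess.run(sys.argv[1:])
    finally:
        stop.set()
        watcher.join()
    sys.exit(result.returncode)
```

Running the monitor in a background thread keeps the build's own output and exit code intact, and MemAvailable is printed alongside MemFree because it is usually a better indication of how close the system is to the OOM killer.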
Related issues:
- #1210 (closed)
  - @frederik.beaujean: I suspect the compiler runs out of memory.
- #1332 (closed)
  - @maximeclement: This error apparently occurs because the system runs out of memory.