docker-machine: too many open files
Quote from customer
We are trying to use AWS to run our pipelines. We configured the autoscaling runner and almost everything works fine. Our pipelines are quite big. We have one build step that prepares our environment, caches requirements, and precompiles the assets. In the next step, we run tests in parallel. There are more than 60 jobs running in parallel for a single pipeline. We can have multiple pipelines running at the same time, which gives us a few hundred jobs. Every job runs on a separate EC2 instance. The problem is that not every EC2 instance is terminated. For us this is a blocker, because we then have to pay for every unused server. We want to start the server, run the job, terminate the server, and be sure that we will not pay anything extra, because with this volume and fairly large EC2 instances the cost can be huge.
Error
During tests I was watching the output of docker-machine ls and found one error:

NAME                                                        ACTIVE   DRIVER      STATE   URL   SWARM   DOCKER    ERRORS
runner-e9zlxh2w-gitlab-docker-machine-1594723359-19c3e844   -        amazonec2   Error                 Unknown   MissingParameter: The request must contain the parameter InstanceId.

Later, however, it was no longer possible to run docker-machine ls at all because of "too many open files".
The conversation and logs are in the Zendesk ticket: https://gitlab.zendesk.com/agent/tickets/163542
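"Too many open files" is the message for the EMFILE error, which a process gets when it reaches its per-process limit on open file descriptors (RLIMIT_NOFILE). The following is a minimal diagnostic sketch, not something taken from the ticket: it assumes a Linux host with /proc, and the helper names (`nofile_limits`, `count_open_fds`, `fd_headroom`) are hypothetical. It reports how close a process is to that limit.

```python
import os
import resource


def nofile_limits():
    """Return the (soft, hard) RLIMIT_NOFILE limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def count_open_fds(pid="self"):
    """Count open file descriptors of a process via /proc (Linux only).

    Pass a numeric PID to inspect another process; this needs
    permission to read /proc/<pid>/fd.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


def fd_headroom(open_fds, soft_limit):
    """Return how many more descriptors can be opened before EMFILE.

    Returns None when the soft limit is unlimited.
    """
    if soft_limit == resource.RLIM_INFINITY:
        return None
    return soft_limit - open_fds


if __name__ == "__main__":
    soft, hard = nofile_limits()
    used = count_open_fds()
    print(f"soft limit: {soft}, hard limit: {hard}, open: {used}")
    print(f"headroom: {fd_headroom(used, soft)}")
```

Run as written, the script inspects itself. To inspect the runner manager, pass its PID to `count_open_fds` and compare against the limits shown in /proc/<pid>/limits for that process, since the limits returned by `nofile_limits` apply only to the Python process.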
config.toml
concurrent = 3000
check_interval = 0
[session_server]
session_timeout = 1800
[[runners]]
name = "aws-autoscale-runner"
limit = 0
url = "https://gitlab.com"
token = "xxxxxxxx"
executor = "docker+machine"
[runners.custom_build_dir]
[runners.cache]
Type = "s3"
Shared = true
[runners.cache.s3]
ServerAddress = "s3.amazonaws.com"
AccessKey = "xxxxx"
SecretKey = "xxxxx"
BucketName = "xxx-gitlab-cache"
BucketLocation = "us-east-1"
[runners.cache.gcs]
[runners.docker]
tls_verify = false
image = "ruby:2.6.1"
privileged = true
disable_entrypoint_overwrite = false
oom_kill_disable = false
disable_cache = true
volumes = ["/cache"]
wait_for_services_timeout = 30
shm_size = 0
[runners.machine]
IdleCount = 0
IdleTime = 0
MaxBuilds = 0
MachineDriver = "amazonec2"
MachineName = "gitlab-docker-machine-%s"
MachineOptions = ["amazonec2-access-key=xxxx", "amazonec2-secret-key=xxxx", "amazonec2-region=us-east-1", "amazonec2-vpc-id=vpc-xxx", "amazonec2-subnet-id=subnet-xxx", "amazonec2-zone=a", "amazonec2-use-private-address=true", "amazonec2-tags=runner-manager-name,gitlab-aws-autoscaler,gitlab,true,gitlab-runner-autoscale,true", "amazonec2-instance-type=c5.xlarge", "amazonec2-ami=ami-02b626bf28808cec7", "amazonec2-root-size=30", "amazonec2-security-group=allow_gitlab"]
OffPeakTimezone = ""
OffPeakIdleCount = 0
OffPeakIdleTime = 1200
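To read the configuration above in terms of scale: `concurrent` caps the number of jobs across all runners in the file, and a per-runner `limit` of 0 means that runner is not limited on its own. The sketch below is an illustration only, with a hypothetical helper name (`max_parallel_jobs`). It computes the resulting upper bound on simultaneous jobs, which, if every job runs on its own EC2 instance as the customer describes, is also the upper bound on machines that docker-machine has to manage at once. Whether that volume is what exhausts the file descriptor limit is an assumption to verify against the logs.

```python
def max_parallel_jobs(concurrent, runner_limits):
    """Upper bound on simultaneous jobs for a config.toml.

    concurrent    -- the global `concurrent` value
    runner_limits -- the `limit` value of each [[runners]] entry,
                     where 0 means "no per-runner limit"
    """
    # A runner with limit = 0 is bounded only by the global setting.
    caps = [limit if limit > 0 else concurrent for limit in runner_limits]
    # The global setting always bounds the total.
    return min(concurrent, sum(caps))


if __name__ == "__main__":
    # Values from the config.toml above: concurrent = 3000, one runner, limit = 0.
    print(max_parallel_jobs(3000, [0]))
```

With the values from this ticket the bound is 3000, far above the few hundred jobs the customer reports, so the configuration itself does not restrict how many machines are created in parallel.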