Infinite loop using Minio S3 as registry backend when removing manifests
Summary
I searched the issue history and Google trying to find an issue like this but was unable to find any relevant results.
While I understand that Minio is not supported, it is listed in the documentation and the result of this issue is pretty severe.
An Omnibus MWE is included below, along with the procedure to reproduce and the log lines that repeat in a loop.
I came across this issue while recently testing a GitLab CE instance with all-S3 storage (configured as per the documentation). My S3 provider racked up a bill of over $200 on the first day due to a very high number of requests. I didn't understand why, so I set up my own S3 storage using Minio to investigate.
I created a project, pushed a container image, tried to delete it, and encountered the issue after about 20 minutes.
Using Minio S3 storage (and possibly others?) for the container registry appears to result in an infinite loop when removing a container repository: the log lines below are repeated, the Sidekiq queue jumps up, CPU is maxed out, and S3 is flooded with requests.
The container is never removed and the loop goes on for at least 10 hours (until I stopped all processes).
I tested on one production system (a dedicated server, where I saw the problem first), another VM from a popular cloud provider (not AWS), and my desktop machine which I used to test and create a MWE.
Steps to reproduce
In my example I am keeping things as simple as possible.
Summary:
- We will be using a basic Minio setup with root credentials; this excludes any issues regarding permissions or policies
- We will be using a GitLab service on localhost to make testing easy
- On a side note, my Docker is IPv6-enabled (adjust the config if needed)
Start off by creating a ~/gitlab-test directory which we'll be working in.
STEP 1 - Minio
Create the data directory we'll be using: mkdir -p data/minio
Set up Minio and create the buckets. I used the docker-compose.yml below:
version: '3'
services:
  minio:
    image: quay.io/minio/minio:latest
    command: server /data --console-address ":9001"
    environment:
      MINIO_ROOT_USER: testtest
      MINIO_ROOT_PASSWORD: testtest
    expose:
      - "9000:9000"
      - "9001:9001"
    volumes:
      - './data/minio:/data'
    networks:
      - gitlab
networks:
  gitlab:
    driver: bridge
    enable_ipv6: true
I ran ...
docker-compose up --remove-orphans
I logged into Minio at http://localhost:9001 using testtest/testtest and created the following buckets (I enabled versioning for all of them, as this is mandatory/default for almost any cluster setup):
- artifacts
- external-diffs
- lfs-objects
- uploads
- packages
- dependency-proxy
- terraform-state
- pages
- registry
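As an alternative to clicking through the console, the buckets listed above could probably be created from the command line while the Minio container is still running. This is only a sketch: it assumes the Minio client `mc` is installed on the host and that the Minio API is reachable at http://localhost:9000, neither of which is part of my original setup. The function name `create_buckets` and the alias `local` are made up for illustration.

```shell
# Create the buckets used in this reproduction and enable versioning on each.
# Assumption: the MinIO client `mc` is installed and the API port is reachable.
# Usage: create_buckets ALIAS ENDPOINT ACCESS_KEY SECRET_KEY
create_buckets() {
  alias_name=$1
  endpoint=$2
  access_key=$3
  secret_key=$4

  # Register the Minio endpoint under a local alias.
  mc alias set "$alias_name" "$endpoint" "$access_key" "$secret_key" || return 1

  for bucket in artifacts external-diffs lfs-objects uploads packages \
                dependency-proxy terraform-state pages registry; do
    # --ignore-existing makes the call safe to repeat.
    mc mb --ignore-existing "$alias_name/$bucket" || return 1
    mc version enable "$alias_name/$bucket" || return 1
  done
}
```

With the credentials from the compose file, this would be called as `create_buckets local http://localhost:9000 testtest testtest`.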
I then stopped everything by pressing Ctrl-C.
STEP 2 - Setup GitLab
Create the data directories we'll be using...
mkdir data/{config,logs,gitlab}
Update the docker-compose.yml file with the following...
version: '3'
services:
  gitlab:
    image: 'gitlab/gitlab-ce:latest'
    environment:
      GITLAB_OMNIBUS_CONFIG: |
        external_url 'http://localhost:9080'
        nginx['listen_addresses'] = ['*', '[::]']
        # Registry
        registry['enable'] = true
        registry_external_url 'http://localhost:5050'
        registry_nginx['listen_addresses'] = ['*', '[::]']
        registry_nginx['listen_https'] = false
        registry_nginx['listen_port'] = 5050
        # Consolidated object storage configuration
        gitlab_rails['object_store']['enabled'] = true
        gitlab_rails['object_store']['proxy_download'] = true
        gitlab_rails['object_store']['connection'] = {
          'provider' => 'AWS',
          'aws_access_key_id' => 'testtest',
          'aws_secret_access_key' => 'testtest',
          'endpoint' => 'http://minio:9000',
          'path_style' => true,
        }
        gitlab_rails['object_store']['objects']['artifacts']['bucket'] = 'artifacts'
        gitlab_rails['object_store']['objects']['external_diffs']['bucket'] = 'external-diffs'
        gitlab_rails['object_store']['objects']['lfs']['bucket'] = 'lfs-objects'
        gitlab_rails['object_store']['objects']['uploads']['bucket'] = 'uploads'
        gitlab_rails['object_store']['objects']['packages']['bucket'] = 'packages'
        gitlab_rails['object_store']['objects']['dependency_proxy']['bucket'] = 'dependency-proxy'
        gitlab_rails['object_store']['objects']['terraform_state']['bucket'] = 'terraform-state'
        gitlab_rails['object_store']['objects']['pages']['bucket'] = 'pages'
        registry['storage'] = {
          's3' => {
            'accesskey' => 'testtest',
            'secretkey' => 'testtest',
            'bucket' => 'registry',
            'region' => 'none',
            'regionendpoint' => 'http://minio:9000',
            'pathstyle' => true
          },
          'redirect' => {
            'disable' => true
          }
        }
    ports:
      - '9080:9080'
      - '5050:5050'
    volumes:
      - './data/config:/etc/gitlab'
      - './data/logs:/var/log/gitlab'
      - './data/gitlab:/var/opt/gitlab'
    networks:
      - gitlab
  minio:
    image: quay.io/minio/minio:latest
    command: server /data --console-address ":9001"
    environment:
      MINIO_ROOT_USER: testtest
      MINIO_ROOT_PASSWORD: testtest
    expose:
      - "9000:9000"
      - "9001:9001"
    volumes:
      - './data/minio:/data'
    networks:
      - gitlab
networks:
  gitlab:
    driver: bridge
    enable_ipv6: true
Start everything up using docker-compose up --remove-orphans.
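GitLab can take several minutes to come up after the containers start. If you would rather not keep refreshing the browser, something like the following could poll until the web UI answers. This is a minimal sketch, assuming `curl` is available on the host; the function name `wait_for_http` and the default retry values are made up for illustration, and the sign-in path is simply a page I would expect to return a successful status once GitLab is ready.

```shell
# Poll a URL until it answers with a successful HTTP status, or give up.
# Usage: wait_for_http URL [MAX_TRIES] [DELAY_SECONDS]
wait_for_http() {
  url=$1
  tries=${2:-60}
  delay=${3:-10}
  i=1
  while [ "$i" -le "$tries" ]; do
    # -f: treat HTTP errors (>= 400) as failure; -sS: quiet, but keep errors;
    # -o /dev/null: discard the response body.
    if curl -fsS -o /dev/null "$url" 2>/dev/null; then
      echo "ready after $i attempt(s): $url"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "gave up after $tries attempt(s): $url" >&2
  return 1
}
```

For this setup, a call might look like `wait_for_http http://localhost:9080/users/sign_in 60 10`, run in a second terminal while docker-compose is starting.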
STEP 3 - Reproducing
Log into http://localhost:9080 (sorry, port 8080 was already in use on my machine, so I picked 9080).
Create a project under the root account. I named it testrepo and ticked "Create README.md file".
On your PC, create a simple Dockerfile in some directory. I used this...
FROM alpine:edge
RUN apk add --no-cache bash vim
Next, build the image with a tag...
docker build -t localhost:5050/root/testrepo .
Next, create a token for your root account with the read_registry and write_registry scopes.
Add the login info to Docker (I used root-registry as the token username)...
docker login -u root-registry -p "xxxxxxxxxx" localhost:5050
Next we are going to push the container...
docker push localhost:5050/root/testrepo
Let's create another one...
docker build -t localhost:5050/root/testrepo/test2 .
docker push localhost:5050/root/testrepo/test2
Everything is going to plan, all working 100%, image pulling works, pushing works, no problem.
The problem comes with REMOVING!
Go to the project's container registry page; I went to http://localhost:9080/root/testrepo/container_registry.
Now, delete the repositories using the trashcan icon on the right and wait.
You may have to wait a few minutes, or even tens of minutes, until the job triggers the removal. It will then flood the logs in the terminal where docker-compose up is running. The S3 storage will also be hit with hundreds of requests per second.
Example Project
I am unsure whether providing an example project is a good idea, but this issue probably doesn't affect the GitLab.com platform.
What is the current bug behavior?
Logs and S3 storage flooded with requests.
What is the expected correct behavior?
Container repository to be removed.
Relevant logs and/or screenshots
These appear to be the repeated log lines...
gitlab-test-gitlab-1 | ==> /var/log/gitlab/gitlab-rails/application_json.log <==
gitlab-test-gitlab-1 | {"severity":"INFO","time":"2023-01-02T23:31:48.219Z","correlation_id":"59aa3bf18a84b00a69e11eb9055fdcec","service_class":"Projects::ContainerRepository::DeleteTagsService","container_repository_id":2,"project_id":2,"message":"deleted tags","deleted_tags_count":1}
gitlab-test-gitlab-1 |
gitlab-test-gitlab-1 | ==> /var/log/gitlab/sidekiq/current <==
gitlab-test-gitlab-1 | {"severity":"INFO","time":"2023-01-02T23:31:48.219Z","project_id":2,"container_repository_id":2,"container_repository_path":"root/testrepo/test2","tags_size_before_delete":1,"deleted_tags_size":1,"meta.caller_id":"ContainerRegistry::DeleteContainerRepositoryWorker","correlation_id":"59aa3bf18a84b00a69e11eb9055fdcec","meta.root_caller_id":"Cronjob","meta.feature_category":"container_registry","meta.client_id":"ip/","class":"ContainerRegistry::DeleteContainerRepositoryWorker","job_status":"running","queue":"container_repository_delete:container_registry_delete_container_repository","jid":"ae31e50aadc5b6e162d9ab09","retry":0}
gitlab-test-gitlab-1 | {"severity":"INFO","time":"2023-01-02T23:31:48.223Z","retry":0,"queue":"container_repository_delete:container_registry_delete_container_repository","version":0,"status_expiration":1800,"queue_namespace":"container_repository_delete","args":[],"class":"ContainerRegistry::DeleteContainerRepositoryWorker","jid":"ae31e50aadc5b6e162d9ab09","created_at":"2023-01-02T23:31:48.174Z","meta.caller_id":"ContainerRegistry::DeleteContainerRepositoryWorker","correlation_id":"59aa3bf18a84b00a69e11eb9055fdcec","meta.root_caller_id":"Cronjob","meta.feature_category":"container_registry","meta.client_id":"ip/","worker_data_consistency":"always","size_limiter":"validated","enqueued_at":"2023-01-02T23:31:48.174Z","job_size_bytes":2,"pid":530,"message":"ContainerRegistry::DeleteContainerRepositoryWorker JID-ae31e50aadc5b6e162d9ab09: done: 0.047723 sec","job_status":"done","scheduling_latency_s":0.000903,"redis_calls":8,"redis_duration_s":0.000792,"redis_read_bytes":9,"redis_write_bytes":1502,"redis_queues_calls":4,"redis_queues_duration_s":0.000447,"redis_queues_read_bytes":5,"redis_queues_write_bytes":963,"redis_shared_state_calls":4,"redis_shared_state_duration_s":0.000345,"redis_shared_state_read_bytes":4,"redis_shared_state_write_bytes":539,"db_count":9,"db_write_count":5,"db_cached_count":1,"db_replica_count":0,"db_primary_count":9,"db_main_count":9,"db_main_replica_count":0,"db_replica_cached_count":0,"db_primary_cached_count":1,"db_main_cached_count":1,"db_main_replica_cached_count":0,"db_replica_wal_count":0,"db_primary_wal_count":0,"db_main_wal_count":0,"db_main_replica_wal_count":0,"db_replica_wal_cached_count":0,"db_primary_wal_cached_count":0,"db_main_wal_cached_count":0,"db_main_replica_wal_cached_count":0,"db_replica_duration_s":0.0,"db_primary_duration_s":0.002,"db_main_duration_s":0.002,"db_main_replica_duration_s":0.0,"external_http_count":4,"external_http_duration_s":0.009737825001138845,"cpu_s":0.033362,"mem_objects":9935,"mem_bytes":808704,"mem_mallocs":2174,"mem_total_bytes":1206104,"worker_id":"sidekiq_0","rate_limiting_gates":[],"duration_s":0.047723,"completed_at":"2023-01-02T23:31:48.223Z","load_balancing_strategy":"primary","db_duration_s":0.00234}
gitlab-test-gitlab-1 | {"severity":"INFO","time":"2023-01-02T23:31:48.223Z","retry":0,"queue":"container_repository_delete:container_registry_delete_container_repository","version":0,"status_expiration":1800,"queue_namespace":"container_repository_delete","args":[],"class":"ContainerRegistry::DeleteContainerRepositoryWorker","jid":"ecaa8d32d003ef2d99266df3","created_at":"2023-01-02T23:31:48.221Z","meta.caller_id":"ContainerRegistry::DeleteContainerRepositoryWorker","correlation_id":"59aa3bf18a84b00a69e11eb9055fdcec","meta.root_caller_id":"Cronjob","meta.feature_category":"container_registry","meta.client_id":"ip/","worker_data_consistency":"always","size_limiter":"validated","enqueued_at":"2023-01-02T23:31:48.222Z","job_size_bytes":2,"pid":530,"message":"ContainerRegistry::DeleteContainerRepositoryWorker JID-ecaa8d32d003ef2d99266df3: start","job_status":"start","scheduling_latency_s":0.001321}
gitlab-test-gitlab-1 |
gitlab-test-gitlab-1 | ==> /var/log/gitlab/registry/current <==
gitlab-test-gitlab-1 | 2023-01-02_23:31:48.24189 time="2023-01-02T23:31:48.241Z" level=info msg="authorized request" auth_user_name= auth_user_type= correlation_id=01GNTD7BWHQAAPMV5HES9PXF65 go_version=go1.18.7 version=v3.61.0-gitlab
gitlab-test-gitlab-1 | 2023-01-02_23:31:48.24191 {"content_type":"","correlation_id":"01GNTD7BWHQAAPMV5HES9PXF65","duration_ms":0,"host":"127.0.0.1:5000","level":"info","method":"GET","msg":"access","proto":"HTTP/1.1","referrer":"","remote_addr":"127.0.0.1:36784","remote_ip":"127.0.0.1","status":404,"system":"http","time":"2023-01-02T23:31:48.241Z","ttfb_ms":0,"uri":"/gitlab/v1/","user_agent":"GitLab/15.7.0","written_bytes":0}
gitlab-test-gitlab-1 |
gitlab-test-gitlab-1 | ==> /var/log/gitlab/gitlab-rails/application.log <==
gitlab-test-gitlab-1 | 2023-01-02T23:31:48.242Z: {:container_repository_id=>2, :container_repository_path=>"root/testrepo/test2", :project_id=>2, :third_party_cleanup_tags_service=>true}
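To put a rough number on the loop, one option is to count how often the delete worker starts per minute in the Sidekiq log. The sketch below relies only on the JSON fields visible in the lines above ("time", "class", "job_status"); the function name `worker_starts_per_minute` is made up for illustration, and this is a quick diagnostic rather than a proper log parser.

```shell
# Count ContainerRegistry::DeleteContainerRepositoryWorker job starts per minute.
# Reads Sidekiq JSON log lines on stdin (a "container | " prefix is tolerated)
# and prints one "<count> <YYYY-MM-DDTHH:MM>" line per minute.
worker_starts_per_minute() {
  grep -F '"class":"ContainerRegistry::DeleteContainerRepositoryWorker"' |
    grep -F '"job_status":"start"' |
    # Keep only the first 16 characters of the "time" value (up to the minute).
    sed -n 's/.*"time":"\([0-9T:-]\{16\}\).*/\1/p' |
    sort | uniq -c | awk '{print $1, $2}'
}
```

With the volume mapping from the compose file, this could be fed from the log file on the host, for example `worker_starts_per_minute < data/logs/sidekiq/current`, or from `docker-compose logs --no-color gitlab | worker_starts_per_minute`.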
Output of checks
Results of GitLab environment info
System information
System:
Current User: git
Using RVM: no
Ruby Version: 2.7.7p221
Gem Version: 3.1.6
Bundler Version: 2.3.15
Rake Version: 13.0.6
Redis Version: 6.2.7
Sidekiq Version: 6.5.7
Go Version: unknown
GitLab information
Version: 15.7.0
Revision: b2a2fb69e66
Directory: /opt/gitlab/embedded/service/gitlab-rails
DB Adapter: PostgreSQL
DB Version: 13.8
URL: http://localhost:9080
HTTP Clone URL: http://localhost:9080/some-group/some-project.git
SSH Clone URL: git@localhost:some-group/some-project.git
Using LDAP: no
Using Omniauth: yes
Omniauth Providers:
GitLab Shell
Version: 14.14.0
Repository storages:
- default: unix:/var/opt/gitlab/gitaly/gitaly.socket
GitLab Shell path: /opt/gitlab/embedded/service/gitlab-shell
Results of GitLab application Check
Checking GitLab subtasks ...
Checking GitLab Shell ...
GitLab Shell: ... GitLab Shell version >= 14.14.0 ? ... OK (14.14.0)
Running /opt/gitlab/embedded/service/gitlab-shell/bin/check
Internal API available: OK
Redis available via internal API: OK
gitlab-shell self-check successful
Checking GitLab Shell ... Finished
Checking Gitaly ...
Gitaly: ... default ... OK
Checking Gitaly ... Finished
Checking Sidekiq ...
Sidekiq: ... Running? ... yes
Number of Sidekiq processes (cluster/worker) ... 1/1
Checking Sidekiq ... Finished
Checking Incoming Email ...
Incoming Email: ... Reply by email is disabled in config/gitlab.yml
Checking Incoming Email ... Finished
Checking LDAP ...
LDAP: ... LDAP is disabled in config/gitlab.yml
Checking LDAP ... Finished
Checking GitLab App ...
Database config exists? ... yes
All migrations up? ... yes
Database contains orphaned GroupMembers? ... no
GitLab config exists? ... yes
GitLab config up to date? ... yes
Cable config exists? ... yes
Resque config exists? ... yes
Log directory writable? ... yes
Tmp directory writable? ... yes
Uploads directory exists? ... yes
Uploads directory has correct permissions? ... yes
Uploads directory tmp has correct permissions? ... skipped (no tmp uploads folder yet)
Systemd unit files or init script exist? ... skipped (omnibus-gitlab has neither init script nor systemd units)
Systemd unit files or init script up-to-date? ... skipped (omnibus-gitlab has neither init script nor systemd units)
Projects have namespace: ...
2/1 ... yes
1/2 ... yes
Redis version >= 6.0.0? ... yes
Ruby version >= 2.7.2 ? ... yes (2.7.7)
Git user has default SSH configuration? ... yes
Active users: ... 1
Is authorized keys file accessible? ... yes
GitLab configured to store new projects in hashed storage? ... yes
All projects are in hashed storage? ... yes
Checking GitLab App ... Finished
Checking GitLab subtasks ... Finished