While we have a matrix with several relevant combinations, we have not performed such tests at the scale proposed, where we use real environments and seed them on the fly with large volumes of generated data.
Generating data at this scale can surface unforeseen performance issues in the testing setup.
Goal
This setup targets a smaller registry size so we can experiment and fine-tune our initial assumptions. That should leave us better prepared for larger setups and for forecasting costs.
Used GET to deploy a 1K Reference Architecture. The instance was configured successfully under a domain, with HTTPS and S3 object storage.
The seeder instance (AMI: Amazon Linux, instance type: m5.large) was deployed and configured. Docker was installed and sanity testing was performed with a basic 1 GiB script.
@hswimelar What is the best way to get insight into the container registry size after running the seeder? And after running registry database import config.yml, where can I find the relevant logs?
@svistas After running the seeder, looking at the underlying bucket size should tell you how much storage the script actually consumed.
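For reference, a minimal sketch of how the bucket size could be checked programmatically, assuming the boto3 (S3) and google-cloud-storage (GCS) client libraries with credentials already configured; the bucket names are placeholders, and the aws/gsutil CLIs or the cloud consoles would work just as well:

```python
# Minimal sketch: sum object sizes to estimate how much storage the seeder consumed.
# Assumes boto3 and google-cloud-storage are installed and credentials are configured;
# the bucket names below are placeholders.
import boto3
from google.cloud import storage


def s3_bucket_size(bucket_name: str) -> int:
    """Total size in bytes of all objects in an S3 bucket."""
    s3 = boto3.resource("s3")
    return sum(obj.size for obj in s3.Bucket(bucket_name).objects.all())


def gcs_bucket_size(bucket_name: str) -> int:
    """Total size in bytes of all objects in a GCS bucket."""
    client = storage.Client()
    return sum(blob.size for blob in client.list_blobs(bucket_name))


if __name__ == "__main__":
    print(f"S3:  {s3_bucket_size('registry-seeder-test') / 1024**2:.1f} MiB")
    print(f"GCS: {gcs_bucket_size('registry-seeder-test') / 1024**2:.1f} MiB")
```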
The import command logs to stdout and will report summaries for each step with log entries containing the word "complete". The import also ends with a summary line.
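If the import output is captured to a file, a small sketch like the following could pull out those per-step summaries; the capture command and log path are assumptions, and the "complete" keyword comes from the comment above:

```python
# Minimal sketch: filter captured import output for the per-step summary entries.
# Assumes the output was saved, e.g. registry database import config.yml | tee import.log;
# the log path is a placeholder.
from pathlib import Path

for line in Path("import.log").read_text().splitlines():
    if "complete" in line.lower():  # per-step summaries and the final summary line
        print(line)
```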
Test flight executed successfully on AWS, using the Container Registry Seeder to model seeding across several EC2 instances. The project aims to make the seeding process as seamless and automated as possible.
The test flight on GCP has concluded. It took a bit longer to work through the 1K Reference Architecture using GET, but we now have the Container Registry Seeder tool fully adjusted to support both AWS and GCP and to scale to much larger loads.
Results
In 21 minutes and 39 seconds of seeding time: 400 images with 1 tag each, an image size of 3.24 MB, and 5 layers per manifest; 2259 blobs and 7685 objects, for a GCS bucket size of 4.2 MiB. The migration tool, using the one-step import approach, took 0 hours, 31 minutes, and approximately 23 seconds.
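As a rough back-of-the-envelope on these numbers, a quick calculation of per-image throughput and a naive linear extrapolation; the larger target size below is purely hypothetical, not a measured result:

```python
# Back-of-the-envelope throughput from the results above; the extrapolation target
# is hypothetical and assumes linear scaling, which further test rounds should check.
images = 400
seeding_seconds = 21 * 60 + 39   # 21m39s of seeding time
import_seconds = 31 * 60 + 23    # ~31m23s for the one-step import

print(f"seeding: {seeding_seconds / images:.2f} s/image "
      f"({images / seeding_seconds * 60:.1f} images/min)")
print(f"import:  {import_seconds / images:.2f} s/image "
      f"({images / import_seconds * 60:.1f} images/min)")

target_images = 100_000  # hypothetical larger registry
print(f"naive linear estimate for {target_images} images: "
      f"import ~{import_seconds / images * target_images / 3600:.1f} h")
```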
Next Steps
We are now in a position to experiment and scale up, so we can fine-tune the model we have discussed and understand the results.
@hswimelar Yep, and I am using the same GitLab instance references, so there is not much difference in machine setup there either. I am currently using the dev database that is defined for the local setup (docs). Should we use something else, or would the difference in results not be significant?
@svistas Depending on how particular we want to be, we should probably attempt to replicate these results with the same setup to see if we consistently get these times, and we should also get a reference for using a cloud database.
My prediction is that the object storage is going to be the bottleneck, but it's possible that the location of the database makes an appreciable difference, especially as we import larger instances.
We should also publish the regions we use for these cloud providers, and note where the import process runs relative to both the object storage and the database.