Simplify GitLab architecture by shipping Minio
As part of simplifying GitLab's architecture and implementation, we should consider including Minio as the storage backend for every type of access that is currently filesystem-based.
Doing that would allow us to remove a lot of boilerplate for manually accessing files and hide it behind an abstraction for filesystem access.
I could see the following transition period:
- We include minio in the GitLab stack,
- We configure minio to expose a bucket for each storage space that is present today,
- We use a single ACCESS/KEY pair for all these buckets,
- On each node, we run the extra Minio instance,
- We automatically configure Rails to use Minio if object storage is not used,
- We continue to support local storage until we can get rid of the transition period,
- We remove the code that handles multiple storage buckets and fall back to a single one.
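The selection rule in the steps above (use Minio unless object storage is already configured, one bucket per storage space, one shared key pair) can be sketched roughly as follows. This is only an illustration under assumptions: `StoreConfig`, `resolve_store`, `LOCAL_MINIO_ENDPOINT`, the example storage space names, and the bucket naming rule are all hypothetical and are not existing GitLab settings or code.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical value for illustration: a Minio instance running on the same
# node, listening on Minio's default API port.
LOCAL_MINIO_ENDPOINT = "http://127.0.0.1:9000"


@dataclass(frozen=True)
class StoreConfig:
    """Connection details for one bucket (hypothetical structure)."""
    endpoint: str
    access_key: str
    secret_key: str
    bucket: str


def resolve_store(
    storage_space: str,
    remote: Dict[str, StoreConfig],
    minio_access_key: str,
    minio_secret_key: str,
) -> StoreConfig:
    """Return the object store to use for one storage space."""
    configured: Optional[StoreConfig] = remote.get(storage_space)
    if configured is not None:
        # Object storage is already in use for this space: keep it.
        return configured
    # Otherwise fall back to the node-local Minio instance, with one bucket
    # per storage space and a single shared key pair for all buckets.
    # The bucket naming rule here is an assumption made for this sketch.
    return StoreConfig(
        endpoint=LOCAL_MINIO_ENDPOINT,
        access_key=minio_access_key,
        secret_key=minio_secret_key,
        bucket=storage_space.replace("_", "-").lower(),
    )


if __name__ == "__main__":
    remote = {
        "lfs_objects": StoreConfig("https://s3.example.com", "AK", "SK", "my-lfs"),
    }
    for space in ("job_artifacts", "uploads", "lfs_objects"):
        print(space, "->", resolve_store(space, remote, "minio-key", "minio-secret"))
```

With a rule like this, callers never need to know whether a storage space lives on the bundled Minio or on an externally configured S3-compatible service.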
Open matters:
- How do we support different storage buckets? (Today we allow both local and remote storage.)
- Can we remove support for multiple storage buckets?
- Can we drop support for that with %12.0?
- Can we say that we only allow Minio-based or S3-configured access?
Why is this needed?
- We can remove a lot of complexity from GitLab Rails,
- We can disallow access to local files, and thus prevent a lot of file system accesses,
- Today, Minio can be hidden behind Workhorse and not exposed outside, as Workhorse supports proxying access,
- We can adapt all tools to have only one storage interface: remote-based, which greatly simplifies the architecture,
- Doing that with Minio makes our application Cloud-Native by default, rather than by a particular implementation.
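To make the "only one storage interface" point concrete, here is a minimal sketch of what application code could look like once every access goes through a remote-style interface. The names `ObjectStore`, `InMemoryStore`, and `save_upload` are hypothetical, and the in-memory class only stands in for a real Minio or S3 client so that the example runs without a server.

```python
from abc import ABC, abstractmethod
from typing import Dict, Tuple


class ObjectStore(ABC):
    """The single, remote-style storage interface (hypothetical)."""

    @abstractmethod
    def put(self, bucket: str, key: str, data: bytes) -> None:
        ...

    @abstractmethod
    def get(self, bucket: str, key: str) -> bytes:
        ...


class InMemoryStore(ObjectStore):
    """Stand-in for a Minio/S3-backed implementation, used only for the demo."""

    def __init__(self) -> None:
        self._objects: Dict[Tuple[str, str], bytes] = {}

    def put(self, bucket: str, key: str, data: bytes) -> None:
        self._objects[(bucket, key)] = data

    def get(self, bucket: str, key: str) -> bytes:
        return self._objects[(bucket, key)]


def save_upload(store: ObjectStore, project_id: int, filename: str, data: bytes) -> str:
    """Store an upload without touching the local filesystem; return its key."""
    key = f"{project_id}/{filename}"
    store.put("uploads", key, data)
    return key


if __name__ == "__main__":
    store = InMemoryStore()
    key = save_upload(store, 42, "logo.png", b"\x89PNG")
    print(key, len(store.get("uploads", key)))
```

The calling code depends only on `put` and `get`, so there is no branch for "local file" versus "remote object" left to maintain.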
Items to consider
- Development efficiency / simplicity gains
- Impact on minimum requirements (Memory/CPU)
- Scaling requirements for each reference architecture / performance impact
- Incremental effort required
- Up front investment to configure
- Effort required to migrate (i.e. when we make this required)
- On-going maintenance: upgrades, backups, etc.
- Durability / Resilience
- General availability
- Ability to handle a node failure
- Zero downtime upgrades
- Disaster Recovery
- Object storage replication within Geo is currently beta: https://docs.gitlab.com/ee/administration/geo/replication/object_storage.html#enabling-gitlab-managed-object-storage-replication