Simplify GitLab architecture by shipping Minio

As part of simplifying GitLab's architecture and implementation, we should consider shipping Minio as the storage backend for every type of access that is currently filesystem-based.

Doing that would allow us to remove a lot of boilerplate for manually accessing files and hide it behind a single abstraction for filesystem access.
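To make the idea of a single abstraction concrete, here is a minimal sketch in Python (GitLab Rails itself is Ruby, so this is only an illustration). The `ObjectStore` interface, the `InMemoryStore` stand-in, the `save_artifact` helper, the `artifacts` bucket name and the key layout are all hypothetical and do not correspond to existing GitLab code.

```python
from typing import Dict, Protocol, Tuple


class ObjectStore(Protocol):
    """Hypothetical single storage interface; not an existing GitLab API."""

    def put(self, bucket: str, key: str, data: bytes) -> None: ...
    def get(self, bucket: str, key: str) -> bytes: ...
    def delete(self, bucket: str, key: str) -> None: ...


class InMemoryStore:
    """Stand-in backend for illustration; a real one would talk to Minio or S3."""

    def __init__(self) -> None:
        self._objects: Dict[Tuple[str, str], bytes] = {}

    def put(self, bucket: str, key: str, data: bytes) -> None:
        self._objects[(bucket, key)] = data

    def get(self, bucket: str, key: str) -> bytes:
        return self._objects[(bucket, key)]

    def delete(self, bucket: str, key: str) -> None:
        # Deleting a missing object is treated as a no-op in this sketch.
        self._objects.pop((bucket, key), None)


def save_artifact(store: ObjectStore, job_id: int, data: bytes) -> str:
    """Caller code only sees the interface, never a local file path."""
    key = f"jobs/{job_id}/artifacts.zip"  # assumed key layout
    store.put("artifacts", key, data)     # assumed bucket name
    return key
```

With this shape, callers such as `save_artifact` would not need to know whether the backend is a bundled Minio or a remote S3 service, which is where the boilerplate savings would come from.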

I could see the following transition period:

We include Minio in the GitLab stack,
We configure Minio to expose a bucket for each storage space that is present today,
We use a single ACCESS/KEY pair for all these buckets,
On each node, we run an extra Minio instance,
We automatically configure Rails to use Minio if object storage is not configured,
We continue to support local storage until the transition period ends,
We remove the code that handles multiple storage buckets and fall back to a single one.
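The fallback in the steps above (use Minio when no object storage is configured, one bucket per storage space, one shared key pair) could look roughly like the following Python sketch. Everything here is an assumption made for illustration: the `ObjectStorageConfig` type, the `resolve_storage` function, the `gitlab-` bucket prefix, and the node-local endpoint on port 9000 are not taken from GitLab's configuration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass(frozen=True)
class ObjectStorageConfig:
    endpoint: str
    access_key: str
    secret_key: str
    buckets: Dict[str, str]  # storage space -> bucket name


def resolve_storage(
    configured: Optional[ObjectStorageConfig],
    storage_spaces: Sequence[str],
    minio_access_key: str,
    minio_secret_key: str,
    minio_endpoint: str = "http://127.0.0.1:9000",  # assumed node-local Minio
) -> ObjectStorageConfig:
    """Use the admin's object storage if present, else the bundled Minio."""
    if configured is not None:
        return configured
    # One bucket per existing storage space, all sharing one key pair.
    return ObjectStorageConfig(
        endpoint=minio_endpoint,
        access_key=minio_access_key,
        secret_key=minio_secret_key,
        buckets={space: f"gitlab-{space}" for space in storage_spaces},
    )
```

For example, `resolve_storage(None, ["artifacts", "uploads"], "ak", "sk")` would return a configuration pointing at the local Minio with two buckets, while passing an explicit remote configuration returns it unchanged.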

Open questions:

How do we support different storage buckets? (Today we allow both local and remote storage.)
Can we remove support for multiple storage buckets?
Can we drop support for that with %12.0?
Can we say that we only allow Minio-based or S3-configured access?

Why is this needed?

We can remove a lot of complexity from GitLab Rails,
We can disallow access to local files, and thus prevent a lot of file system accesses,
Today, Minio can be hidden behind Workhorse and not exposed externally, as Workhorse supports proxying access,
We can adapt all tools to use only one storage interface (remote-based), which greatly simplifies the architecture,
Doing that with Minio makes our application Cloud-Native by default, rather than by a particular implementation.

Items to consider

Development efficiency / simplicity gains
Impact on minimum requirements (Memory/CPU)
Scaling requirements for each reference architecture / performance impact
Incremental effort required
- Up front investment to configure
- Effort required to migrate (i.e. when we make this required)
- On-going maintenance: upgrades, backups, etc.
Durability / Resilience
- General availability
- Ability to handle a node failure
Zero downtime upgrades
Disaster Recovery
- Object storage replication within Geo is currently beta: https://docs.gitlab.com/ee/administration/geo/replication/object_storage.html#enabling-gitlab-managed-object-storage-replication

Edited Nov 09, 2020 by silv