As part of the larger cloud native effort, we need to solve the shared storage requirement that NFS currently fills. NFS-like storage solutions are not natively available within Kubernetes, and having one as a requirement is not desirable (gitlab-com/migration#23 (closed)).
However, object storage support in general is an EEP feature. We should bring this to CE for a few reasons:
As noted above, NFS is fragile and an undesirable solution especially for a cloud native architecture. (gitlab-com/migration#23 (closed))
Object storage is much more commonly available in cloud providers than an NFS-like solution.
Shared storage is not supported within Kubernetes natively, and the only solution we can find to provide it is marked as alpha (rook.io). This compares to minio, which can provide object storage and is fairly stable.
Supporting two different architectures and storage solutions between EEP and CE/EES would be overly complex and confusing. It would also make upgrades more difficult.
It is important to note that our packaged Postgres and Redis HA products remain EEP+.
Current status
Most of the work will be done in !17358 (merged) and gitlab-ee!4736, but some features are still missing.
Once it's completed in ~"EE Premium", we migrate all the components, together, to GitLab CE in a single move.
Step 1 has downstream dependencies in the ~"Cloud Native" and GCP Migration projects. Completing direct object storage in EE first would unlock further development, whereas scheduling the EE-to-CE move first would impact both of these projects with little upside, particularly for GCP Migration.
@andrewn you may be closest to this area right now, do you have an idea whether this is relatively easy?
I think so. @ayufan is much better placed to answer that than I, however. @ayufan wdyt?
@andrewn Practically, this can now be done by the @smcgivern team in the process of streamlining support for Object Storage. This helps to reduce the differences between codebases and lets us move faster when working on that.
Practically. The Object Storage as "an archive" option can be CE option only, but EE has ability without the FS. So, I think that this is an interesting moment to start with that.
@andrewn It would mean that our GitLabUploader would be the same between CE/EE and we would maintain the same implementation. It means that we might not need to do some of the planned changes.
@andrewn @ayufan I think we can do what we're already doing. Then, at the end of moving uploads to object storage, we can have a follow-up issue to port all of the object storage work (uploads, artifacts, and LFS files) to CE.
That second issue is less critical, while allowing uploads to be migrated is absolutely critical. @mbergeron can work on migrating stuff to CE after doing that.
Without object storage support, our customer-facing charts are really hard to work with. It would leave us with undesirable choices like requiring an external NFS service, or depending on something like Rook.
It's in our best interest to get feedback ASAP, so we can learn and adjust before it becomes more painful and expensive. (For example, k8s PVCs are immutable.) The things I am thinking about are: running on multiple cloud vendors, internal k8s farms, a wide array of k8s settings like RBAC, and more.
I'm okay with requiring an EEP license in a pre-beta phase, but I think by official beta release (March 2018?) we should support CE and EES too. This way we can increase the scope of testing and it will also be more representative of the final product.
@andrewn I took this off the Kickoff announcement as it looks like this may or may not make it in 10.4. With something like this I'd rather be conservative than ambitious.
@smcgivern @ayufan and I agreed that we'll deliver https://gitlab.com/gitlab-org/gitlab-ee/issues/4163 first so that we have more time to conduct the fs-to-object-storage migration. After that's delivered, we'll focus on moving object storage across to CE as per this issue.
@joshlambert @smcgivern this part of the code base is still pretty cryptic, but as this is actually porting stuff over, that part shouldn't be too hard.
The hardest part of sending this to CE is properly documenting the migration process for our users and making sure all the moving parts (prepare uploads, migrate uploads) are behaving.
Thanks @mbergeron. @joshlambert 2-3 weeks is quite a significant chunk of 10.6 to change part-way through, but I'll let @bikebilly and @victorwu comment on that part. We'd definitely have to cut other issues we were working on if we took this.
Unfortunately, it is not feasible to bring full Object Storage support to CE in %10.6, because the work is already in progress and requires non-trivial changes to be ported. Also, since this is mission critical for this release, we risk shipping nothing, with bad consequences for the broader plan. The risk is that the port will be done on code that will be heavily changed, making the process quite hard.
It is also quite hard to get it done in %10.7, but we can consider it if we agree on the priority for the company. Otherwise it can be done for %10.8.
So my question is: since we probably cannot have it in time for the Charts launch, is it still a priority for %10.7, or can it be postponed? Or at least, can we identify the minimum set of things that are needed for that specific scope, and "partially" port, then finish later? (I don't know if this last proposal is technically feasible.)
What do you think? We can discuss with engineers if we need more details to find a better solution.
@bikebilly we are shipping the alpha of our charts in 10.5, and plan to continue to mature with subsequent beta and GA releases.
It is important we at least get this into %10.7, for three reasons:
We are presently relying on object storage for these charts to function, to replace the need for NFS. While we are exploring supporting NFS as an alternate plan (in case there are blockers with direct object storage), this is not ideal for a number of reasons.
We need to be able to fully deprecate our existing helm charts. These are causing confusion for our users, and without CE support in the new charts we cannot realistically do this. (They would also not see the benefits of the new charts.)
Finally, we want to get as much testing as we can on the alpha (or beta) version of the charts. This will help us address any issues and move to GA more quickly. CE support is a large part of this.
@joshlambert thanks for the details, it seems that we have enough to prioritize. We can probably have a clearer idea of the capacity in a week or so, when the current GCP Migration plan is near its end.
@ayufan can you please give some information about the effort required, and engineers that could be involved? Is there a way to define the exact scope we need to reach? I'm not sure if moving all the "Object Storage related" things is needed. Thanks!
Are ~Discussion and ~Platform involved in the porting as well? We can balance the load to allow working on additional tasks at the same time.
(Just a side note: I think that the whole process here would have been easier if we had either started with object storage support in Libre, or added it before supporting uploads in object storage. I'm not blaming anyone here - I was part of that decision, and I only realised it in hindsight - but it might be worth bearing in mind in future scenarios like this, that touch a lot of different areas.)
20 February 2018 - after discussions in the confidential issue https://gitlab.com/charts/helm.gitlab.io/issues/231, and in Slack and Zoom, we decided that we will try to interrupt our current work and do this, by having @mbergeron delay his move to a different project so he can work on this. As these discussions weren't explicitly public, I won't put the contents here.
I think it's pretty clear that this was not the best way to go about this, and that we should not try this in future. Right now we have an MR (https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/17358) that is pretty close to ready; however, there are associated risks:
We haven't released / deployed a version of GitLab EE yet with all the bug fixes we've been working on this month. The port to CE could introduce more bugs.
The MR is huge, because it incorporates changes we made over a period of about nine months. This makes it hard to review, and just risky in general.
I think we could probably merge that for 10.6, but I wouldn't feel comfortable with it unless we could devote a significant amount of time during the freeze period to testing this. It's not clear to me where that capacity would come from, as we've already planned 10.7.
@joshlambert well, I can't actually merge it now anyway, because there are conflicts and spec failures. So I would say our options are:
Merge after the freeze, and use an exception to pull it into 10.6.
Pros: any bug fixes will be easier to apply to both CE and EE.
Cons: this is really too big an MR to make an exception for, so we might not get agreement from everyone (justifiably so); we're more likely to introduce bugs.
Merge after the freeze, allocate some time later in the month for testing.
Pros: we can test in a more controlled way, before the 10.7 freeze - some help here from other teams would be good, but it's more about the rate at which we can test; 10.6 object storage in EE is going to be more stable, and (hopefully) ready to enable on GitLab.com in all cases.
Cons: any issues we find in 10.6 will need to be fixed twice - once in master, and once in the 10.6 branch (as they will diverge after this is merged).
Let's focus on merging that after the feature freeze for %10.7. I don't see this as likely to be merged into %10.6, and I would definitely not feel comfortable with it. This is a very big change, and I'm very worried about potentially rushing fixes to CE in the two weeks before the release date.
Let's just make sure we have a process for upcoming Object Storage changes in this release to also get either included in this MR or committed to CE from the beginning, so we don't continue to make the problem worse in %10.7.
I don't think Discussion have any object storage changes planned in 10.7, apart from the move to CE.