Refinement of Cloud Native on 10k Users Reference Architecture Page

After a great collaboration session with @WarheadsSE, I worked through the cloud native section of the 10K reference architecture with completely fresh eyes, and I have some feedback.

In building many instructional design aids, I always try to keep the glasses of “My First Time Doing a Cake Recipe” in mind. Even experienced cooks generally prefer at least “one time through following the recipe to a T”. So that first pass (the out-of-box experience for the recipe itself) must be manageable the first time through.

The new “Cloud Native Deployment Optional” section on the 10K architecture page says this: “We recommend shifting the Sidekiq and Webservice components into Kubernetes to reap cloud native workload management benefits while the others are deployed using the traditional server method already described.” However, there is no component called “Webservice” in the table at the top. Digging deeper into https://docs.gitlab.com/charts/charts/gitlab/webservice/, one has to guess (uncomfortably) that it really means “GitLab Rails” from the upper table, or some variation of Rails. I think it would be very helpful to simply map this terminology in the new section by amending the table from “GitLab Webservice pods” to “GitLab Webservice pods (replaces “GitLab Rails” in the above service chart)”. These little terminology mapping hints are a huge help to implementers who do not have a readily available mental model of GitLab’s underlying architecture.
When I talked with Jason, they made a very helpful and strongly opinionated clarification: a Kubernetes hybrid must use object storage, and PaaS is preferred for the parts where it can be used. This cloud native specific opinion is not reflected in the Omnibus service chart, nor in the Cloud Native section itself, and so does not come across in the Cloud Native appendix. I think the opinion is very valuable and it bears mentioning in the cloud native section. I feel that instructions which require interpretation (semantical math) are oriented too much toward non-repetition to be navigated effectively.
I feel this phrase “…while the others are deployed using the traditional server method already described.” leaves way too much room for interpretation because the reader is left to do the “negating of the non-relevant information”, which trusts them more than they likely trust themselves at that point. From our discussions I know there are specifics within that math that are important. Personally I would suggest putting the entire service chart in the Cloud Native section and mapping it to the exact opinion. This gives a more precise opinion and does not leave the cloud native reference architecture open to as much interpretation. If this feels uncomfortable for some reason, I would encourage not leaving the “important and novel service implementations” uncommented. For instance, without extensive experience with Gitaly, an implementer will not be able to discern that it is extremely important not to put Gitaly into the cluster. It requires a lot of semantical math with this page content to arrive at what would be seen as a novel idea from a cloud native perspective (that a critical service should NOT be put into the cluster).
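To make the "map the service chart to the exact opinion" suggestion concrete, here is a rough sketch of what an explicit mapping could look like. It covers only the components named in this feedback; the names `Placement`, `HYBRID_PLACEMENT`, `describe`, and `unmapped` are my own illustrative inventions and are not part of the docs or the charts.

```python
from enum import Enum


class Placement(Enum):
    """Where a component runs in the hybrid setup (labels are illustrative)."""
    KUBERNETES = "Kubernetes"
    TRADITIONAL = "traditional server"


# Hypothetical mapping, limited to components mentioned in this feedback.
# A real table would list every row of the service chart.
HYBRID_PLACEMENT = {
    "Sidekiq": (Placement.KUBERNETES, "recommended to shift into the cluster"),
    "GitLab Rails": (Placement.KUBERNETES, "runs as 'Webservice' pods"),
    "Gitaly": (Placement.TRADITIONAL, "do NOT put into the cluster"),
}


def describe(component: str) -> str:
    """Return a one-line, unambiguous placement statement for a component."""
    placement, note = HYBRID_PLACEMENT[component]
    return f"{component}: {placement.value} ({note})"


def unmapped(components: list[str]) -> list[str]:
    """Return components whose placement the reader would have to guess."""
    return [name for name in components if name not in HYBRID_PLACEMENT]


if __name__ == "__main__":
    for name in HYBRID_PLACEMENT:
        print(describe(name))
    # Anything returned here is a row left to interpretation.
    print(unmapped(["Sidekiq", "Gitaly", "SomeOtherService"]))
```

The point of `unmapped` is the same as the point of the feedback: any component that falls through is one the implementer has to reason about unaided.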

To me, instructions are “human code” and have very fuzzy processing to start with (esp. Ref Archs) when comparing across a population of individuals. In my experience writing such things for general audiences, it is better to have tight interpretation as the baseline and then note flexibility as an exception in footnotes. “Trying not to be overly specific” and/or “trying not to repeat information” gives fuzzy inputs to fuzzy processing - and predictably, it results in a lot of unique hairballs 😄

Edited Feb 10, 2021 by DarwinJS