Discussion: The word "environments" and its use

Currently we use the word "environment" in many places within delivery, and indeed across GitLab, including:

In our documentation (handbook, release-docs)
In our pipelines
In our chatops commands
In our tooling (release tools, k8s-workloads)
Inside the GitLab product itself (Environments under the Deployments tab)

However, there is often a lot of confusion about what constitutes an environment, and our use of the word is not consistent everywhere. Some examples:

People are confused to learn that gstg-cny (or staging canary) is actually the canary stage of the staging environment. In our pipelines, and in how we talk about them, we often call "gstg-cny" an "environment", which gives the impression of a greater degree of separation from gstg (staging) than there actually is.
In our Kubernetes tooling (k8s-workloads), we use the word "environment" completely differently from everywhere else. For example, its "environments" are a mixture of environment/stage combinations (e.g. gstg-cny) and clusters (e.g. gstg-us-east1-b). This leads to further confusion about what constitutes an environment and what is related or unrelated, and it explicitly forces our tooling to maintain logic for mapping environments and stages to clusters (a mapping which differs by stage).

Also, thinking about the future work with GitLab Pods/Silos, tightening our terminology and simplifying our setup will make it easier to expand our model to include this work.

Things to improve

To this end I think the following things could be considered

Easily achievable

Decide whether we wish to keep the "stage" part of our setup/taxonomy and, if so, be stricter about its usage. E.g. we would no longer refer to the "environment" gstg-cny or staging canary; we would refer to it as the canary stage of the staging environment. We should also develop a common shorthand notation to support this (e.g. staging/* is all stages in the staging environment, staging/canary is the canary stage of the staging environment). A schema for stage naming would also need to be developed and documented.
If we keep stages, then we should document and enforce a rule that "all services in an environment are deployed the same way, regardless of stage". In practice this means our canary stage services would need to be moved from the regional GKE cluster to the zonal GKE clusters. This removes a level of complexity that we currently have to track in tools, documentation, and team knowledge about where things are deployed: everything would be deployed the same way, regardless of stage.
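As a rough illustration of how the "deployed the same way, regardless of stage" rule could be checked automatically, here is a minimal Python sketch. The inventory below is hypothetical and only illustrative: the regional/zonal split for canary reflects the situation described above, the stage name "main" is borrowed from the production/main example later in this issue, and the function name is an assumption rather than anything in our tooling.

```python
from collections import defaultdict

# Hypothetical inventory: (environment, stage) -> kind of GKE cluster it runs on.
# These values are illustrative only, not a record of the real setup.
DEPLOYMENTS = {
    ("staging", "main"): "zonal",
    ("staging", "canary"): "regional",
    ("production", "main"): "zonal",
    ("production", "canary"): "regional",
}


def inconsistent_environments(deployments):
    """Return environments whose stages are not all deployed the same way."""
    kinds = defaultdict(set)
    for (environment, _stage), kind in deployments.items():
        kinds[environment].add(kind)
    # An environment breaks the rule if its stages use more than one kind.
    return sorted(env for env, seen in kinds.items() if len(seen) > 1)
```

With the illustrative data above, both environments would be flagged until their canary stages move to the zonal clusters.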

Achievable, but takes a decent amount of work

Remove the shorthand names we have for environments, and ensure every environment has one name that is used everywhere. E.g. we currently use production, prod, and gprd interchangeably; this would just become production. We would also establish a schema or set of rules for environment names (e.g. no more than 16 letters, all lower case) for use within all tooling and documentation. Renaming this many things would be a big effort, but ideally there is one name in use for an environment across all tooling and documentation. The shorthand names (I believe) come from our original Chef setup.
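To make the naming rule concrete, here is a minimal Python sketch of what validation and alias clean-up could look like during a migration. It assumes the example rule above (lower-case letters only, at most 16 characters); the alias table and the function name are hypothetical, and only the names themselves (prod, gprd, gstg, production, staging) come from this issue.

```python
import re

# Assumed rule from the example above: lower-case letters only, 1 to 16 of them.
ENVIRONMENT_NAME = re.compile(r"[a-z]{1,16}")

# Hypothetical alias table mapping the short names to one canonical name.
ALIASES = {
    "prod": "production",
    "gprd": "production",
    "gstg": "staging",
}


def canonical_environment(name: str) -> str:
    """Return the canonical environment name, or raise ValueError."""
    canonical = ALIASES.get(name, name)
    if not ENVIRONMENT_NAME.fullmatch(canonical):
        raise ValueError(f"invalid environment name: {name!r}")
    return canonical
```

An alias table like this would only be a transitional aid; the end goal described above is that the short names disappear entirely.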

Quite hard to achieve

We should consider adopting the phrase "deployment target" or "target" in our tooling and documentation instead of "environment". A target is a combination of environment and stage. Some example "deployment targets":

# All stages in production
production/*
or
production

# Canary stage in staging
staging/canary
or
staging/cny

If we decide to go with deployment targets, we should work with the GitLab product team to suggest renaming the product's "environments" functionality, or develop new functionality in the "Deployments" tab to accommodate deployment targets. If we feel the environments concept (and naming) is too restrictive, others might feel the same.
We should modify our k8s-workloads tooling (or maybe just gitlab-com) to support specifying only an environment, or an environment/stage combination, for deploying, removing the current cluster-focused setup.
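The deployment target notation shown above could be parsed along these lines. This is only a sketch in Python: the DeploymentTarget class and parse_target function are hypothetical names, and it deliberately does not decide between staging/canary and staging/cny, since that choice is still open.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DeploymentTarget:
    environment: str
    stage: Optional[str]  # None means "all stages in the environment"

    def __str__(self) -> str:
        return f"{self.environment}/{self.stage or '*'}"


def parse_target(text: str) -> DeploymentTarget:
    """Parse 'production', 'production/*' or 'staging/canary'."""
    environment, sep, stage = text.partition("/")
    # Reject '/canary', 'staging/' and anything with more than one slash.
    if not environment or (sep and not stage) or "/" in stage:
        raise ValueError(f"invalid deployment target: {text!r}")
    if stage in ("", "*"):
        return DeploymentTarget(environment, None)
    return DeploymentTarget(environment, stage)
```

Treating "production" and "production/*" as the same target keeps both spellings from the examples valid while giving tooling a single internal representation.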

Outcome of discussion

I think the following takeaways have come from this discussion. They can be used to action things and to give guidance when building things in the future.

We wish to keep the concept of a stage, at least for the moment. A stage is defined as a separate logical deployment in an environment; however, all stages of a particular environment share the same datastores (redis, database, etc.).
We will look to decommission the "short names" of our environments, in favor of developing a concrete naming structure and set of names for all our environments that must be used universally where possible. E.g. preprod, staging, production
Looking at our current tooling and tooling moving forward, if such tooling has the concept of a singular environment "container", we should adopt the structure $environment/$stage. E.g. production/main
We should look at expanding the service catalog to make an "environment catalog" that captures the names of all our environments and maybe some associated details (e.g. google project). The service catalog should directly link to the environment catalog
There exists a linked but separate mapping from $environment/$stage to $cluster/$namespace. We can look at defining these mappings and structures in another catalog in the runbooks repository
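As a sketch of what the $environment/$stage to $cluster/$namespace mapping could look like once it is written down as data, here is a small Python example. Everything in it is hypothetical: the catalog layout, the resolve function, and all values. The cluster name gstg-us-east1-b is taken from earlier in this issue, but its pairing with staging/main is assumed for illustration, and the remaining names are placeholders.

```python
# Hypothetical catalog: "$environment/$stage" -> list of (cluster, namespace).
# Only "gstg-us-east1-b" appears in this issue; the rest are placeholders.
TARGET_CATALOG = {
    "staging/main": [("gstg-us-east1-b", "example-namespace")],
    "staging/canary": [("example-cluster", "example-namespace-cny")],
}


def resolve(environment: str, stage: str):
    """Return the (cluster, namespace) pairs for one environment/stage."""
    key = f"{environment}/{stage}"
    try:
        return TARGET_CATALOG[key]
    except KeyError:
        # Fail loudly rather than guessing a cluster for an unknown target.
        raise KeyError(f"no cluster mapping for {key}") from None
```

Keeping this lookup in one catalog would let tooling accept only environment/stage names, with the cluster details resolved in a single place.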

Edited Mar 14, 2023 by Graeme Gillies