CI Clusters architecture and structure changes
I have been thinking about our current structure and how we can improve it on top of what we have now:
Today
- We allow defining Kubernetes Integration and Prometheus Integration,
Tomorrow
- We allow creating Kubernetes clusters,
- We allow using GCP to create a Kubernetes cluster for us,
- We migrate Kubernetes integration to Kubernetes cluster,
- We allow having multiple Kubernetes clusters per project or per group,
Current structure
gcp_clusters:
t.integer "project_id", null: false
t.integer "user_id" # user who created cluster
t.integer "service_id" # link to KubernetesService: services
t.boolean "enabled", default: true
t.integer "status"
t.text "status_reason"
t.string "project_namespace"
t.string "endpoint"
t.text "ca_cert"
t.text "kubernetes_token"
t.string "username"
t.text "password"
t.string "gcp_project_id", null: false
t.string "gcp_cluster_zone", null: false
t.string "gcp_cluster_name", null: false
t.string "gcp_machine_type"
t.string "gcp_operation_id"
t.integer "gcp_cluster_size", null: false
t.text "gcp_token"
services: # kubernetes_service
t.text "properties"
- token
- ca_cert
- api_url # we allow only https
- namespace
services: # prometheus_service
t.text "properties"
- api_url
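For reference, this is roughly how these integrations read the serialized properties column today. A sketch, assuming the existing Service model convention of prop_accessor-generated accessors; the attribute names are taken from the lists above:

class KubernetesService < Service
  # `properties` is a serialized text column on `services`;
  # prop_accessor generates readers/writers backed by it
  prop_accessor :token, :ca_cert, :api_url, :namespace
end

class PrometheusService < Service
  prop_accessor :api_url
end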
Database/architecture structures
I was thinking about two different ideas for changing what we have today.
In all cases the kubernetes_clusters table maps to the model Kubernetes::Cluster,
and kubernetes_gcp_providers (or kubernetes_providers_gcps) to Kubernetes::Providers::Gcp.
Structure one
We define global cluster objects that have a many-to-many relation to groups and projects.
The GCP parameters are attached to clusters.
This assumes that we always create only Kubernetes clusters,
but we might use different providers; only one is configured so far: GCP.
We always use a Kubernetes namespace generated from #{project_path}-#{project_id}
and we disallow the user from overwriting it.
kubernetes_clusters:
t.integer "user_id" # user who created cluster
t.integer "service_id" # link to KubernetesService: services
t.boolean "enabled", default: true
t.integer "status"
t.text "status_reason"
t.integer "provider" # user, gcp
t.string "endpoint"
t.text "ca_cert"
t.text "token"
kubernetes_gcp_providers:
t.integer "kubernetes_cluster_id"
t.string "gcp_project_id", null: false
t.string "cluster_zone", null: false
t.string "cluster_name", null: false
t.string "machine_type"
t.string "operation_id"
t.integer "cluster_size", null: false
t.text "token"
t.string "username"
t.text "password"
kubernetes_cluster_projects:
t.integer "cluster_id"
t.integer "project_id"
kubernetes_cluster_groups:
t.integer "cluster_id"
t.integer "group_id"
Structure two
This assumes that we have a list of clusters. Each cluster can come from a different provider (GCP) and use a different technology (Kubernetes). We create a many-to-many relation to groups and projects.
We always use a Kubernetes namespace generated from #{project_path}-#{project_id}
and we disallow the user from overwriting it.
clusters:
t.integer "user_id" # user who created cluster
t.integer "service_id" # link to KubernetesService: services
t.boolean "enabled", default: true
t.integer "status"
t.text "status_reason"
t.integer "provider" # user, gcp
t.integer "technology" # kubernetes
kubernetes_clusters:
t.integer "cluster_id" # link to clusters
t.string "endpoint"
t.text "ca_cert"
t.text "token"
kubernetes_gcp_providers:
t.integer "cluster_id"
t.string "username"
t.text "password"
t.string "gcp_project_id", null: false
t.string "gcp_cluster_zone", null: false
t.string "gcp_cluster_name", null: false
t.string "gcp_machine_type"
t.string "gcp_operation_id"
t.integer "gcp_cluster_size", null: false
t.text "gcp_token"
kubernetes_cluster_projects:
t.integer "cluster_id"
t.integer "project_id"
kubernetes_cluster_groups:
t.integer "cluster_id"
t.integer "group_id"
Transition period
Whatever we do, we have to ensure that Gcp::Cluster, Kubernetes::Cluster,
or Cluster implements a DeploymentService.
The code path for Project#deployment_service could then be implemented as:
def deployment_service
  return @deployment_service if defined?(@deployment_service)

  @deployment_service ||= cluster || deployment_services.reorder(nil).find_by(active: true)
end
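For cluster to work in that code path, the cluster model has to respond to whatever callers expect from a DeploymentService. A minimal sketch, assuming predefined_variables is part of that interface and mirroring the variables KubernetesService exposes today (method and key names here are assumptions):

module Kubernetes
  class Cluster < ActiveRecord::Base
    def active?
      enabled?
    end

    # CI variables handed to builds, as KubernetesService does today
    def predefined_variables
      [
        { key: 'KUBE_URL', value: endpoint, public: true },
        { key: 'KUBE_CA_PEM', value: ca_cert, public: true }
      ]
    end
  end
end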
Having this data model allows us to easily migrate all data from the current to the proposed structure
using a post-deployment migration. I don't think that we have to use a Background Migration,
as on GitLab.com we have only 800 objects of type KubernetesService.
- Migration of the data structure (a fuller sketch follows after this list):

  Gcp::Cluster.find_each do |cluster|
    # reconstruct all objects
  end
- Migration of the KubernetesService data:

  KubernetesService.where(active: true).find_each do |kubernetes_service|
    # reconstruct all objects
  end
- In the next release, we drop all KubernetesService-related code. In the transition period (before we run the post-deployment migrations) we would use the code path that uses the old Kubernetes Integration.
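A hedged sketch of what that post-deployment migration could look like for structure one; the model and column names follow the proposal above and would change with the chosen structure (a real post-deployment migration would also re-define lightweight model classes instead of using the application ones):

class MigrateToKubernetesClusters < ActiveRecord::Migration
  DOWNTIME = false

  def up
    Gcp::Cluster.find_each do |cluster|
      kubernetes_cluster = Kubernetes::Cluster.create!(
        user_id:  cluster.user_id,
        enabled:  cluster.enabled,
        provider: :gcp,
        endpoint: cluster.endpoint,
        ca_cert:  cluster.ca_cert,
        token:    cluster.kubernetes_token)

      kubernetes_cluster.create_gcp_provider!(
        gcp_project_id: cluster.gcp_project_id,
        cluster_zone:   cluster.gcp_cluster_zone,
        cluster_name:   cluster.gcp_cluster_name,
        cluster_size:   cluster.gcp_cluster_size)

      kubernetes_cluster.cluster_projects.create!(project_id: cluster.project_id)
    end

    KubernetesService.where(active: true).find_each do |kubernetes_service|
      cluster = Kubernetes::Cluster.create!(
        provider: :user,
        endpoint: kubernetes_service.api_url,
        ca_cert:  kubernetes_service.ca_cert,
        token:    kubernetes_service.token)

      cluster.cluster_projects.create!(project_id: kubernetes_service.project_id)
    end
  end
end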
Adding apps
Whatever baseline data structure we choose (even if we keep the current one), it doesn't really affect the data structure used by Kubernetes apps.
I wonder whether we will ever have more apps than Tiller, Runner, Ingress, and Prometheus. I consider two approaches.
Uniform structure
kubernetes_apps:
t.integer :cluster_id
t.string :application # helm, runner, ingress
t.string :release_name
t.string :version
kubernetes_app_versions:
t.integer :kubernetes_app_id
t.string :version
t.string :identifier # we might need that to track the app
kubernetes_app_values:
t.integer :kubernetes_app_id
t.integer :key
t.string :value
We use key-value storage for additional information, so we are not tied to specific apps.
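A sketch of how an app could read its extra settings under this approach (value_for is a hypothetical helper):

class KubernetesApp < ActiveRecord::Base
  has_many :app_versions, class_name: 'KubernetesAppVersion',
                          foreign_key: :kubernetes_app_id
  has_many :app_values, class_name: 'KubernetesAppValue',
                        foreign_key: :kubernetes_app_id

  # hypothetical helper: look up a single value from the key-value table
  def value_for(key)
    app_values.find_by(key: key).try(:value)
  end
end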
Explicit structure
kubernetes_helm_apps:
t.integer :cluster_id
t.string :version
t.string :deployment_name
kubernetes_runner_apps:
t.integer :cluster_id
t.integer :helm_id
t.string :version
t.integer :runner_id
kubernetes_nginx_ingress_apps:
t.integer :cluster_id
t.integer :helm_id
t.string :version
t.string :domain
t.string :cluster_ip
We use separate tables to store different applications. This allows us to easily extend and migrate small sets of data, but leads to an explosion of tables.
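For comparison, a sketch of the explicit variant: each app gets typed columns and plain per-model validations (class names follow the tables above):

class KubernetesHelmApp < ActiveRecord::Base
  belongs_to :cluster

  validates :version, presence: true
end

class KubernetesRunnerApp < ActiveRecord::Base
  belongs_to :cluster
  belongs_to :helm, class_name: 'KubernetesHelmApp'
end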
Feedback
I need a brainstorming session on all of that with @nolith @dosuken123 @grzesiek @bikebilly. I believe that the above improvements and proposals cover all the cases that we may face in the future.
Migrating from Gcp::Cluster to either DB structure 1 or DB structure 2:
I think that we could do it in two to three days (without tests), as the current code is very compact and easy to improve.