CI Clusters architecture and structure changes
I have been thinking about our current structure and how we can improve it on top of what we have now:
Today
- We allow defining Kubernetes Integration and Prometheus Integration,
Tomorrow
- We allow creating Kubernetes clusters,
- We allow using GCP to create a Kubernetes cluster for us,
- We migrate Kubernetes integration to Kubernetes cluster,
- We allow having multiple Kubernetes clusters per project or per group,
Current structure
gcp_clusters:
t.integer "project_id", null: false
t.integer "user_id" # user who created cluster
t.integer "service_id" # link to KubernetesService: services
t.boolean "enabled", default: true
t.integer "status"
t.text "status_reason"
t.string "project_namespace"
t.string "endpoint"
t.text "ca_cert"
t.text "kubernetes_token"
t.string "username"
t.text "password"
t.string "gcp_project_id", null: false
t.string "gcp_cluster_zone", null: false
t.string "gcp_cluster_name", null: false
t.string "gcp_machine_type"
t.string "gcp_operation_id"
t.integer "gcp_cluster_size", null: false
t.text "gcp_token"
services: # kubernetes_service
t.text "properties"
- token
- ca_cert
- api_url # we allow only https
- namespace
services: # prometheus_service
t.text "properties"
- api_url
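For reference, this is roughly how these integrations read the serialized properties column today. A sketch, assuming the existing Service model convention of prop_accessor-generated accessors; the attribute names are taken from the lists above:

class KubernetesService < Service
  # `properties` is a serialized text column on `services`;
  # prop_accessor generates readers/writers backed by it
  prop_accessor :token, :ca_cert, :api_url, :namespace
end

class PrometheusService < Service
  prop_accessor :api_url
end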
Database/architecture structures
I was thinking about two different ideas for changing what we have today.
In all cases the kubernetes_clusters table maps to the model Kubernetes::Cluster,
and kubernetes_gcp_providers (or kubernetes_providers_gcps) to Kubernetes::Providers::Gcp.
Structure one
We define global cluster objects that have a many-to-many relation to groups and projects.
The GCP parameters are attached to clusters.
This assumes that we always create only Kubernetes clusters,
but we might use different providers; only one is configured so far: GCP.
We always use a Kubernetes namespace generated from #{project_path}-#{project_id}
and we disallow the user from overwriting it.
kubernetes_clusters:
t.integer "user_id" # user who created cluster
t.integer "service_id" # link to KubernetesService: services
t.boolean "enabled", default: true
t.integer "status"
t.text "status_reason"
t.integer "provider" # user, gcp
t.string "endpoint"
t.text "ca_cert"
t.text "token"
kubernetes_gcp_providers:
t.integer "kubernetes_cluster_id"
t.string "gcp_project_id", null: false
t.string "cluster_zone", null: false
t.string "cluster_name", null: false
t.string "machine_type"
t.string "operation_id"
t.integer "cluster_size", null: false
t.text "token"
t.string "username"
t.text "password"
kubernetes_cluster_projects:
t.integer "cluster_id"
t.integer "project_id"
kubernetes_cluster_groups:
t.integer "cluster_id"
t.integer "group_id"
Structure two
This assumes that we have a list of clusters. Each cluster can come from a different provider (GCP) and use a different technology (Kubernetes). We create a many-to-many relation to groups and projects.
We always use a Kubernetes namespace generated from #{project_path}-#{project_id}
and we disallow the user from overwriting it.
clusters:
t.integer "user_id" # user who created cluster
t.integer "service_id" # link to KubernetesService: services
t.boolean "enabled", default: true
t.integer "status"
t.text "status_reason"
t.integer "provider" # user, gcp
t.integer "technology" # kubernetes
kubernetes_clusters:
t.integer "cluster_id" # link to clusters
t.string "endpoint"
t.text "ca_cert"
t.text "token"
kubernetes_gcp_providers:
t.integer "cluster_id"
t.string "username"
t.text "password"
t.string "gcp_project_id", null: false
t.string "gcp_cluster_zone", null: false
t.string "gcp_cluster_name", null: false
t.string "gcp_machine_type"
t.string "gcp_operation_id"
t.integer "gcp_cluster_size", null: false
t.text "gcp_token"
kubernetes_cluster_projects:
t.integer "cluster_id"
t.integer "project_id"
kubernetes_cluster_groups:
t.integer "cluster_id"
t.integer "group_id"
Transition period
Whatever we do, we have to ensure that Gcp::Cluster, Kubernetes::Cluster,
or Cluster implements a DeploymentService.
The code path for Project#deployment_service could then be implemented as:
def deployment_service
  return @deployment_service if defined?(@deployment_service)

  @deployment_service ||= cluster || deployment_services.reorder(nil).find_by(active: true)
end
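For cluster to work in that code path, the cluster model has to respond to whatever callers expect from a DeploymentService. A minimal sketch, assuming predefined_variables is part of that interface and mirroring the variables KubernetesService exposes today (method and key names here are assumptions):

module Kubernetes
  class Cluster < ActiveRecord::Base
    def active?
      enabled?
    end

    # CI variables handed to builds, as KubernetesService does today
    def predefined_variables
      [
        { key: 'KUBE_URL', value: endpoint, public: true },
        { key: 'KUBE_CA_PEM', value: ca_cert, public: true }
      ]
    end
  end
end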
Having this data model allows us to easily migrate all data from the current to the proposed structure
using a post-deployment migration. I don't think that we have to use a Background Migration,
as on GitLab.com we have only 800 objects of type KubernetesService.
- Migration of the data structure (a fuller sketch follows after this list):

  Gcp::Cluster.find_each do |cluster|
    # reconstruct all objects
  end
- Migration of the KubernetesService data:

  KubernetesService.where(active: true).find_each do |kubernetes_service|
    # reconstruct all objects
  end
- In the next release, we drop all KubernetesService-related code. In the transition period (before we run the post-deployment migrations) we would use the code path that uses the old Kubernetes Integration.
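A hedged sketch of what that post-deployment migration could look like for structure one; the model and column names follow the proposal above and would change with the chosen structure (a real post-deployment migration would also re-define lightweight model classes instead of using the application ones):

class MigrateToKubernetesClusters < ActiveRecord::Migration
  DOWNTIME = false

  def up
    Gcp::Cluster.find_each do |cluster|
      kubernetes_cluster = Kubernetes::Cluster.create!(
        user_id:  cluster.user_id,
        enabled:  cluster.enabled,
        provider: :gcp,
        endpoint: cluster.endpoint,
        ca_cert:  cluster.ca_cert,
        token:    cluster.kubernetes_token)

      kubernetes_cluster.create_gcp_provider!(
        gcp_project_id: cluster.gcp_project_id,
        cluster_zone:   cluster.gcp_cluster_zone,
        cluster_name:   cluster.gcp_cluster_name,
        cluster_size:   cluster.gcp_cluster_size)

      kubernetes_cluster.cluster_projects.create!(project_id: cluster.project_id)
    end

    KubernetesService.where(active: true).find_each do |kubernetes_service|
      cluster = Kubernetes::Cluster.create!(
        provider: :user,
        endpoint: kubernetes_service.api_url,
        ca_cert:  kubernetes_service.ca_cert,
        token:    kubernetes_service.token)

      cluster.cluster_projects.create!(project_id: kubernetes_service.project_id)
    end
  end
end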
Adding apps
Whatever baseline data structure we choose (even if we keep the current one), it doesn't really affect the data structure used by Kubernetes apps.
I wonder whether we will ever have more apps than Tiller, Runner, Ingress, and Prometheus. I consider two approaches.
Uniform structure
kubernetes_apps:
t.integer :cluster_id
t.string :application # helm, runner, ingress
t.string :release_name
t.string :version
kubernetes_app_versions:
t.integer :kubernetes_app_id
t.string :version
t.string :identifier # we might need that to track the app
kubernetes_app_values:
t.integer :kubernetes_app_id
t.integer :key
t.string :value
We use key-value storage for additional information, so we are not tied to specific apps.
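A sketch of how an app could read its extra settings under this approach (value_for is a hypothetical helper):

class KubernetesApp < ActiveRecord::Base
  has_many :app_versions, class_name: 'KubernetesAppVersion',
                          foreign_key: :kubernetes_app_id
  has_many :app_values, class_name: 'KubernetesAppValue',
                        foreign_key: :kubernetes_app_id

  # hypothetical helper: look up a single value from the key-value table
  def value_for(key)
    app_values.find_by(key: key).try(:value)
  end
end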
Explicit structure
kubernetes_helm_apps:
t.integer :cluster_id
t.string :version
t.string :deployment_name
kubernetes_runner_apps:
t.integer :cluster_id
t.integer :helm_id
t.string :version
t.integer :runner_id
kubernetes_nginx_ingress_apps:
t.integer :cluster_id
t.integer :helm_id
t.string :version
t.string :domain
t.string :cluster_ip
We use separate tables to store different applications. This allows us to easily extend and migrate small sets of data, but leads to an explosion of tables.
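For comparison, a sketch of the explicit variant: each app gets typed columns and plain per-model validations (class names follow the tables above):

class KubernetesHelmApp < ActiveRecord::Base
  belongs_to :cluster

  validates :version, presence: true
end

class KubernetesRunnerApp < ActiveRecord::Base
  belongs_to :cluster
  belongs_to :helm, class_name: 'KubernetesHelmApp'
end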
Feedback
I need a brainstorming session on all of that with @nolith @dosuken123 @grzesiek @bikebilly. I believe that the above improvements and proposals cover all the cases that we may face in the future.
Migrating from Gcp::Cluster to either DB structure 1 or DB structure 2:
I think that we could do it in two to three days (without tests), as the current code is very compact and easy to improve.