We have a project-level service integration for Kubernetes that asks for credentials. We should make it easy to create a new cluster on Google's GKE (and automatically fill in these credentials).
Notes:
We already have Google login with OAuth, so we'll need to figure out how to escalate the granted permissions to include cluster creation.
Ideally, we'd support managing clusters at a group level, but we already have a project-level integration, and this is how people are going to start toying with k8s deploys, so it makes sense to start here.
We should automatically install Prometheus, ingress, etc., but we're keeping that out of scope for the first iteration.
Proposal
existing k8s integration is managed only in the old Settings > Integration page (no changes)
new cluster creation is managed in CI/CD > Cluster page
single cluster support
if one, show name + link to manage on GCP (no details about the cluster)
if one, show "disable" and "soft delete" buttons (if Master)
if none, show "create" button + requirements (if Master)
ask to login with Google, if not yet done
create will ask for parameters via input boxes (plus a link to an explanation of what's available for free)
When creating a cluster we ask for: GCP project name (required), Cluster name (required), Cluster size, Machine type, Number of nodes, Zone, Project namespace (similar to the Kubernetes integration).
We create a new database model which holds google_cloud_clusters, or just clusters.
We store all data about the cluster, including a unique identifier that allows us to interact with it.
We do all cluster-creation operations in a Sidekiq job, and cluster status polling via reactive cache.
For the frontend, we provide an API to query cluster status and an API to log in to Google Cloud.
We should add a usage ping to track new cluster creation.
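To make the proposed model and job split concrete, here is a minimal sketch in plain Ruby (no Rails); the attribute names and the `creation_finished?` helper are illustrative assumptions, not a final schema:

```ruby
# Illustrative sketch of the proposed `clusters` record. Field names are
# assumptions; statuses mirror what the GKE API reports for an operation.
GcpCluster = Struct.new(
  :gcp_project_id,  # GCP project that owns the cluster (required)
  :name,            # cluster name (required)
  :zone,            # e.g. "us-central1-a"
  :machine_type,    # e.g. "n1-standard-2"
  :num_nodes,       # number of nodes
  :status,          # PROVISIONING / RUNNING / ERROR
  :endpoint,        # filled in once creation finishes
  keyword_init: true
)

# The Sidekiq creation job would poll the GKE operation until the cluster
# leaves PROVISIONING; reactive cache then serves the latest known status
# to the frontend API.
def creation_finished?(cluster)
  %w[RUNNING ERROR].include?(cluster.status)
end

cluster = GcpCluster.new(
  gcp_project_id: 'test-autodevops',
  name: 'test-api-creation',
  zone: 'us-central1-a',
  num_nodes: 1,
  status: 'PROVISIONING'
)

creation_finished?(cluster) # => false until GKE reports RUNNING or ERROR
```

The unique identifier we store (GCP project + zone + cluster name) is what lets us address the cluster through the API later.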
Design
Important: All copy in these designs must be reviewed by Product
If the user has not signed in with Google
If the user has not linked their Google account yet, they are shown a button to sign in with Google.
Some help text is shown on this page to let the user know about the requirements their account must meet. Links to GCP documentation are included in this section.
TO DO: Find the appropriate GCP documentation pages to link to
If authentication fails, an error banner is shown
If Google authentication is not set up on the GitLab instance, a message is shown to the user telling them to contact the administrator
Sign in
Error
OAuth not set up
After authentication succeeds
After Google authentication succeeds, a form is shown where the user can enter the necessary values for the new cluster. Similar help text to the previous page is shown here.
There is an additional line which links to our own help page on how to fill out this form.
The fields GCP project ID, Zone and Machine type have links to GCP where the user can see a list of appropriate values.
TO DO: Find the appropriate GCP pages to link to.
Viewing the cluster
Once the cluster has been created, the Cluster page no longer shows a creation form. It has the following elements:
An Enable / Disable cluster integration switch
A Save button to apply the state selected with the switch
A link to GKE so the user can manage their cluster
The cluster's name
A panel to remove cluster integration from the project
A banner message is shown while the cluster is being created. This banner cannot be dismissed.
The banner is replaced with a different message once cluster creation succeeds. This banner can be dismissed.
If an error occurs while creating the cluster, an error banner is shown
View cluster
Creating
Success
Error
Users without permissions
Users without permissions can only see cluster integration status and the cluster's name.
If cluster integration is not set up for the project, the cluster name field is not displayed.
In both cases the HTML element for the switch is disabled and cannot be interacted with.
The person that creates the cluster may not have connected their account with their google profile, so I think we'd need a way to let them authenticate this... before showing options of creating a cluster.
https://gitlab.com/gitlab-org/gitlab-ce/issues/27888 starts off by "enabling" a cluster.. like it will just be created... this almost implies to a group level enabled google account. Do you envision a similar thing for project based gke cluster creation?
What about selecting an already existent gke cluster?
You want us to prefill the info, after using the google way of connecting? or rather show it in a different way?
"Creation should let the user pick the number of nodes and the region and let them know what size machines will be used" indicates that if we selected it with the google way.. additional info is present.. Can we detect this info on any cluster, not just gke?
The person that creates the cluster may not have connected their account with their google profile, so I think we'd need a way to let them authenticate this... before showing options of creating a cluster.
@dimitrieh Turning that around, we should show the option to create a cluster, but then drive them through an OAuth and permissions flow if necessary. i.e. show the goal first, and the technical hurdle second. Hiding the option to create the cluster until they've authenticated would bury the capability and hinder adoption.
https://gitlab.com/gitlab-org/gitlab-ce/issues/27888 starts off by "enabling" a cluster.. like it will just be created... this almost implies to a group level enabled google account. Do you envision a similar thing for project based gke cluster creation?
"Enabling" there is an indication of how frictionless we want the experience to be. In reality, cluster creation can take several minutes so maybe we should make that clearer. It does not imply a group-level Google account. We assumed the same kind of OAuth-when-necessary dance.
What about selecting an already existent gke cluster?
Yeah, that's an interesting question; that we haven't explicitly tackled. It should be possible to connect to an existing GKE cluster, and thus get certain admin/monitoring capabilities for it. But the k8s integration already lets you add creds to any existing cluster (GKE or not), so it's not a practical concern yet. Well, except that people might be more comfortable/happier doing an OAuth dance and selecting from a list instead of copying/pasting k8s creds. Perhaps that's a further iteration enhancement, or something after https://gitlab.com/gitlab-org/gitlab-ce/issues/35616.
You want us to prefill the info, after using the google way of connecting? or rather show it in a different way?
I think what you're asking is whether, after the cluster is created, we just fill in the regular k8s integration values, or do something special. I hadn't thought much about it. I guess I was assuming more the latter, like we do for the Mattermost integration. There, after you click the button, we just acknowledge that it's configured, but hide the details. If it's easier for a first iteration to use the existing k8s variables, then I'd be open to it, but I think a good implementation would treat the GKE cluster as something more first-class.
"Creation should let the user pick the number of nodes and the region and let them know what size machines will be used" indicates that if we selected it with the google way.. additional info is present
Yeah, it does imply that. :)
Can we detect this info on any cluster, not just gke?
We could detect the number of nodes, but I'm not sure about the machine type. I mean, it's buried in node labels like beta.kubernetes.io/instance-type: n1-standard-2, but I have no idea if this is a convention we can count on for all providers. We certainly wouldn't let them edit the number of nodes unless it's GKE.
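As a rough illustration of what that label-based detection could look like (the label key is the convention mentioned above; whether non-GKE providers set it is exactly the open question, so this is best-effort only):

```ruby
# Hypothetical node labels as a Kubernetes API client might return them for
# a GKE node; the values here are stand-ins for illustration.
labels = {
  'beta.kubernetes.io/instance-type'       => 'n1-standard-2',
  'failure-domain.beta.kubernetes.io/zone' => 'us-central1-a'
}

# Best-effort detection: present on GKE, but not guaranteed elsewhere,
# so callers must handle nil.
machine_type = labels['beta.kubernetes.io/instance-type']
machine_type # => "n1-standard-2"
```

If the label is missing, the lookup simply returns `nil`, and we would hide the machine-type display rather than guess.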
@tauriedavis would you be able to chime in here? (as I prob don't have enough time for this issue, and you did previous work with me on the google stuff) cc: @sarrahvesselov
I'm not sure I will get to this this week, and I will be on vacation next week. I will keep this in my todos in case I have time this week, but will leave it unassigned in case someone else is able to pick it up.
we should show the option to create a cluster, but then drive them through an OAuth and permissions flow if necessary. i.e. show the goal first, and the technical hurdle second. Hiding the option to create the cluster until they've authenticated would bury the capability and hinder adoption.
@tauriedavis Yeah, that was feedback that our mockups didn't include enough configurability and that selecting region was actually really important. (And I agree, but just didn't think about it when we did the first mockups.)
They're things like us-central1. The dropdown for Zones looks like this.
Actually, it looks like you need to specify a "Zone", which is a sub-thing under regions, e.g. us-central1-a.
We could let them specify the machine size, but I think we might be better off picking a recommended size. Not sure about it, but it's simpler to pick a reasonable default and give them fewer choices. But if we do pick a reasonable size, like n1-standard-2, we should still tell them what size we're using, as it affects cost and might impact their choice of number of nodes.
@ayufan Here is the summary of technical challenges on this issue.
Creates a cluster
We can let users create a cluster with the projects.zones.clusters.create method in the GKE API. It can leverage almost everything we can do in the GCP Web Console.
Example
We create a cluster in a GCP project "test-autodevops"
The zone of the new cluster is "us-central1-a"
The name of the new cluster is "test-api-creation"
The node size of the new cluster is 1
```
curl -H "Content-Type: application/json" \
  -X POST \
  -d '{"cluster":{"name":"test-api-creation","initial_node_count":"1"}}' \
  https://container.googleapis.com/v1/projects/test-autodevops/zones/us-central1-a/clusters
```

Response:

```json
{
  "name": "operation-xxxxxxxx",
  "operationType": "CREATE_CLUSTER",
  "selfLink": "https://container.googleapis.com/v1/projects/xxxxxxxx/zones/us-central1-a/operations/operation-xxxxxxxx",
  "startTime": "2017-09-13T16:49:13.055601589Z",
  "status": "RUNNING",
  "targetLink": "https://container.googleapis.com/v1/projects/xxxxxxxx/zones/us-central1-a/clusters/test-api-creation",
  "zone": "us-central1-a"
}
```
Note
The GCP project and other necessary components for creating a cluster (e.g. billing info) must be prepared before creating a cluster
Gets the details of a specific cluster
We can get the cluster details with projects.zones.clusters.get method in GKE API. It contains necessary data to integrate with k8s, such as Endpoint, ca-certificate, username/password and status.
We can get a cluster username/password, but we cannot get a cluster token. Currently, the GitLab k8s integration uses a token for authentication, so we need to extend it to use username/password instead.
CaCertificate is encoded in Base64
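Concretely, the certificate needs to be decoded before it can be stored or written out. A minimal sketch (the PEM content below is a stand-in value, not a real certificate):

```ruby
require 'base64'

# The GKE API returns the cluster CA certificate Base64-encoded
# (masterAuth.clusterCaCertificate); decode it before use.
encoded_ca_cert = Base64.strict_encode64(
  "-----BEGIN CERTIFICATE-----\nMIIB...\n-----END CERTIFICATE-----\n"
)

pem = Base64.decode64(encoded_ca_cert)
pem.start_with?('-----BEGIN CERTIFICATE-----') # => true
```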
Authentication/Authorization
We need to authenticate as an end user for the above methods. This works the same way as our other service integrations (i.e. we present a consent screen to the user).
Note
We use https://www.googleapis.com/auth/cloud-platform OAuth scope for both clusters.get and clusters.create.
We also have another option: service accounts. These are tied to the GCP project, as opposed to a single user account.
We use google/google-api-ruby-client. It's already included in the Gemfile on the master branch of the CE repo. I used this library to test the above methods. We probably need to update the gem.
Sample code for getting the details of a specific cluster
```
# BEFORE RUNNING:
# ---------------
# 1. If not already done, enable the Google Container Engine API
#    and check the quota for your project at
#    https://console.developers.google.com/apis/api/container
# 2. This sample uses Application Default Credentials for authentication.
#    If not already done, install the gcloud CLI from
#    https://cloud.google.com/sdk and run
#    `gcloud beta auth application-default login`.
#    For more information, see
#    https://developers.google.com/identity/protocols/application-default-credentials
# 3. Install the Ruby client library and Application Default Credentials
#    library by running `gem install google-api-client` and
#    `gem install googleauth`
require 'google/apis/container_v1'
require 'googleauth'

container = Google::Apis::ContainerV1::ContainerService.new
container.authorization = Google::Auth.get_application_default(
  ['https://www.googleapis.com/auth/cloud-platform']
)

project_id = 'my-project'  # Google Cloud Platform project ID
zone = 'us-central1-a'     # zone the cluster resides in
cluster_id = 'my-cluster'  # name of the cluster to retrieve

response = container.get_zone_cluster(project_id, zone, cluster_id)

# TODO: Change code below to process the response object:
puts response.to_json
```

*Note*
- *We don't use the above authentication code. We use OAuth authentication instead.*
- *This was taken from the GKE API docs.*

If you have any questions, please let me know.
We add radio buttons to the Kubernetes integration page so you can choose how you want to set up your cluster: Use Google Container Engine or Set up manually
The Details panel changes its fields according to the selected option above.
The Manual panel has all the same fields that you can find in the Kubernetes integration page today.
The GKE panel has two fields: Number of nodes and Region. Both are pre-filled with a value chosen by us. The panel also informs the user of the machine size that will be used.
A Sign in with Google button is shown at the bottom. This button doesn't care if the user has already signed in with Google. Once they click it, the OAuth mechanism will take care of that difference.
GKE
Manual
Once the OAuth flow is finished, GitLab will show a banner message saying the cluster is being created:
Once the cluster is created, the panel will display the access details to the cluster.
The user must still check the Active checkbox and click Save changes to use the newly created cluster. This is reflected in the new banner message:
When the user visits this page after the cluster has been set up, the Details panel shows the current configuration.
I'm not sure if we'll be able to modify the cluster's setup on GKE from GitLab. If we can't, we should disable the input fields:
We need a "GCP project name" field to choose which GCP project owns the new cluster. We also need to specify a "cluster name"; however, it's not so important for the integration, so I guess we can autofill it.
Couple of questions
Do we need to fetch an array of available GCP project names for a dropdown/combobox instead of a text field?
Do we need to fetch an array of available zone names for a dropdown/combobox instead of a text field?
Do we need a button to delete a cluster?
Do we need a "Create a new cluster" button instead of "Sign in with Google"? If the user can't use OAuth, do we then show a login page?
Instead of Use Google Container Engine, how about Create on Google Container Engine. You can "use" GKE with manual configuration. The difference is that we're creating it for them.
It seems wrong not to activate the cluster automatically.
We need a field "GCP project name" to choose which GCP project owns the new cluster.
Good point. I wonder if we can create a project for them too though, since people may not have a project already. But perhaps allow them to specify a project if they do have one, since the project has the billing relationship.
@cperessini What happens if they change the radio button to manual after creating a cluster? And then go back?
I wonder if we should use a create button pattern instead of radio buttons. Then after the cluster is created, it just fills in the fields for you. Or maybe hides them, I'm not sure. Thinking of how we did the Mattermost slash commands integration, we had a button that did the magic for you, then changed the display of page to hide the configuration.
Maybe we should provide links to manage the cluster directly if needed, especially if we don't let them delete the cluster.
Eventually we want to let them change the number of nodes. Not sure how that would fit, but it's beyond scope currently.
I wonder if we can create a project for them too though, since people may not have a project already. But perhaps allow them to specify a project if they do have one, since the project has the billing relationship.
The technical difficulty of creating a GCP project would be high, since it has to be connected to a credit card to enable GKE. So I propose tackling it in another iteration. That said, I think we should read all available GCP projects the user owns and show them as a list to click and select in the form.
@ayufan I've not checked whether GCP projects can be read via the API, and I feel we should include it in the first iteration. Should I spend time on the research?
@cperessini @markpundsack Can we maybe decouple the Kubernetes cluster from the integration view? Maybe we could start by proposing a separate view, within the future Cluster view, as a separate page.
I'm asking because adding that today to the Kubernetes integration is troublesome, and we know that we will want to make Auto DevOps/Cluster a first-class thing, later also including installation of Tiller, Deploy Apps,
Prometheus and Runner. Doing that as part of CI / CD > Cluster | Auto DevOps would make a lot of sense to me, as we could then easily extend that view with additional data.
Internally, once the cluster is created, we would configure the Kubernetes integration, but this would basically be a separate entity and flow stored in the database.
@markpundsack @dosuken123 thanks for your reviews, they were really helpful! Holding off on creating new designs until we know if we'll go in the direction @ayufan brought up.
@ayufan It sounds like what you're saying is to do https://gitlab.com/gitlab-org/gitlab-ce/issues/35616, but start with only project-level Clusters, and cut other scope (like only have cluster creation, and not management). That's fine, as long as we get it done in 10.1, and it doesn't slow down the rest of the iterations. Cutting https://gitlab.com/gitlab-org/gitlab-ce/issues/35616 was an attempt at focusing on a smaller MVP, so if you're saying it's actually faster to do that, great!
One question: will this support multiple clusters per project? If not, having a page to manage clusters seems kinda silly.
As this isn't designed yet, there's risk in changing direction this late in the process. You'll need to manage that risk carefully. Let's get a design asap so we can see if it's viable (technically, and from a product perspective). /cc @cperessini
@markpundsack @dosuken123 @ayufan @filipa here are designs for decoupling the Kubernetes integration page from clusters.
We'd need to add a few pages and a new Cluster entity. They're not huge changes, but probably a lot bigger than initially scoped for.
Integration page
Since clusters will be their own entity, there's no need to fill those details in the integration page.
We replace the input fields with a single dropdown where you can select any existing cluster or add a new one.
Integration page
Dropdown
Empty dropdown
Add new cluster
We add a page similar to the New issue or New merge request pages.
The first option on this page is to create a cluster on GCP. Clicking the Google Sign In button will take you to a different page if authentication succeeds. If authentication fails we show an error on this page.
The following section allows you to add an existing cluster by filling in the same information you can enter in the Integration page today.
I added a new Cluster name field so users can later identify the clusters they add to GitLab.
Add cluster
Authentication failed
Create cluster on GCP
If authentication succeeds, you are taken to this page, where you can specify the parameters for your new cluster on GCP.
We really could simplify all of this and just ask for a Cluster name, but I included the fields in case that's the direction we go in.
Some notes about the new fields:
Cluster name: This is the name that will be used by GitLab. I'm not sure if we can also communicate this name to GCP.
GCP Project name: This field is optional. If the user enters one, we create the cluster under that project on GCP. Otherwise, we create a new project with a generic name.
When the user clicks Create, the new GCP cluster gets added to the list of GitLab clusters too.
Clusters list
We add a new Clusters section under CI / CD in the navigation. For now, this will just be a list of clusters, where each row has the following information:
Cluster ID
Cluster name
Kubernetes integration status. If this is the selected cluster on the integration page, it will say Yes here. Otherwise it will say No.
Edit button
Delete button
There is no detail page for each individual cluster.
Edit cluster
The edit cluster page has the same fields as the current Integration page, plus the cluster name field.
This page shows the same information whether you created the cluster on GCP or added an existing one.
In a future iteration we can make this page smarter and add GCP-specific fields.
Some notes
The migration to 10.1 would need to create a cluster that uses the info specified in the current K8s integration page.
Do we need to fetch an array of available GCP project names for a dropdown/combobox instead of a text field?
Addressed in the design
Do we need to fetch an array of available zone names for a dropdown/combobox instead of a text field?
If fetching the list is complicated, we can choose 5-10 regions manually and offer those without having to go to Google.
We can also use a text field, but that will make things more complicated for users
Do we need a button to delete a cluster?
Not on GKE, but we'll need to be able to delete the clusters we add.
Do we need a "Create a new cluster" button instead of "Sign in with Google"? If the user can't use OAuth, do we then show a login page?
@ayufan I created a minimal Ruby app to test the whole process, from authorizing the user to getting cluster information: https://gitlab.com/dosuken123/gke_integration_poc. I've already confirmed that authorized users can access their cluster info. It uses an OAuth token when requesting the API; the OAuth token is issued when the user logs in with their Gmail account.
This app uses devise, omniauth and omniauth-google-oauth2, which are used in the GitLab codebase, so it can be easily adopted into CE/EE.
One hurdle remains, though. The sample app currently authorizes users when they log in, but we want to authorize them when they access clusters. In other words, we authorize users with the GCP API OAuth scope (https://www.googleapis.com/auth/cloud-platform) when they create a cluster. We don't include the scope when they log in with their Google account (i.e. we don't touch config.omniauth).
I'm currently investigating this. Maybe we need to use Signet.
FYI, we use the google-api-client gem as the GCP/GKE API library. It is developed by Google, so we don't need to develop an API library from scratch.
@cperessini Thank you for creating the awesome mockups!
Here is some feedback, and some questions.
Add new cluster page
I think showing "Sign in with Google" in the first place would not be ideal. How about putting the "Create cluster on GCP" form on this page?
About the "Sign in with Google" button: authentication should be done only once; we don't show the authentication button every time. When the user accepts the consent screen, we save the auth_token in GitLab, and we manipulate the GCP API on behalf of the user with that auth_token.
Do we need "Project namespace (optional/unique)"? This seems useless if it's a manual configuration.
Please extend the "Token" field to "Token" or "Username" and "Password" fields. This is because we can't get a k8s token, but we do get a k8s username/password when we create a new cluster. If the user clicked the "Create a new cluster" button, we save the "Username" and "Password" instead of the "Token", and we authorize the new cluster with the username/password. The "Token" is still used for the manual configuration flow described in https://docs.gitlab.com/ce/topics/autodevops/quick_start_guide.html.
Create cluster on GCP page
GCP project name should be required, and the user should have a valid GCP project before they continue. We don't support creation of GCP projects in this iteration, as the technical difficulty would be high (https://gitlab.com/gitlab-org/gitlab-ce/issues/35954#note_40587560). At the least, the user needs a GCP project with billing and API access enabled. We should note that beside the GCP project name field.
Please change "Region" to "Zone". We can't use "Region" as a parameter.
About the Cluster name field: just to be clear, each cluster already has a cluster name, and it must be unique within each GCP project. Ideally, this GitLab-customizable cluster name and the real cluster name should be identical. In this automation flow we can synchronize them, but if the user sets up the k8s integration manually they can differ, which would cause unexpected behavior.
About the Cluster name field: I think we should make this field required. We need the cluster name when we check the cluster details via the API. FYI, this validation would fail when another cluster has already taken the name.
Clusters list page
Do we need the Cluster ID? It won't be used/referenced from anywhere.
"Integrated with Kubernetes" would be confusing, because the cluster itself is k8s. How about "Used in this project" or something like that?
About the Delete button: we should make clear that the cluster itself will NOT be deleted (it just deletes the associated data in GitLab, because we don't implement cluster deletion in this iteration). This should be shown as an alert, and it should be documented.
Edit cluster page
We should show a link to the cluster in case the user wants to edit cluster parameters (e.g. node numbers).
Do we create a new GCP project if the user doesn't have any? -> No. But we should document how to create a valid GCP project (e.g. billing is necessary, etc.).
Do we create a new cluster? -> Yes
Do we update the cluster? (e.g. node numbers) -> No. But we link to the cluster; the user updates it in the GCP Console.
Do we delete the cluster? -> No. But we link to the cluster; the user deletes it in the GCP Console.
Do we need to fetch an array of available GCP project names for a dropdown/combobox instead of a text field? -> No? But show a link to the GCP project list?
Do we need to fetch an array of available zone names for a dropdown/combobox instead of a text field? -> No? Do we put a link to the zone list?
Do we enable Kubernetes integrations automatically, when user adds a new cluster regardless of manual or auto creation/configuration. -> Yes? (Important)
@dosuken123 random thoughts while getting up to speed on this issue:
Do we need to fetch an array of available GCP project names for a dropdown/combobox instead of a text field? -> No? But show a link to the GCP project list?
Do we need to fetch an array of available zone names for a dropdown/combobox instead of a text field? -> No? Do we put a link to the zone list?
I think the desired answer is 'yes', as it is simpler for the user to select an existing (and therefore valid) one instead of guessing or moving to a different page. We should be able to fetch them, since you need to reach GCP for all the other tasks anyway, so availability is not a problem. I also see value in adding a link to the help page on GCP, as we do elsewhere (the "?" in the blue circle), if the user wants to make a conscious choice. It is a link to an external page, though, so I don't know if that convention applies. /cc @cperessini
Possible blockers are:
it takes a long time to fetch the list (bad UX for users while waiting for the dropdown list to load)
it takes a long time to implement (it is still 'yes', but we will cover it in a future iteration)
@cperessini maybe I'm misunderstanding your proposal, but I don't get why we should have both the Integrations > Kubernetes and the CI/CD > Clusters pages.
It seems that all the cluster management (add, edit) is done in the latter, but you still have to use the former to enable it. Isn't it better to choose one page and do everything there? I'd prefer CI/CD > Clusters even if it is more related to Settings, so maybe Settings > Clusters could be a good place (or a new foldable section under Settings > CI/CD).
@cperessini @ayufan if we want to keep it simple and avoid multiple clusters in this iteration (though, you know, we should support them eventually, even activated at the same time), we can have just these "options":
use cluster on GKE (list of available ones, or create a new always there as an option)
use existing cluster (old manual configuration)
use no cluster (or a flag somewhere to "disable" any selected choice), if user doesn't want to use a cluster but configuration is still stored for later use
Just to make it clear, if there is no connection with GCP or a Google Account, we should keep the first option anyway and guide the user through the signin/whatever process when selected.
This design could be extended in the future when we'll add group-level clusters. We just need to add a third option with a dropdown showing available clusters configured at group-level.
It also scales quite well if we introduce multiple clusters, even if the design will need a review if we want to enable different clusters at the same time for different environments (but that is another story).