This issue was created to track the effort of designing the interface for the Custom Executor Provider.
While #28843 (closed) focuses on choosing a solution for how to make a pluggable integration system for the Custom Executor Provider, I'd like to start a separate discussion about what the exposed interface itself should look like.
Some of the criteria that I think the interface of Custom Executor Provider should meet:
As !3291 (merged) documents, there is already an existing interface living in Runner's code that makes it possible to implement different executors and executor providers. That doesn't mean the interface we want to expose publicly needs to be a 1:1 copy of that internal one; it should instead expose the internal one's behavior and capabilities.
However, the runner side of the implementation itself must implement the internal interface.
Thanks to that, we will be able to add the Custom Executor Provider as a feature that can live alongside the existing executors, which allows for gradual adoption and migration.
The interface should make it possible to transform our existing Docker Machine executor into a plugin for the Custom Executor Provider with almost no effort.
This will have two results:
we can quickly start using it on GitLab.com with our existing and well-known Docker Machine setup (providing a new autoscaling solution that consumes the Custom Executor Provider is a separate step, which will be the "replace Docker Machine" implementation),
we can prove that the interface supports the current behavior: the Docker Machine provider manages autoscaled VMs (in on-demand and autoscale-in-background modes) and instantiates the Docker executor pointing at the Docker Engine on a chosen VM.
Given the above, the Custom Executor Provider should make it possible to configure and instantiate the existing executors. This is required for "replace the Docker Machine executor" to be possible, as we should not try to rewrite the Docker executor (which the Docker Machine executor uses internally).
If we need to limit the number of existing executors that integrate with the Custom Executor Provider in this way, I think we should support at least:
Docker executor (as noted above)
SSH executor.
Why the SSH executor? There are systems where containers have limited adoption (Linux), where there is some sort of container system that Runner doesn't support (like BSD jails), or where there is no container concept at all (macOS). All of those environments use the Shell executor, but there is no autoscaling solution built into Runner for it.
We've used the Custom Executor to provide autoscaled runners for Windows and we're now extending that to macOS, but the limitations of the Custom Executor - and especially of the driver we're using for the Windows and macOS runners - currently don't allow us to implement an optimal autoscaling solution for them.
A Custom Executor Provider that supports creating an SSH executor instance against an externally provided host would give environments that depend on the Shell executor a way to autoscale their fleet of execution hosts.
To support Windows, we would probably need to create a WinRM executor (WinRM is what we use in Autoscaler) as the Windows equivalent of SSH, but that effort would be outside the scope of building the Custom Executor Provider itself.
I didn't mention Custom Executor on the "must support" list above. Why?
I think the Custom Executor Provider interface we build should include a new version of the Custom Executor interface. This would allow us to improve the Custom Executor's usability without deprecating the Custom Executor that we already have.
As the Custom Executor is a relatively new interface, it should remain available in its current form for some time. The Runner community has already started building drivers for it and uses it for custom cases; we should not break those environments.
However, if we're going to give the Custom Executor the concept of customized provisioning - that is, integrate it with the designed Custom Executor Provider in the way I propose for Docker, SSH and a future WinRM executor - I think it would be good to bring the new pluggable interface here as well.
I don't think we should be tackling an Executor interface yet or providing a plugin system for executors. For me, it's too large a scope and I'm not confident that our existing executors' design is something we'd want to make pluggable for others to extend.
If we design a plugin system solely for the lifecycle of cloud provider instances and for passing back credentials, it's a smaller scope and isn't incompatible with these larger goals. The Custom Executor Provider detailed above still needs the ability to handle the lifecycle of instances. We can provide the initial building block, where the cloud providers are plugins, to achieve that, while at the same time focusing solely on the problem of replacing the dependency on Docker Machine.
The plan would be to:
Extract the autoscaling logic from Docker Machine Executor and put this into its own library.
The autoscaling logic would call into a provisioner, like https://gitlab.com/jobd/fleeting/fleeting, where users can extend the cloud providers supported by adhering to a simple plugin interface (a minimal plugin skeleton is sketched after this list):
```go
type InstanceGroup interface {
	Init(ctx context.Context, logger hclog.Logger, settings Settings) (ProviderInfo, error)

	// Update updates instance data from the instance group, passing a function
	// to perform instance reconciliation.
	Update(ctx context.Context, fn func(instance string, state State)) error

	// Increase requests more instances to be created. It returns how many
	// instances were successfully requested.
	Increase(ctx context.Context, n int) (int, error)

	// Decrease removes the specified instances from the instance group. It
	// returns instance IDs of any it was unable to request removal for.
	Decrease(ctx context.Context, instances []string) ([]string, error)

	// ConnectInfo returns additional information about an instance,
	// useful for creating a connection.
	ConnectInfo(ctx context.Context, instance string) (ConnectInfo, error)
}
```
Do the same thing Docker Machine Executor currently does, but have it provision via the autoscaler package, and not by calling out to Docker Machine.
(Optional) Add a plugin to the provisioner for provisioning via Docker Machine, for a backwards-compatible path and for all of the plugins Docker Machine supports, but with the caveat that this will not be maintained.
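To give a feel for what implementing that interface would involve, here is a minimal, illustrative plugin skeleton. Everything below is a sketch: the `stubGroup` type and its in-memory bookkeeping are hypothetical, and the placeholder type declarations stand in for whatever concrete `Settings`, `ProviderInfo`, `State` and `ConnectInfo` types the provisioner library ends up exposing.

```go
package stubprovider

import (
	"context"
	"fmt"

	"github.com/hashicorp/go-hclog"
)

// Placeholder declarations standing in for the provisioner library's own
// types; their real definitions would live in the plugin package.
type (
	Settings     struct{}
	ProviderInfo struct{}
	State        string
	ConnectInfo  struct{}
)

// stubGroup is a hypothetical plugin that keeps an in-memory set of
// instances; a real plugin would call a cloud provider's API instead.
type stubGroup struct {
	log       hclog.Logger
	instances map[string]State
}

func (g *stubGroup) Init(ctx context.Context, logger hclog.Logger, settings Settings) (ProviderInfo, error) {
	g.log = logger
	g.instances = map[string]State{}
	return ProviderInfo{}, nil
}

// Update reports every known instance to the reconciliation callback.
func (g *stubGroup) Update(ctx context.Context, fn func(instance string, state State)) error {
	for id, state := range g.instances {
		fn(id, state)
	}
	return nil
}

// Increase pretends to create n instances and reports that all n were requested.
func (g *stubGroup) Increase(ctx context.Context, n int) (int, error) {
	for i := 0; i < n; i++ {
		id := fmt.Sprintf("stub-%d", len(g.instances))
		g.instances[id] = State("creating")
	}
	return n, nil
}

// Decrease removes the given instances, returning the IDs it could not remove.
func (g *stubGroup) Decrease(ctx context.Context, instances []string) ([]string, error) {
	var failed []string
	for _, id := range instances {
		if _, ok := g.instances[id]; !ok {
			failed = append(failed, id)
			continue
		}
		delete(g.instances, id)
	}
	return failed, nil
}

// ConnectInfo returns connection details for a single instance.
func (g *stubGroup) ConnectInfo(ctx context.Context, instance string) (ConnectInfo, error) {
	if _, ok := g.instances[instance]; !ok {
		return ConnectInfo{}, fmt.Errorf("unknown instance %q", instance)
	}
	return ConnectInfo{}, nil
}
```

A real plugin would replace the map operations with calls to the cloud provider's API (create and terminate instances, describe their state, fetch credentials).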
For me, this is future proof. No matter how Runner ends up looking in the future, we're likely always going to need a library to handle the lifecycle of an instance.
My opinion is that our target for MVC should be the implementation where we:
pull Docker Machine related code out of Runner's codebase
put it into an external project and wrap it in the new interface implementation
hook that new project into GitLab Runner through the plugin system we're going to design
move the Docker Machine related configuration (in the same or at least similar form) to the plugin configuration
be able to execute a job that previously worked with Docker Machine executor as we have it now in the Runner.
Having that, we would be able to deploy the new integration on GitLab.com runners, slightly update our configuration, and then start using the new interface - still with Docker Machine under the hood - all in a way that is transparent to users.
If that succeeds, we can start thinking about implementing another plugin for our new interface - one that replaces the need to use Docker Machine and is also deployed in a way that is totally transparent to users.
One thing I want to discuss from our earlier meeting is the proposal to create a generic machine provider interface that you could connect to various executors. There are some caveats with this approach that I noticed while working on the autoscaler project.
Each provider has its own bespoke way of being configured depending on the executor. Let's take the SSH executor for example:
in AWS, you would configure SSH via cloud-init
in Orka, you need to configure it yourself using macOS-specific commands
If you think about the Docker executor, the way it is set up depends on whether you have a Windows machine or a Linux one, and on how you can access it (tunnel, open port, ...), to name a few factors.
So in a nutshell, the provider has to know about the executor to prepare the machine. It is a leaky abstraction and it is one of the major design flaws of the autoscaler project, so it would be nice to avoid it in the new design.
To me, it seems like we need some kind of coupling code between executors and providers, and that they are not directly swappable. The relationship then looks like provider (n)->(1) provider/executor coupler (1)->(n) executor; a rough sketch of that idea follows below.
As you said, each provider typically has a different way to provision an account/keys and provide an SSH or WinRM connection. Each provider can sometimes have multiple ways to configure this. For example:
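To illustrate that relationship (the names below are purely illustrative and don't exist in Runner today), the coupler could be the one place that knows both the provider's output and the executor's requirements:

```go
package coupler

import "context"

// MachineInfo is what a provider hands back after provisioning an instance
// (authentication details, addressing, OS and architecture).
type MachineInfo struct {
	OS       string
	Arch     string
	Protocol string // e.g. "ssh" or "winrm"
	Address  string
	Username string
	Password string
	Key      []byte
}

// ExecutorPreparer is implemented once per provider/executor pair, for
// example "AWS + Docker" or "Orka + SSH". It encapsulates the
// provider-specific steps (cloud-init, macOS-specific commands, tunnels)
// needed before the executor can use the machine.
type ExecutorPreparer interface {
	// Prepare configures a freshly provisioned machine for the target
	// executor and returns the connection details the executor should use.
	Prepare(ctx context.Context, machine MachineInfo) (MachineInfo, error)
}
```

Each provider/executor pair would ship its own `ExecutorPreparer`, which keeps the provider itself executor-agnostic.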
GCP allows you to do OS Login
Add an SSH key via metadata
Just already have a machine with baked-in credentials
With fleeting, we can provide connector config:
```go
// ConnectorConfig is used to describe how an instance is set up for
// connecting to.
type ConnectorConfig struct {
	OS                   string        `json:"os"`   // the OS of the instance
	Arch                 string        `json:"arch"` // the CPU architecture of the instance
	Protocol             Protocol      `json:"protocol"`
	Username             string        `json:"username"`
	Password             string        `json:"password"`
	Key                  []byte        `json:"key"`
	UseStaticCredentials bool          `json:"use_static_credentials"`
	Keepalive            time.Duration `json:"keepalive"`
	Timeout              time.Duration `json:"timeout"`
}
```
This is provided by the user along with options of what cloud provider/plugin they're configuring.
Using this config, the provider plugin has multiple options for how it provisions a user, and it's the provider plugin that handles this.
For the GCP plugin, for example, it might be that (a rough sketch in code follows after this list):
If username, password, static_credentials: Just pass these back. It's assumed this account has been created out-of-band.
If username, key, static_credentials: Same as above, but with a key
If username and key: add the key using metadata
If just username: dynamically create the key and add via metadata
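Expressed as code, that decision logic might look roughly like the sketch below. It assumes the `ConnectorConfig` struct quoted earlier is in scope; the function itself is illustrative and not the actual GCP plugin implementation.

```go
// credentialStrategy is an illustrative sketch of how a provider plugin
// could interpret the user's hints; it is not the actual GCP plugin code.
func credentialStrategy(cfg ConnectorConfig) (strategy string, ok bool) {
	switch {
	case cfg.UseStaticCredentials && cfg.Username != "" && cfg.Password != "":
		// Account created out-of-band: pass the credentials straight back.
		return "use the out-of-band account and password as-is", true
	case cfg.UseStaticCredentials && cfg.Username != "" && len(cfg.Key) > 0:
		// Same as above, but key-based.
		return "use the out-of-band account and supplied key as-is", true
	case cfg.Username != "" && len(cfg.Key) > 0:
		// User supplied a key: register it with the instance via metadata.
		return "add the supplied key via instance metadata", true
	case cfg.Username != "":
		// Only a username: generate a key pair and register it via metadata.
		return "dynamically create a key and add it via instance metadata", true
	default:
		// Not enough hints to provision credentials.
		return "", false
	}
}
```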
When an instance is then provisioned, a similar configuration is passed back, populated with a username, a password or key, the protocol, the OS, and the architecture.
Effectively, on input you're providing hints about how the instance is to be configured, and the plugin makes a best effort at using those to obtain credentials. On output, it gives you exactly what you need to establish a connection.
Sometimes, the plugin cannot provide good information without a hint. For example, with GCP, you can query whether an instance is Windows or Linux. Some providers might not allow that, in which case the user will have to provide that information up front. So in some cases the plugin can fill in the gaps; in other cases it cannot, and that's just the nature of different cloud providers having different setups.
Ultimately, the executor should get (see the sketch after this list):
Authentication details
Internal and external address
OS: Linux, Windows etc.
Arch: amd64, arm64 etc.
Protocol: SSH or WinRM (or maybe even Serial).
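Put together, the data handed to the executor could have a shape like the sketch below. The field names are assumptions for illustration; the actual type returned by the plugin interface (the `ConnectInfo` in the earlier snippet) may differ.

```go
// Illustrative shape only; not the exact definition used by fleeting.
type ConnectInfo struct {
	ConnectorConfig // authentication details, protocol, OS and arch, as described above

	InternalAddr string // address reachable from within the provider's network
	ExternalAddr string // address reachable from the runner manager
}
```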
How the executor then uses this is up to the executor, and that negotiation happens on another level. Is the Docker daemon already installed? Do we need to install it? Is it running k3s? All of that is the concern of the executor and of setting up an environment. Obviously, to do this it needs to know whether the machine is Linux or Windows, which is why this is part of the connection information provided back. It may also need to know the architecture, especially if it needs to pass over a binary to then execute.
I have a question about what is being proposed. I apologise if this is the wrong place to ask it, or if my question is covered somewhere. I've read through the proposals and couldn't find what I'm looking for.
At the moment, I've got an auto-scaling infrastructure based on AWS EKS that works with the GitLab Kubernetes executor. By using Karpenter, I'm able to take the CPU/memory/storage requirements from the .gitlab-ci.yaml file and launch an appropriately sized AWS EC2 instance specifically for that job. I've also used Kyverno to modify Pod configurations based on annotations, which lets me support GPUs and per-project persistent storage.
That all works well but only for Linux workloads. I want to plan ahead and find a way of supporting Windows in a similar manner but it doesn't look like I'll be able to do that with the tools I'm using today. Hence my interest in what GitLab are doing to provide a new autoscaling mechanism.
My question, therefore, is whether there will be any support for a CI job asking for specific compute requirements and having the executor launch compute to match those requirements. If not, I think what I'm after could be achieved with a Custom Executor, but I was trying to avoid that path if at all possible.