Translator for scripted Jenkinsfiles (Groovy Pipeline and DSL formats) to .gitlab-ci.yml
Problem to solve
It is not obvious how to convert a Jenkinsfile (a Groovy-based DSL that, in its more recent form, is a full programming language) into a static YAML syntax, and our users need help to achieve this.
From a Professional Services perspective, the vast majority of our customers are going to be on the Jenkins Groovy Pipelines and few will be on the Jenkins DSL.
- Scripted Jenkinsfile. This is a Jenkinsfile that's written entirely in Groovy; it need not contain the declarative `stages` and `node` sections. It's essentially a Groovy/Java file with all kinds of logic inside it, and it can also pull in massive libraries of code to invoke. This is the hardest format to convert and interpret.
Intended users
Further details
Links to the original discussion threads in the meta epic: &2735 (comment 295171851), &2735 (comment 295172306), and &2735 (comment 295172127).
Proposal
There are many ways to approach this. The simplest and most straightforward would be to partially automate conversion of a Jenkinsfile to .gitlab-ci.yml (imagine something like https://www.base64decode.org/ but for CI YAML files, not really tied to any project).
This could take advantage of dynamic child pipelines, putting the processor in the originating (parent) pipeline and the jobs to be run in the triggered child pipeline.
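As a rough, hypothetical sketch of that idea (not an existing feature): a generator job in the parent pipeline could run a script along these lines to produce the child configuration as an artifact, which a downstream trigger job then includes. All names here are illustrative.

```python
# Hypothetical sketch of the dynamic child pipeline idea: a "generate" job in the
# parent pipeline runs this script to turn the repository's Jenkinsfile into a
# child .gitlab-ci.yml, which a downstream job triggers as an artifact.
from pathlib import Path

def convert_jenkinsfile(source: str) -> str:
    """Placeholder for whatever converter is eventually built (regex, plugin API, ...)."""
    # For the sketch, emit a single job that only reports what it found.
    return (
        "converted-job:\n"
        "  script:\n"
        f"    - echo \"Converted from a Jenkinsfile with {len(source.splitlines())} lines\"\n"
    )

if __name__ == "__main__":
    jenkinsfile = Path("Jenkinsfile").read_text()
    Path("generated-gitlab-ci.yml").write_text(convert_jenkinsfile(jenkinsfile))
    # The parent pipeline's trigger job would then include the generated file, e.g.:
    #   trigger:
    #     include:
    #       - artifact: generated-gitlab-ci.yml
    #         job: generate-child-config
```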
Next, it could be something like:
- Automatically create a new MR with the updated gitlab-ci.yml
- During project import, if a Jenkinsfile is located, automatically create a gitlab-ci.yml
- Say you have 50 projects and you want to convert them all: go through each, convert each Jenkinsfile, and create MRs automatically (this would require Jenkins API access, and it is not even clear whether the Jenkinsfile can be accessed that way)
Research so far
@georgekoltsov did a fair amount of research into this:
A Jenkinsfile is written in Groovy. Groovy is a full programming language, and anyone creating a Jenkinsfile is writing code, with the ability to define methods and integrate plugins with custom DSLs. This makes conversion substantially harder.
- The first thing that came to mind was to use regular expressions, but because there are so many ways of writing a Jenkinsfile (see the examples at https://jenkins.io/doc/pipeline/examples/), it is going to be very hard to grab the appropriate stages/commands and filter out all the custom code. Maintenance is also a concern.
- Next, I explored the option of creating a lexer and parser for it. This is what you usually start with when creating a new programming language, but it essentially means creating a lexer/parser for Groovy. I briefly spoke to @yorickpeterse about it and he advised that creating a good parser can easily take months.
- Finally, I explored the option of developing a Jenkins plugin that would do the transformation for us on Jenkins itself, since Jenkins is able to evaluate Groovy. I found an API in the Jenkins Pipeline Model Definition plugin that allows transformation of a Groovy Jenkinsfile to JSON (https://github.com/jenkinsci/pipeline-model-definition-plugin/blob/master/EXTENDING.md#conversion-to-json-representation-from-jenkinsfile).
It transforms a Jenkinsfile like this:
```groovy
pipeline {
  agent any
  stages {
    stage('Build') {
      steps {
        echo 'Building..'
      }
    }
    stage('Test') {
      steps {
        echo 'Testing..'
      }
    }
    stage('Deploy') {
      steps {
        echo 'Deploying....'
      }
    }
  }
}
```
into JSON
{"status":"ok","data":{"result":"success","json":{"pipeline":{"stages":[{"name":"Build","branches":[{"name":"default","steps":[{"name":"echo","arguments":[{"key":"message","value":{"isLiteral":true,"value":"Building.."}}]}]}]},{"name":"Test","branches":[{"name":"default","steps":[{"name":"echo","arguments":[{"key":"message","value":{"isLiteral":true,"value":"Testing.."}}]}]}]},{"name":"Deploy","branches":[{"name":"default","steps":[{"name":"echo","arguments":[{"key":"message","value":{"isLiteral":true,"value":"Deploying...."}}]}]}]}],"agent":{"type":"any"}}}}}
The only problem here is that it only transforms very basic Jenkinsfiles. It must have the `pipeline` keyword present (which is not actually a requirement in a Jenkinsfile, see https://jenkins.io/doc/pipeline/examples/#artifactory-maven-build for example), as well as other keywords like `stages`, which are not required either. If the file does not meet these requirements, the converter returns error messages (e.g. `{"status":"ok","data":{"result":"failure","errors":[{"error":["Undefined section \"node\" @ line 1, column 12.","Missing required section \"stages\" @ line 1, column 1.","Missing required section \"agent\" @ line 1, column 1."]}]}}`).
So in the end I think the best approach would be:
- Have a new page in GitLab requesting a Jenkins URL and a Jenkins crumb for authentication
- Have the user upload the desired Jenkinsfile (there seems to be no Jenkins API that allows fetching the Jenkinsfile source from Jenkins projects, so there is no way to do this automatically)
- Make a POST request to the JenkinsfileToJson endpoint to get a JSON representation of the file
- Convert it on our side and present the user with a suggested YAML file (a rough sketch of this flow follows below)
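A minimal sketch of that flow, assuming Python with the `requests` library. The `/pipeline-model-converter/toJson` endpoint path and the `jenkinsfile` form field are assumptions based on the EXTENDING.md linked above and should be verified against the target Jenkins instance.

```python
# Hedged sketch: fetch a crumb, POST the Jenkinsfile to the Pipeline Model Definition
# plugin's JSON conversion endpoint, then hand the JSON to a converter such as the
# to_gitlab_ci() sketched earlier.
import requests

def jenkinsfile_to_json(jenkins_url: str, jenkinsfile: str, user: str, token: str) -> dict:
    auth = (user, token)
    crumb = requests.get(f"{jenkins_url}/crumbIssuer/api/json", auth=auth).json()
    response = requests.post(
        f"{jenkins_url}/pipeline-model-converter/toJson",  # assumed endpoint path
        auth=auth,
        headers={crumb["crumbRequestField"]: crumb["crumb"]},
        data={"jenkinsfile": jenkinsfile},
    )
    response.raise_for_status()
    result = response.json()
    if result.get("data", {}).get("result") != "success":
        # Surface the "Missing required section ..." style errors to the user.
        raise ValueError(result["data"].get("errors"))
    return result
```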
I am not too sure this is going to provide the desired value, though: for a complex Jenkinsfile it is (and I am totally guessing here) going to fail on some sort of custom syntax, so manual editing of the Jenkinsfile is still required, while for a simple Jenkinsfile there is already a guide with which the transformation can be done fairly quickly.
@sarcila has also done some research:
Yesterday I was experimenting a bit with how to import Jenkins. At the beginning I was trying to get an AST from a Jenkinsfile using tree-sitter (https://tree-sitter.github.io/tree-sitter/) with a hacky grammar of Groovy, and after trying to tweak it a bit, it was obvious to me what @yorickpeterse said about building one. Another approach I tried is to use the Groovy AST builder (https://github.com/apache/groovy/blob/ab2b7d827f9eddc552b7c6e8fe976cd59c0477e2/subprojects/parser-antlr4/src/main/java/org/apache/groovy/parser/antlr4/AstBuilder.java, or the ANTLR grammar https://github.com/apache/groovy/tree/master/src/antlr); it would then be a matter of converting the Groovy AST nodes to GitLab pipeline nodes. Another option is to follow the same approach as Jenkins and use a mix of GroovyCodeSource and GroovyShell to execute Groovy with our own custom bindings. The problem in the end with both approaches is how to support plugins, because a plugin will define custom steps or actions that are not mapped to GitLab pipelines, and the only way with an approach like this would be to support plugin exports as well. That in itself could be a challenge: what do we do with conflicting plugins or dependencies? (As @georgekoltsov already mentioned.)
A caveat of an initial solution/tool using this approach is that it would be built outside the GitLab codebase using a mix of Groovy/JVM, so that we can reuse the existing code around Groovy.
Additional notes from @jackie_fraser:
I experimented with a simple regex-based method that could search and replace across some simple Jenkinsfile examples that I found on the web.
These Jenkinsfiles:
- used `pipeline {}`
- used `environment {}`
- used `stages {}`
- used `steps {}` in between stages
- could use `when {}` branches (limited support)
The regex method could then generate a corresponding gitlab-ci.yml.
When encountering more complicated pipeline structures (including `parallel`), it currently skips those lines/sections of the Jenkinsfile.
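A heavily simplified, hypothetical illustration of that kind of regex pass (not the actual experiment): it only recognises `stage('Name')` declarations and `sh`/`echo` steps and skips everything else.

```python
# Hypothetical regex-based pass: extract stages and simple sh/echo steps,
# skip anything it does not recognise (parallel, when, plugin steps, ...).
import re

STAGE_RE = re.compile(r"stage\('([^']+)'\)")
STEP_RE = re.compile(r"""\b(sh|echo)\s+['"](.+?)['"]""")

def regex_convert(jenkinsfile: str) -> str:
    jobs, current = {}, None
    for line in jenkinsfile.splitlines():
        if stage := STAGE_RE.search(line):
            current = stage.group(1)
            jobs[current] = []
        elif current and (step := STEP_RE.search(line)):
            keyword, argument = step.groups()
            # `sh` steps map straight to script lines; `echo` becomes a shell echo.
            jobs[current].append(argument if keyword == "sh" else f'echo "{argument}"')
    lines = ["stages:"] + [f"  - {name}" for name in jobs]
    for name, script in jobs.items():
        lines += [f"{name.lower()}:", f"  stage: {name}", "  script:"]
        lines += [f"    - {command}" for command in script]
    return "\n".join(lines) + "\n"
```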
The resulting gitlab-ci.yml file:
- is usually a partially converted version of the Jenkinsfile
- contains the lines of scripts found in the Jenkinsfile converted to script lines in the gitlab-ci.yml. These script lines won't actually run on GitLab CI, because the tools they call wouldn't exist on the Docker image, so more manual work would be needed to get the gitlab-ci.yml working.
Nevertheless, it could create a starter gitlab-ci.yml to begin with, built around the steps they're currently running in Jenkins.
Attached is an example Jenkinsfile with the two files compared after running it through the experimental parser.
A lot of fragile work would be involved in parsing more complicated Jenkins file structures.
This idea of converting an existing Jenkins process directly to gitlab-ci.yml might not be the optimal way to set up a project, though.
Creating the first gitlab-ci.yml for a project that currently uses Jenkins for CI can be a lot simpler than the way the pipeline was laid out in Jenkins.
A project could likely get a "better" starter gitlab-ci.yml by using one of the gitlab-ci.yml templates that already exist in the New File section and choosing the template matching the type of project they are running. Perhaps that existing feature could be incorporated into directing a project using Jenkins for CI to turn on GitLab Auto DevOps and set up a gitlab-ci.yml from a template instead.
Another option is to read their Jenkinsfile, detect keywords found in it, and suggest an existing gitlab-ci.yml template to start with.
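A hypothetical sketch of what that keyword detection could look like; the keyword-to-template mapping is an illustrative assumption, not GitLab's actual template catalogue.

```python
# Hypothetical keyword detection for template suggestion. The hints and template
# names are illustrative assumptions only.
TEMPLATE_HINTS = {
    "gradlew": "Gradle",
    "gradle": "Gradle",
    "mvn": "Maven",
    "npm": "Nodejs",
    "docker build": "Docker",
}

def suggest_templates(jenkinsfile: str) -> list:
    text = jenkinsfile.lower()
    return sorted({template for keyword, template in TEMPLATE_HINTS.items() if keyword in text})
```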
Both paths have further manual steps to get CI fully working, but both get the project one step closer to replacing Jenkins with GitLab CI.
Response from @dstull:
I would tend to agree with this analysis.
From my experience with Jenkins, I'm not sure how feasible it is to provide a fully working conversion product here, even for the declarative pipeline files. Some of that is due to the variability of pipeline files and the Jenkins product itself (such a moving target, it seems, in the Jenkins world).
I also agree that even if we do convert the Jenkins files into our file format, it will likely not be the optimal "GitLab" way to do it as mentioned above.
That being said, is it worth it at that point to go down that path? If it is deemed to be, it may only be serviceable for a small percentage of Jenkins files, and then, if we can't convert a file, we could direct the user to one of the starter templates we have that matches the language/framework their project is using.
Either choice has merit (converter path vs. guiding more toward using templates), but I hesitate to commit to something we will just fail to deliver fully on. Iteration is key, but if we can't see a path to meet our goals, we might be left with something that never fully meets our expectations.
If we were to focus on guiding the user towards using our templates, would that be enough, or would customers at first glance still have too high a bar to reach for converting to our CI/CD product?
Notes from @ctimberlake1:
Hey folks, not sure how much value this will add, but I'll go ahead and throw it out there. Because Professional Services deals with so many CI conversions, I've been invested in looking into a process to automatically convert them, as seen above in the linked discussion about running Jenkins in GitLab.
The problem with this, as you all know, is that CI tools and syntax are wildly different from vendor to vendor. I don't think we can ever have a 100% drop-in, plug-and-play solution. But what I've been working on is a Python application that will take a vendor's CI file and go through it line by line to determine context and intent. Once we've determined context and intent, we then create a GitLab CI Job object for it.
In some cases, we can simply see a plugin declaration like:

```groovy
withSonarQubeEnv() { // Will pick the global server connection you have configured
  sh './gradlew sonarqube'
}
```
From this block in a Jenkinsfile, we can determine that Jenkins wants to set up a SonarQube environment and run that script. That means we can create a GitLab CI job with a SonarQube-capable container and then execute the same command in that container to run the tests.
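As a concrete (hypothetical) illustration, a converter might emit a job object along these lines for the block above; the image name and variable wiring are assumptions, not a verified mapping.

```python
# Hypothetical GitLab CI job a converter might emit for the withSonarQubeEnv block above.
sonarqube_job = {
    "sonarqube-check": {
        "image": "gradle:jdk11",                              # assumed build image
        "variables": {"SONAR_HOST_URL": "$SONAR_HOST_URL"},   # supplied as a CI/CD variable
        "script": ["./gradlew sonarqube"],
    }
}
```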
The downside of this approach is that we need to be intelligent about how we interpret the CI files; understanding context for all Jenkinsfiles and plugins can be difficult, let alone CircleCI or Travis CI. The goal is to make the Python application extendable with plugins: for each CI file format we want to convert, we create an input plugin. The Python application scans and reads each line in a file, then passes the line to an input plugin to be understood.
Once sufficient context is determined as to what needs to happen, we create a GitLab CI Job object and add it to a list of GitLab CI Job objects in memory. If we understand stage contexts (like in the Jenkins DSL), we can also keep a list of stages and attach the relevant stage to the GitLab CI Job object.
When we've determined we've reached the end of the file, we close out the file and create a `converted-gitlab-ci.yml` file. We then dump a global job to this file, for things like queues and global image usage. We then dump the list of stages to the file. At the end, we iterate over the GitLab CI Job objects in memory and export them to the `converted-gitlab-ci.yml` file.
A lot of this is still being worked on and developed. Between my time working with customers I've been working on this on the side, but this is the overall architecture I had planned.
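A hedged sketch of that architecture, with illustrative class and method names (not the actual tool): an input plugin interprets lines into job objects and stages held in memory, which are then dumped to `converted-gitlab-ci.yml`.

```python
# Hedged sketch of the architecture described above: an input plugin interprets each
# line, building GitLab CI job objects and a stage list in memory, and everything is
# written to converted-gitlab-ci.yml at the end.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class GitlabCIJob:
    name: str
    stage: str = "build"
    image: Optional[str] = None
    script: List[str] = field(default_factory=list)

    def to_yaml(self) -> str:
        lines = [f"{self.name}:", f"  stage: {self.stage}"]
        if self.image:
            lines.append(f"  image: {self.image}")
        lines.append("  script:")
        lines += [f"    - {command}" for command in self.script]
        return "\n".join(lines)

class JenkinsInputPlugin:
    """Turns recognised Jenkinsfile lines into job objects; unknown lines are skipped."""

    def interpret(self, line: str, jobs: List[GitlabCIJob], stages: List[str]) -> None:
        line = line.strip()
        if line.startswith("stage(") and "'" in line:
            stage = line.split("'")[1]
            stages.append(stage)
            jobs.append(GitlabCIJob(name=stage.lower(), stage=stage))
        elif line.startswith("sh ") and "'" in line and jobs:
            jobs[-1].script.append(line.split("'")[1])

def convert(jenkinsfile_path: str, plugin: JenkinsInputPlugin) -> None:
    jobs: List[GitlabCIJob] = []
    stages: List[str] = []
    with open(jenkinsfile_path) as source:
        for line in source:
            plugin.interpret(line, jobs, stages)
    with open("converted-gitlab-ci.yml", "w") as out:
        # Global defaults first (image, etc.), then the stage list, then each job.
        out.write("default:\n  image: alpine:latest\n")
        out.write("stages:\n" + "".join(f"  - {stage}\n" for stage in stages))
        out.write("\n".join(job.to_yaml() for job in jobs) + "\n")
```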
I've gone ahead and attached two graphs: one is a mock-up of how we could integrate this into the GitLab UI for customers, and the second shows an overview of the architecture I'm building.
Response from @mlindsay:
I've been thinking the same things as well, @ctimberlake1. My thinking, after talking to a large enterprise customer, is basically to start from a template standpoint: the pipeline should create an artifact that is a gitlab-ci.yml file with a set of guidelines, based off a data dictionary of the Jenkins modules in use in the Jenkinsfile.