Cannot reuse built code from a previous step because of how Maven and Git work
Summary
Before starting: I'm aware that the Bug template may not be the best fit for this issue, but none of the other templates fit better. I'm opening this to start a discussion on how to deal with the scenario we're in.
The issue here is related to how both Maven and Git work.
Here's a little bit of context:
- Git doesn't store file modification times, so every file produced by a git clone operation gets the current timestamp
- Maven compares the files in the target/classes directory with the files in src to decide whether they need to be compiled again, and it uses timestamps to determine whether the source code is newer than the built classes
- GitLab CI jobs do a git clone to retrieve the repository, so all the files in the working directory have the current timestamp
- GitLab CI jobs store folder artifacts as a zip file, and those files are unzipped in the following jobs, so the artifacts keep the timestamp from when they were created in the previous job
In the Maven scenario, this means that the files in target/classes look older than the files in src, which triggers a full build instead of reusing the already built classes.
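The timestamp inversion described above can be reproduced outside of any pipeline. The following is a minimal sketch, not Maven's actual logic: it assumes a plain "source newer than output" comparison, and the scratch directory and the file names App.java and App.class are hypothetical.

```shell
#!/bin/sh
# Simulate the timestamp inversion in a scratch directory (all paths are hypothetical).
workdir=$(mktemp -d)
mkdir -p "$workdir/src" "$workdir/target/classes"

# The class file comes from the artifact of an earlier job, so it keeps its old timestamp.
touch -t 202001010000 "$workdir/target/classes/App.class"
# The source file comes from a fresh clone, so it gets the current timestamp.
touch "$workdir/src/App.java"

# Timestamp-based staleness check: a source newer than its output means "recompile".
if [ "$workdir/src/App.java" -nt "$workdir/target/classes/App.class" ]; then
  result="recompile"
else
  result="up to date"
fi
echo "$result"   # prints "recompile"
```

Even though the source has not changed since the class file was built, the check reports that a rebuild is needed.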
Why does this matter? Let's say we have a pipeline where the first step builds our project and multiple later steps execute other goals such as unit tests, code quality checks and so on. With the current behavior, Maven fully rebuilds the project in every job of the pipeline.
From our point of view, this is a big issue because it prevents us from sharing build artifacts between jobs.
Steps to reproduce
- Create a random Maven project
- Add the following pipeline
build:
  stage: build
  image: maven:3.6-jdk-11
  script:
    - 'mvn test-compile'
  except:
    - tags
  artifacts:
    paths:
      - target/

junit:
  stage: unit_tests
  script:
    - 'mvn verify'
  artifacts:
    reports:
      junit:
        - target/surefire-reports/TEST-*.xml
What is the current bug behavior?
The junit job recompiles the entire codebase.
This behavior can be observed by looking for the following string in the job's logs:
[INFO] Changes detected - recompiling the module!
What is the expected correct behavior?
The junit job shouldn't recompile anything, and its log should contain the following line instead:
[INFO] Nothing to compile - all classes are up to date
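To check a saved job log for these two messages automatically, a small shell helper along these lines could be used. This is only a sketch: the function name compile_status and the log file are hypothetical, and the two strings are the ones quoted above.

```shell
#!/bin/sh
# Hypothetical helper: classify a saved Maven job log by the compiler message it contains.
compile_status() {
  if grep -qF 'Changes detected - recompiling the module!' "$1"; then
    echo "recompiled"
  elif grep -qF 'Nothing to compile - all classes are up to date' "$1"; then
    echo "reused"
  else
    echo "unknown"
  fi
}

# Example with a throwaway log file that contains the message from the bug behavior.
log=$(mktemp)
echo '[INFO] Changes detected - recompiling the module!' > "$log"
status=$(compile_status "$log")
echo "$status"   # prints "recompiled"
```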
Possible fixes
While this is not a GitLab-related fix, I'll describe the current workaround we're using.
The idea is to extract our artifacts with the -DD option, so that the extracted files get the current timestamp instead of the one they had when the artifact was created, which makes the src folder look older.
To make this work, our example pipeline can be changed into this one:
.prepare-build-artifact: &prepare-build-artifact |
  apt update && apt install -y zip
  zip -r -q target.zip target/

.extract-build-artifact: &extract-build-artifact |
  unzip -DD -q target.zip

build:
  stage: build
  image: maven:3.6-jdk-11
  script:
    - 'mvn test-compile'
    - *prepare-build-artifact
  except:
    - tags
  artifacts:
    paths:
      - target.zip

junit:
  stage: unit_tests
  script:
    - *extract-build-artifact
    - 'mvn verify'
  artifacts:
    reports:
      junit:
        - target/surefire-reports/TEST-*.xml
While this works correctly, it's clearly a workaround that shouldn't be needed, for multiple reasons.
For example, it's difficult to find Docker images that include the zip executable (maven:3.6-jdk-11 includes only unzip), which forces us to add the apt install step, and that is far from ideal.
The alternative would be to build custom images that include zip, but that's not feasible in the long run: we build projects in multiple languages and use a lot of different Docker images for our pipelines, so we can't customize all of them just because zip is missing.
Also, this workaround adds two more steps to artifact management and makes the pipelines more complex, as we need to add the *extract-build-artifact step to every job.
Finally, a possible GitLab-related solution could be to let us specify artifact extraction options (like the -DD one in our workaround).
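Until such an option exists, one variation on the workaround might avoid the zip dependency: keep the regular target/ artifact and refresh the timestamps after GitLab has extracted it. The sketch below assumes that touching the extracted files has the same effect on Maven as the -DD extraction, which has not been verified here. The function names and file names are hypothetical, and the staleness check is a plain timestamp comparison rather than Maven's own logic.

```shell
#!/bin/sh
# Hypothetical helper: give every file and directory under a path the current
# timestamp, mimicking what the -DD extraction does in the workaround.
refresh_timestamps() {
  find "$1" -exec touch {} +
}

# Plain timestamp comparison: a source newer than its class file means "recompile".
staleness() {
  if [ "$1" -nt "$2" ]; then echo "recompile"; else echo "up to date"; fi
}

# Demonstration in a scratch directory with made-up file names.
workdir=$(mktemp -d)
mkdir -p "$workdir/src" "$workdir/target/classes"
touch -t 202001010000 "$workdir/target/classes/App.class"  # artifact keeps its old build time
touch -t 202101010000 "$workdir/src/App.java"              # stands in for the fresh clone

before=$(staleness "$workdir/src/App.java" "$workdir/target/classes/App.class")
refresh_timestamps "$workdir/target"
after=$(staleness "$workdir/src/App.java" "$workdir/target/classes/App.class")
echo "$before -> $after"   # prints "recompile -> up to date"
```

In a job, this would come down to running find target -exec touch {} + before mvn verify, using only tools that most base images already ship.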