In projects where the CI config overrides the definition of sast and forces the rules, the legacy Docker-in-Docker orchestrator is triggered even though SAST_DIND_DISABLED is not set (default value is now true). The pipeline ends up having the legacy DinD orchestrator and the new scanning jobs running in parallel. Also, the rules are not applied to the new scanning jobs. Same goes with Dependency Scanning.
When SAST and Dependency Scanning (DS) used to run in a single job, users could easily override the job definition of sast and dependency_scanning, and change the conditions on which the scanning jobs are triggered. With the removal of the Docker-in-Docker (DinD) orchestrator and non-DinD becoming the default, users now have to override multiple scanning jobs. See #218444
Bug information (click to expand)
Steps to reproduce
create a project w/ a CI config that includes the SAST template
in the CI config, override the job definition of sast and disable the job using rules
Same with the Dependency Scanning template and the dependency_scanning job.
Add another sub-section to the Troubleshooting section of the Application Security docs to explain that for SAST and Dependency Scanning, there are old jobs which remain for backwards compatibility reasons, and if the sast or dependency_scanning jobs are overridden with custom rules, then this will result in a legacy sast or dependency_scanning job being run, which will fail with the following text in the CI job log:
sast is used for configuration only, and its script should not be executed
We need to emphasize that these jobs must not run, so they can be used to configure a lot of things that apply to all "real" jobs, but not the rules property (as we need that one to stay with never value).
@NicoleSchwartz@tmccaslin To me this is bug regression but not necessarily something we should fix. I've created an issue anyway, hoping this will help users make sense of what's going on. I'd like to suggest ~S4 ~P4 for this one. Ultimately #218444 captures the problem we're trying to solve. cc @theoretick
This seems like something that could best be handled by a documentation note. In general given DinD is now non-default, we shouldn't spend effort fixing this outside of documenting it.
They overlap but I'd argue that this is a regression where pre-13.0 behavior is no longer compatible and it's not intuitive to a user that they must change things.
#218444 is more of a feature improvement that would address this, but may not result in an outcome that fixes this exact usecase. As @fcatteau mentioned above, I'm mixed on whether we should actually fix this rather than say "you must now configure your jobs differently"
It worked fine with the SAST DinD orchestrator and its single sast job, but with no-DinD becoming the default we now have multiple *-sast jobs (expected) and the legacy sast job (not expected). See https://gitlab.com/gitlab-org/release-tools/pipelines/147562588. The sast job shows up only because the CI config of this particular projects overrides default rules.
See the definition of the sast job in the SAST template:
The problem is, the condition on SAST_DISABLE_DIND has been overridden, because the project-specific rules are not merged with the ones that come with the SAST template.
Thank you @theoretick for answering @twoodham's question, which is not what I did. And yes, this issue describes a ~bug reported by other GitLab teams (and to me this is a regression) whereas #218444 is a feature proposal that would solve this.
Okay, thank you both. Just to make sure I didn't miss anything, I'm hearing y'all say:
This issue describes an unintended behavior which is an outcome of making DinD off by default as well as switching to a rules syntax.
#218444 describes a potential long-term solution to making the new vendored templates more extendable, therefor provides a means to offer a long-term fix to this and related problems.
What tripped me up was that we have a bug and a feature proposal to address unintended side-effects of the %13.0 breaking changes. Assuming I have internalized these two correctly, I'll get out of the way. Thanks for taking the time explain.
. I think part of the confusion is us using regression to refer to an unanticipated consequence of a breaking-change. It was a breaking change so technically I don't think we need to honor the previous configuration paradigm, which is what I think we've alluded to with "not sure we need to fix this" but it still feels a bit wrong
overriding the rules of the sast job is NOT documented
@fcatteau should the task here be to explain how to override the rules of the sast job? Or should we just be documenting that if you happen to override the rules for the sast job, the pipeline will execute the legacy sast job, and that this will eventually be fixed by Improve extendability of SAST, Dependency Scanning template rules?
That being, this is a ~bug issue right now, and it looks like the bug has been fixed long ago when dropping support for DinD, in #220540 (closed). It now explicitly fails when users attempt to enable dependency_scanning job or the sast job, by forcing the job rules. See MR and DS CI template - same applies to SAST.
I believe we should close this issue. It's been done already.
@fcatteau@adamcohen i think this possibly warrants a "faq / troubleshooting" documentation update for those who bump into it, but i don't think we want to fix it as we did say it was a breaking change and we're past the backport required window as there is a work around. Would this be a simple "troubleshooting" item? (i.e. can we just slap #218541 (closed) into docs?)
i think this possibly warrants a "faq / troubleshooting" documentation update for those who bump into it
@NicoleSchwartz I was under the impression that this would be a documentation change like you suggested, but it seems like @fcatteau is suggesting that this isn't necessary and that we can just close the issue? @fcatteau any reason why you don't think this needs to be documented?
@adamcohen Yes, I think there's no need to document anything since dependency_scanning and sast jobs now throw an error message when enabled (by changing their rules). In the case of Dependency Scanning, the error message is:
dependency_scanning is used for configuration only, and its script should not be executed
@fcatteau - I agree that we should add this to the Troubleshooting section. I've added appropriate docs labels so that this remains on my "to do" list.
@NicoleSchwartz it looks like we just added groupcomposition analysis to this issue so this was not on my radar until now. As such, I'm resetting the missed labels.
I'm fine taking this as a Stretch item in %13.7 to address it ASAP.
If I want to not have a "test" stage in my .gitlab-ci.yml at all... your documentation doesn't work :( I still need to override the dependency_scanning job to point at my new stage. Let me explain...
This does not work!! Why? Because it fails to lint- basically dependency_scanning job needs a stage "test" to run. So I either have to add in a stage that is never actually used to satisfy the linter... or I need to still override dependency_scanning job and specify my new stage.
This wasn't a problem in the past because we were just overriding "dependency_scanning" to define the stage for the dependency scanners.
So... it's a strange nuance (we like having more specific stages then just "test") but while you're writing in the troubleshooting FAQ... may be good to get ahead of this one.
I don't quite follow your example. If you override the dependency_scanning stage like you have above, that does result in a valid CI configuration. At that point, you can remove the individual overrides per job and from a quick test on https://gitlab.com/gitlab-org/gitlab/-/ci/lint I'm seeing a valid config.
This wasn't a problem in the past because we were just overriding "dependency_scanning" to define the stage for the dependency scanners.
My apologies if I'm missing something but as far as I'm aware, this has not changed and should work the same as any past configuration. In your example you have the dependency_scanning override commented out, but if that's uncommented, it appears valid:
@theoretick Correct! IF you comment out the "dependency_scanning: stage: unit-tests" then it works. If you don't, it's invalid CI configuration syntax.
So let's go back to why this bug is happening- people (including my company with 1000's of developers and hundreds of development teams) used to call dependency_scanning and the sast job rather then each of the individual scanners. This is great because most of our developers don't know what scanners are going to automatically start, but they can get started with sast or dependency scanning with boiler plate code... no need to:
insert a test stage
import sast and have it run to see what triggers
hard code in the language-specific scanners that run so that they can set rules as they'd like
But... let's say they go through that anyway based on the current state of things. This bug is caused because we're no longer supposed to be overriding dependency_scanning and documentation is being updated to reflect that. You're supposed to override individual specific jobs. But... if you don't want to use a "test" stage, you still HAVE to override dependency_scanning before you then override the language-specific job. Because without a test stage, dependency_scanning fails to lint.
But wait.. the proposed fix says "documentation to explain that if the sast job is overridden with custom rules, then this will result in the the legacy sast job being run, which will fail with the following text in the CI job log:".
So my entire point is that to get sast or dependency_scanning to work without a test stage, I have to override sast or dependency_scanning with the new stage. If I then apply rules to it, it will fail. I have to apply rules to the specific job and not to the sast/dependency_scanning job to make it work. The proposal here is neither intuitive nor clear that I can still override the sast/dependency_job to not use a test stage, but not apply a rule. I would then still have the rule trigger from overriding the language_specific job. The error message job says nothing about the fact that rules are the culprit here, and the docs don't indicate that it's still okay to override sast/dependency_scanning for things... just not with rules.
@gonzoyumo@twoodham it sounds like per the OP in this thread we have a linting issue - is this the gitlab linter - can we fix it in some way? it complicates our desire to document a fix here - who would we need to cordinate with?
as we have decided to remove DinD for more flexibility in what envs we can run as well as follow better security practices we obv aren't planning to change back, however if you have to do a goofy series of things to "fix" it i'd rather us fix the linting if it makes it easier to follow
Looks like you found the right group above- I was indeed using the built in GitLab CI Linter. Let me know if there's anything else I can do to help. Anxious for this change as we keep running into it at our company.
@doug_rickert_here - if I may ask, is the scenario you outline above something you've tried on a self-managed instance of GitLab, gitlab.com, or both? I'm trying to make sure we're all looking at the same version, hence the question.
@doug_rickert_here Thanks for letting us know about this and sorry to hear that you're having problems with the CI linter. Do you have the specific error that you're seeing? If you're also able to share a public project (with any private information redacted) with us on GitLab.com as you mentioned that you can reproduce on 13.8.0-pre on GitLab.com, that will help us to debug the exact problem you're seeing. (Apologies if I've missed this information in this discussion thread)
@cheryl.li the issue is detailed above "This does not work!! Why? Because it fails to lint- basically dependency_scanning job needs a stage "test" to run. So I either have to add in a stage that is never actually used to satisfy the linter... or I need to still override dependency_scanning job and specify my new stage." #218541 (comment 469483426)
Basically due to the complexity of the pipeline yml files it is demanding a user have test stages, that they don't need to
How can we work to improve the logic of the linter such that it doesn't trap the users in a chicken-and-the-egg situation
@furkanayhan i guess that is exactly where my confusion is
error This GitLab CI configuration is invalid: dependency_scanning job: chosen stage does not exist; available stages are .pre, unit-tests, .post
BUT it's set to unit-tests which per the error above is supposed to exist?
I guess if you look furhur up - customer as expected overrides "dependency_scanning: stage: unit-tests" but then linting fails, and then if they Stop overriding the "dependency_scanning: stage: unit-tests" then it works.
Customers should be able to override the stages? linting shouldn't stop them?
I guess if you look furhur up - customer as expected overrides "dependency_scanning: stage: unit-tests" but then linting fails, and then if they Stop overriding the "dependency_scanning: stage: unit-tests" then it works.
@doug_rickert_here, from all above comments and explanations, it looks like you're definitely in an edge situation that is not something we've accounted for when going through these various updates of our job definitions over time. Let me state a few things to validate I have a good understanding, and then work toward a suitable solution:
all our jobs (previous and current) rely on the test stage
the dependency_scanning job is no longer used as an executed job (and would fail if executed), but kept for backward compatibility to customize configuration
#2 allowed customers who were used to define some settings in that old single job, to be transparently inherited by the new specific jobs. Though, this is where confusion starts as this is not true for all kind of settings (e.g. rules).
As a result, the combination of how our CI works and how we moved away from the old dependency_scanning single job are conflicting in the situation where the test stage is not used.
Though, as pointed above (here and here) this works fine as long as you're not trying to override the rules of the dependency_scanning job, which is supposed to stay with:
rules:-when:never
as we do not want this job to run.
Could you please explain us why you need to change the rules of this dependency_scanning job?
Yeah I think you nailed it- the bummer is that we're seeing the custom rules like this in a lot of our application teams.
What we see in the code is teams are trying to put in toggle/optimization capabilities into their pipelines so that they can quickly turn CICD jobs out without having to edit their .gitlab-ci.yml file. Take for example, this block of code in one of our teams:
It all makes sense to me... the optimizations make sure they get Dependency Scanning during code changes, merge requests, periodic builds, etc... but they don't always have it run as they're pushing changes to a development branch. This happens with SAST too because SAST can take some time to run for a lot of our projects, so teams have been optimizing when it runs. At the end of the day, I'm happy as long as it's running on their master branch and in merge requests because that's where I believe 90% of the benefit comes from.
I can pull more examples if you want but they're all around teams trying to optimize when certain jobs run as they balance value of the tools with CICD optimization
Again- I'm not trying to ask you guys to change behavior/architecture. We just were bit at our company because we had a lot of teams that had build these rules onto their dependency_scanning job. Then, when this change got pushed the error message they started getting was very unintuitive/confusing which resulted in tickets for our team.
Documentation (FAQ/Troubleshooting section probably) and potentially a better error message could go a long way. I'd love to not have to override the "dependency_scanning" job with a new stage when I'm already overriding the individual job with a new stage (I don't want people thinking about the dependency_scanning job at all since that can lead to them writing rules/tweaking that job), but I'd be happy enough with documentation/clearer error message
I agree with your statement about the lack of clarity of errors/troubleshooting doc when going through the update of your template. To follow with our latest template changes, you had to go through more hiccups than we anticipated. Using something else than the default test stage is definitely not on our testing suite. @NicoleSchwartz, I am not sure how much we want to consider it in the future, but this is a gap we have.
I also don't think we need to change anything in our CI lint, the error message is accurate to me. The complexity to address the issue comes from how we (Secure stage) have organized our templates.
So @doug_rickert_here I understand your pain is only due to our transition state and you would actually benefit from us totally removing the old dependency_scanning job.
@NicoleSchwartz as a result here are a few actions we can take:
better document the necessary changes to apply to Security features when overriding the stage property (applies to both Dependency Scanning and SAST at least, TBD for others). This is in addition to the current possible fixes suggestion.
open an issue to consider removing the old single job dependency_scanning (same for sast)
add to our test GAP analysis: using a different stage than the default test one.
@doug_rickert_here your opininon is more than welcome on the above actions, I hope this is addressing your needs. You might also keep an eye on #218444 as this might help your teams to better organize their optimizations.
As mentionned in #221003 (comment 421716024), we might improve sub-pages troubleshooting section to point back at the main one to avoid content duplication.
I also don't think we need to change anything in our CI lint, the error message is accurate to me. The complexity to address the issue comes from how we (Secure stage) have organized our templates.
and the proposal
Is there a way to modify the linting so this is a more simple fix if one follows our proposed documentation?
@gonzoyumo should this issue involve changing the linting or not? I'm assuming the answer is no, but just wanted to confirm.
better document the necessary changes to apply to Security features when overriding the stage property (applies to both Dependency Scanning and SAST at least, TBD for others). This is in addition to the current possible fixes suggestion.
From what I understand, this issue will now involve the following:
Add another sub-section to the Troubleshooting section of the Application Security docs to explain about the sast is used for configuration only, and its script should not be executed error that can be encountered by overriding the sast or dependency_scanning jobs.
@gonzoyumo should this issue involve changing the linting or not? I'm assuming the answer is no, but just wanted to confirm.
@adamcohen your assumption is correct, we don't change the linting here.
Change the Getting error message sast job: stage parameter should be... error in the main application security page to make it more generic so it applies to both SAST and Dependency Scanning, as well as note that this error is applicable to both of these areas
=> Yes, we need to make it generic. But this not only applies to SAST and DS, it is the same for all our security jobs. And to clarify this we might need a proper section to explain what to do when using another stage than the default one (this is currently nested in the troubleshooting section). Though, when using another stage, things are getting worse for SAST and DS, as they have old jobs that get in our way. The sast and dependency_scanning jobs, while no longer used as executed jobs, are still present in the template (for backward compatibility and config inheritance), and they set the stage property. So we have to override them additionally to the "real" jobs like eslint-sast and gemnasium-dependency_scanning for instance. This is what confuses users, because they have to override the stage in these old jobs, but the rules have to be overriden in the real jobs only...
Add another sub-section to the Troubleshooting section of the Application Security docs to explain about the sast is used for configuration only, and its script should not be executed error that can be encountered by overriding the sast or dependency_scanning jobs.
=> Yes. We need to emphasize that these jobs must not run, so they can be used to configure a lot of things that apply to all "real" jobs, but not the rules property (as we need that one to stay with never value).
@adamcohen I hope this makes things clearer, otherwise let's chat about it in our next 1:1
yes, @adamcohen was close to complete the refinement at the end of that thread #218541 (comment 531011080) but this was superseeded by DB rapid action. I'm pushing this to %14.1
@adamcohen please try to complete the refinement when you have a chance.
@gonzoyumo I've updated the Possible Fixes section and changed the weight to 2 because I have a feeling there will be a lot of discussion on this to get the documentation correct. Please let me know if I've missed any details in the proposal, thanks!
Based on the previous discussions, this is a documentation issue that requires good knowledge of the Dependency Scanning and SAST CI templates, but there's no backend work involved.
Fabien Catteauchanged the descriptionCompare with previous version
@adamcohen I've renamed Possible fixes to Implementation plan, and added a line to make it clear that it's all about updating user documentation. I hope this makes sense to you.