The use case is a pipeline that compiles application code in one job, then packages the binary into a Docker image in a subsequent job, where the binary should not be downloadable outside of the running pipeline. The cache feature is meant for dependencies; using cache could result in compiler output files (.class, .o, the final jar or exe, etc.) not being rebuilt from the new code. The artifacts keyword passes the file between jobs, but also makes it downloadable by anyone with appropriate access to the project. The Jenkins stash feature sits between GitLab's cache and artifacts, in that it is for artifacts but without exposing them outside the pipeline.
The requested option is to block artifact downloads: an extra keyword to make the artifacts pipeline-internal, so that they're not downloadable and expire (are deleted) at the end of the pipeline.
Technical Proposal
Currently, we have the artifacts:public keyword to control whether artifacts are accessible to non-members.
Here, we want to have another access setting. We can introduce a new attribute, artifacts:access; it can replace the public keyword and handle this case.
Maybe this:
    job:
      artifacts:
        access: none   # public (default), member, none
- public and access will not be used together.
- public will be deprecated/discouraged. We can't remove this now but maybe 1-2 major milestones later.
- Artifact access will not be related to reports.
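As an illustration only, the use case from the description might look roughly like this under the proposed keyword (job names, stage names, and paths are made up):

    stages: [build, package]

    build:
      stage: build
      script:
        - ./gradlew assemble              # compile the application
      artifacts:
        access: none                      # pipeline-internal: not downloadable by anyone
        paths:
          - build/libs/app.jar

    package:
      stage: package
      needs: [build]                      # the jar is still passed to this job
      script:
        - docker build -t my-app .        # binary is available here, but not for download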
Out of scope
Deletion of these artifacts when the pipeline is complete.
This page may contain information related to upcoming products, features and functionality.
It is important to note that the information presented is for informational purposes only, so please do not rely on the information for purchasing or planning purposes.
Just like with all projects, the items mentioned on the page are subject to change or delay, and the development, release, and timing of any products, features, or functionality remain at the sole discretion of GitLab Inc.
Jackie Porter changed title from Cache and Artifacts Options to Block Artifact Download (Equivalent to Jenkins Stash Keywords) to Cache and Artifacts Options to Block Artifact Download
@dhershkovitch @marknuzzo - this would be a really compelling migration feature for our CI ARR adoption track.
One of the limitations is that our cache is good for dependencies but not elegant for persistence. It would be nicer to have something in between artifacts and cache.
Lastly, we should investigate whether this is a straightforward enhancement or not.
Thanks @dhershkovitch - For investigation, I will tentatively plan for the team to investigate once we have cleared a few other hurdles with other work in the next few milestones.
@dhershkovitch - I'm adding this as candidate16.8 for now. Though it would only give us 1/2 the quarter for this OKR, we could at least investigate so that we know what work needs to be done shortly thereafter. WDYT?
@shampton @jocelynjane - in talking with @furkanayhan, he confirmed that this issue falls under group::pipeline security due to the artifact references, which align with our docs. I'm going to update the labeling, but please feel free to provide your thoughts here.
@dhershkovitch I think this is a Pipeline Authoring feature and a competitive position with Jenkins in terms of syntax capabilities. While it has to do with artifacts, I am not entirely sure if we should defer this to Pipeline Security since the use case is about parity with Jenkins CI cc @marknuzzo @furkanayhan
@gitlab-com/pipeline-authoring-group/backend - can we get this issue weighted please for consideration in %16.9? If the proposal needs more clarity or some investigation needs to happen before you are comfortable with weighting, please ping @dhershkovitch and I here. Thank you.
Thanks @furkanayhan - I like this approach of having an additional keyword (member) here. So the thinking here is that if someone wants to block artifacts from being downloaded, they just need to say:
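    job:
      artifacts:
        access: member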
I'm the original customer that @gavinpeltz opened this issue for. I missed some of the discussion over the holidays so sorry if this feedback is a bit late.
I like the proposed access keyword, but I'm not sure that it meets the request of having an analog to Jenkins' stash, unless we can have multiple artifacts on a single job. My reasoning is that I may need to specify both actual downloadable artifacts and inter-job/intra-pipeline files created by the same job. Perhaps extending cache with an option that makes that particular cache key intra-pipeline instead of inter-pipeline makes sense, given we can already create multiple caches. I could then use that to carry debuggable compiled .jar/.o/.so/etc files between jobs knowing that they won't be persisted to the next pipeline, ensuring that they are compiled from the checked out source every pipeline.
So the thinking here is that if someone wants to block artifacts from being downloaded, they just need to say:
    job:
      artifacts:
        access: member
@marknuzzo they should use member to block public access, which they can do currently with the artifacts:public keyword. Here, we are adding none to block everyone, including members.
I may have need to specify both actual downloadable artifacts and inter-job/intra-pipeline files created by the job.
@jghal I see your point. So, you want to control the access of each artifact from a job, right?
Perhaps extending cache with an option that makes that particular cache key intra-pipeline instead of inter-pipeline makes sense
Technically, you can; however, this is not the main goal of cache. If you want to pass a file/output from one job to the next, you should use artifacts.
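For reference, a minimal sketch of passing a build output from one job to the next with artifacts (job names and paths are illustrative):

    compile:
      stage: build
      script:
        - make                    # produces bin/app
      artifacts:
        paths:
          - bin/app
        expire_in: 1 hour

    test:
      stage: test
      needs: [compile]            # bin/app is downloaded into this job's workspace
      script:
        - ./bin/app --self-test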
So, you want to control the access of each artifact from a job, right?
Right. Although continuing to think on it, we probably wouldn't often have a single job publishing artifacts both for inter-job lifetime and as long-term post-pipeline downloadable artifacts, as we use JFrog Artifactory for sharing the final artifacts outside of GitLab. So if a team did have a use case to use artifacts for exposing files outside of GitLab after the pipeline, they could potentially use a job at the end of the pipeline that gets them from a prior job which published them as temporary inter-job artifacts and just republish them with different artifacts settings.
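A rough sketch of that republish pattern, assuming the earlier build job marked its jar as pipeline-internal (for example with the proposed access: none) while a final job re-declares the files it wants to keep downloadable; names and values are illustrative:

    publish:
      stage: .post                          # runs at the end of the pipeline
      needs: [build]                        # pulls the jar produced earlier
      script:
        - echo "re-publishing the jar as a long-term artifact"
      artifacts:                            # default access, so this copy stays downloadable
        paths:
          - build/libs/app.jar
        expire_in: 30 days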
Technically, you can, however, this is not the main goal of cache
I agree that the distinction between downloading versioned dependencies (cache) and making job output available to later jobs (artifacts) warrants correct use of the respective keywords.
With an access: none setting, we would of course still want the data from any reports artifacts to be presented in the UI. Does expire_in affect how long those reports are rendered in various UIs (MRs, pipelines)?
I think it would address the desire to keep artifacts internal to the pipeline. I'm thinking about how report artifacts will be impacted, though. So if I set those jobs to access: none, do I have to have another job to publish those junit reports?
Does this align with our long-term goals with artifacts?
No, this does not. I have concerns about what is being asked and how it aligns with the broader experience when it comes to artifact management (and things like restricting downloads), and I have a solution validation issue created to research this further. We get a lot of requests for restriction with different use cases; I do not want to introduce too many settings as that can be difficult to manage. I'm open to prioritizing this research now that we have a designer on our team.
On a related note, we have had issues requesting download restriction, but the users can still browse/see the artifacts. Perhaps I don't understand the security concern here, but if I can view the artifacts, what stops me from re-creating them (basically working around the download restriction)?
I disagree with this part of the proposal:
public will be deprecated/discouraged. We can't remove this now but maybe 1-2 major milestones later.
This is a breaking change which group::pipeline security is responsible for managing, and I see no reason for removing/changing it. This is not a trivial change and we have to provide support for the removal.
Hi @jocelynjane. For my part, it's less a security concern and more that artifacts is a singular keyword that serves what I perceive as two distinct purposes: sharing temporary files between jobs, and persisting build results for retrieval after the pipeline is completed. If I use artifacts on my jar files in a build stage job and then run my integration tests using those jar artifacts in a later stage job and the tests fail, then I don't want to publish those jar files for anyone to view and download (anywhere, not in GitLab, not in Artifactory). I want them discarded and forgotten. But they're already published and downloadable because that's what artifacts does. We just want a way to share files between jobs, without them yet becoming published artifacts that users can view and download. We want to control that publication for consumption separate from passing them between [containerized] jobs.
Jenkins has the stash and unstash functions which syntactically are more like cache than artifacts, with the exception that the relevant files are only persisted during a singular run of the pipeline while cache persists those files between pipelines. This persistence works fine for downloading versioned dependencies, but not so much for jar/so/dll/exe files my compilers are generating.
@jghal thanks for your response. I understand your use case, and also the parity with Jenkins being a high priority. My ideal preference is we would have the time to solution this out across all of Build Artifact management, but this is not a likely scenario given timelines and capacity all around.
If the best solution right now is to add keywords to artifacts, I can accept that. I do not, however, support the deprecation of what we have today. I have a hard time seeing why the deprecation is necessary. It is a non-trivial lift.
This is a breaking change which group::pipeline security is responsible for managing, and I see no reason for removing/changing it. This is not a trivial change and we have to provide support for the removal.
I do not, however, support the deprecation of what we have today. I have a hard time seeing why the deprecation is necessary.
@jocelynjane In the proposal, I am not saying that we should directly remove the public keyword. I know it's pretty difficult to do this because I assume it's highly used at the moment. That's why I mentioned "maybe 1-2 major milestones later" :)
If our goal is to provide better/improved access options for artifacts, deprecating public seems inevitable to me.
@jocelynjane @furkanayhan Thanks for the conversation and proposal here. I have two things that I want to provide my opinion on.
How this proposal solves the problem
While I understand that the proposal for adding artifacts:access has the benefit of using the concept of artifact access permissions to solve the problem of blocking artifact downloads, it doesn't seem to be the goal of what was initially proposed.
From what I understand of the original proposal, there are some artifacts that simply don't need to be stored after a pipeline is finished. The initial idea was that we would have something like artifacts:download or maybe even artifacts:stash with a default value of true, which would keep current functionality, and if that keyword was false then we would just delete the artifact once the pipeline finished. (It doesn't have to be one of those keywords.)
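Purely as an illustration of that idea (no such keyword exists today; the name and shape below are only placeholders):

    build:
      script: ./gradlew assemble
      artifacts:
        download: false        # hypothetical: delete these artifacts once the pipeline finishes
        paths:
          - build/libs/app.jar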
The artifacts:access keyword doesn't seem like it would solve any storage problems, but only block artifacts from being viewed/downloaded. From what I can tell, the initial request was focused more on storage optimizations than security concerns.
How this proposal fits in with the future artifacts vision
While I disagree with artifacts:access being used for this issue, I think it's an interesting take on how we can iterate on the artifacts:public keyword. As pointed out, we have a solution validation issue for restricting artifact access even further. If we determine that we do want to allow further restriction, then changing artifacts:public to artifacts:access would make sense and allow us to add more arguments for restricting access in specific ways.
@shampton thanks for sharing your thoughts here. Per this comment I think you're right - I want them discarded and forgotten - the proposal is missing the storage piece of the equation.
@furkanayhan Regarding the future vision for Build Artifacts, @bonnie-tsang will start looking into the validation issue in the next few milestones. At this point in time, I think the focus should be on solving the problem, and not the future of Build Artifacts. I want to avoid a situation where we go in one direction without having done our research, then have to introduce another change once we have a solid solution that aligns with the broader vision.
the initial request was focused more on storage optimizations than security concerns.
If that's the case, why does using a shorter expire_in value not help in this case?
we would just delete the artifact once the pipeline finished.
This would raise many more questions. For example: what does "the pipeline finished" mean? Successfully finished? Success with warnings? Failed? I am asking this because of the "retry a job" possibility. My main point here is actually "retrying a job after the pipeline is completed". If we delete artifacts right after "a pipeline is finished", some jobs may not run correctly after individual job retries. We may already see this if we use a shorter expire_in value.
Of course, this does not mean that we can't have this feature. We may provide this option:
    job:
      artifacts:
        expire_in: pipeline_completed   # What does this mean? success? failed?
                                        # success with warnings? blocked because of manual jobs?
@furkanayhan you bring up good points about the implications of expiring artifacts when a pipeline is finished. I like the idea of adding a new option (or options) to the expire_in keyword to allow users to customize this behavior.
@jocelynjane @jreporter @dhershkovitch I don't think this is as simple as anticipated, and could benefit from some user research to determine the best path forward. WDYT?
If that's the case, why does using a shorter expire_in value not help in this case?
It does help; that's the workaround we're using for now. But I can only have one artifacts section per job and one expire_in for all artifacts in that job. So if I want a single job with both some intra-pipeline files passed to subsequent jobs in the same pipeline and some persistent artifacts, I can't do it. I can only have one or the other. I can have a second job in a later stage to redundantly publish the persistent artifacts with a different expire_in value, but that's still just more workarounds for the original lack of granularity.
If we delete artifacts right after "a pipeline is finished", some jobs may not run correctly after individual job retries.
In my use case, this would be when my test stage job failed, so I don't want to publish the build stage job's output (i.e. the jar files or similar). If the failure was bad code, I would need to commit new code anyway. If the failure was environmental, then yeah, I'd be stuck without those files to retry the test job. But if I'm regularly seeing environmental failures in my tests that I can fix by rerunning that job, I need to pause and stabilize my environment (perhaps to the tune of automating the setup and teardown of that environment within the pipeline to make it more deterministic). And I would only be in that situation (of not being able to retry the job due to expired artifacts) because I intentionally coded my pipeline to do that. And I could set the expire_in value with a CI variable (and use pre-filled configuration to restrict the possible values) so that people could manually trigger and override that variable, giving them a pipeline where they can retry the failing job until they resolve the issue, while the rest of the pipelines continue on with the coded defaults.
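A sketch of that variable-driven override, assuming (as described above) that expire_in can take its value from a pre-filled pipeline variable; the names and durations are made up:

    variables:
      ARTIFACT_EXPIRY:
        value: "1 hour"
        options: ["1 hour", "1 day", "1 week"]
        description: "How long to keep intermediate build artifacts"

    build:
      stage: build
      script:
        - mvn package
      artifacts:
        paths:
          - target/
        expire_in: $ARTIFACT_EXPIRY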
@jocelynjane @jreporter @dhershkovitch I don't think this is as simple as anticipated, and could benefit from some user research to determine the best path forward. WDYT?
@shampton I have 3-4 customers looking to migrate who are interested in us supporting the Jenkins stash use case. That is enough justification to support the problem. I think we should do further solution validation to make sure we are supporting the right use case for Jenkins migration into GitLab cc @dhershkovitch @jocelynjane
@dhershkovitch - please document the stash use case and work with Gavin and his customer (I can find the other customer notes, and it looks like @jghal is also willing to provide a use case) to ensure we have the right solution.
To support users migrating from Jenkins, we should validate the following solution (which is almost identical to what Jenkins offers):
Add an optional stash keyword
The stash keyword will stash the file (temporarily shelve the artifact), so users or jobs won't have access to it
we could use a syntax proposal along these lines:
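One hypothetical shape for such a job-level keyword, purely illustrative and still pending validation:

    build:
      script: mvn package
      stash:                     # hypothetical keyword, analogous to Jenkins stash
        paths:
          - target/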
We could continue iterating (if needed) and introduce unstash, which will unshelve the artifact so it can be used by the job and made available for download (requires additional validation)
Unlike expire_in, the stash keyword gives us the option to introduce unstash
The effort to support multiple artifacts per job should be orthogonal to this one; once we implement that capability we could support stash (and unstash)
@jocelynjane let's collaborate on a quick solution validation to see if we could start by iterating on this request by introducing stash (and later unstash if needed); once we have a clear solution we can discuss the right group assignees
@dhershkovitch I think the dependencies keyword takes care of the unstash use case. My main issue with extending artifacts is the multiple artifacts per job issue you linked. What would be the impact of the proposed stash modifier on reports artifacts? I would hope for it to not apply to them at all. I think report artifacts should serve as compliance/audit records (with the same persistence/lifetime as any MR they are displayed in, regardless of expire_in).
@dhershkovitch per my discussion with @jghal (Dovetail - internal only), I wonder if we should consider a different term to define this set of temporary files that will be used for downstream jobs and then deleted automatically, versus bundling/overloading artifacts (note: these temp files should also be hidden from the UI, meaning a user should not be able to browse them in the Artifacts page). The problem we have right now is that artifacts applies to a lot of files, but the desire to use stash only applies to a subset of the artifacts which are created by the job.
@shampton I don't think we have a good way to break out artifact types right now, curious what your thoughts are here. If we could somehow break out the artifacts, we could theoretically have a "temporary artifacts" for this purpose.
I don't think we have a good way to break out artifact types right now, curious what your thoughts are here. If we could somehow break out the artifacts, we could theoretically have a "temporary artifacts" for this purpose.
@jocelynjane No, we don't have a good way right now; we'd have to implement #18744 first.
@dhershkovitch I think being able to limit or specify which artifacts are "included" in the stash would be great - I'm not sure though how we would do this without implementing #18744.
@jocelynjane Can we validate how many artifacts require stashing? If the number is low, could we implement it rather quickly? Also, what if we roll it out for self-managed only?
The number of paths we want to stash in the YAML may be small, but the number of actual files could be large. For example, we would want to stash the entire target folder in a pipeline doing Maven.
@dhershkovitch I'm not sure how rolling this out only for self-managed is an advantage over both self-managed and .com. Can you please help me understand the thinking here?
@jghal following up from our conversation, I wanted to confirm this issue is about being able to hide (for lack of a better description) temporary files which are used only for the downstream jobs, and not a universal "block artifact download". Have we captured the problem correctly?
@jghal I have created a separate issue for the specific "temp files" use case (#440852). When we go through our problem and solution validation for Build Artifacts at a later date, I want to make sure we don't lose track of this specific problem.
@jghal
Since we won't be able to implement multiple artifacts per job, but we do want to provide a solution promptly, do you believe the existing issue description along with a shorter expire_in value could be a solution to your requirement?
For us specifically, all our projects are private anyway, so presumably that would default us to access: member which we could further restrict with access: none. So yeah, I think that would be sufficient.
Thanks for sharing this video @rkadam3 - Quick question: in the video, you showed a non-member as someone not logged in. But if someone was logged in to gitlab.com yet still not a member of that project, would they still see the same behavior of no download arrow for artifact download?
@marknuzzo - Yes, see the screenshot below. I am logged in as meggan in another browser window. The project that the root user created is public, and meggan cannot see the download arrow in the pipeline.
And meggan is not a member of the project either.
In fact, if access is none, even members cannot see the download option.
@jocelynjane I'm getting feedback from my dedicated customer, who is on 17.0.3, that there's been a regression for this; it was working in 17.0.2.
When the artifacts:access keyword is set to none, the subsequent pipeline can no longer access the artifacts passed to it. They are utilizing it to prevent downloading of reports:dotenv within the UI but expecting it to continue to work within the CI jobs.
Can we validate this? I'll get the customer to open a support ticket and link it here subsequently.
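Roughly the kind of configuration involved, as a sketch rather than the customer's actual pipeline:

    generate-env:
      stage: build
      script:
        - echo "BUILD_VERSION=1.2.3" >> build.env
      artifacts:
        access: none              # hide the artifact from UI browsing and downloads
        reports:
          dotenv: build.env       # the variables are expected to still flow to later jobs

    use-env:
      stage: test
      needs: [generate-env]
      script:
        - echo "Building version $BUILD_VERSION"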
@manuelgrabowski - would you be able to help with reproducing this behavior that the customer describes? I think any additional details we can gather here will help as we further investigate here.
@marknuzzo @jocelynjane As far as I understand it, this is the known issue that Raul shared above. The regression was introduced in 17.0.3, the fix is already deployed on GitLab.com, but it didn't make the cut for 17.0.4. It will be in the next security release on 2024-07-24. See this thread. Let me know in case I misunderstood; happy to dig deeper in that case.
Thanks @manuelgrabowski - so it sounds like once the next security release is deployed on 2024-07-24, this problem should resolve itself once the customer upgrades to the latest version. Is that correct?
@marknuzzo Correct – until then, they could set access: 'all' as a workaround – but that is not something that is possible/acceptable for all users/environments, of course; that's why it was treated as severity::2 / priority::2 and fixed asap.