The antivirus (AV) verification currently creates a child pipeline. Each job in the child pipeline scans a single image; if a virus is detected, the job fails. If a job in the child pipeline fails, the parent pipeline fails too. If there are more than 50 images to scan, several child pipelines are launched one after another (not in parallel), because of a GitLab restriction that prevents a pipeline from having more than 50 jobs. With the current configuration, the AV runs every time a pipeline is triggered.
Last but not least, the scanning job is long: scanning every image of the hub (21 images) currently takes 20 minutes, and a test with 80 images took 50 minutes. While a child pipeline is running, the parent pipeline stays in a waiting state, so the parent pipeline is stuck until the child pipeline finishes. This is a big issue with the antivirus.
Concretely, MkDocs only runs once the antivirus job and its child pipeline are done.
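The batching described above (one scan job per image, at most 50 jobs per child pipeline) can be sketched as follows. This is only an illustration of the splitting logic, not the actual pipeline generator; the function name and the image names are hypothetical.

```python
from typing import List

# Job limit per pipeline as quoted in the text above.
MAX_JOBS_PER_PIPELINE = 50


def split_into_child_pipelines(
    images: List[str], limit: int = MAX_JOBS_PER_PIPELINE
) -> List[List[str]]:
    """Group images into batches: one batch per child pipeline, one job per image."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return [images[i:i + limit] for i in range(0, len(images), limit)]


if __name__ == "__main__":
    # Hypothetical image names, matching the 80-image test mentioned above.
    images = [f"image-{n}" for n in range(80)]
    for index, batch in enumerate(split_into_child_pipelines(images), start=1):
        print(f"child pipeline {index}: {len(batch)} scan jobs")
```

With 80 images this gives two child pipelines (50 and 30 jobs), which run sequentially and therefore add up in duration.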
Duration problem
This duration causes several problems:
The MkDocs job only runs once the child pipeline is done
The pipeline can get stuck for more than one hour
People expect the pipeline in their MR to finish quickly (e.g. a user wants to see the documentation artifact of their job)
It consumes CI minutes
The antivirus runs every time the pipeline is launched
Current aim of the job
There are several goals to reach in order to have something convenient and functional:
We don't want this job to run every time the pipeline runs
We don't want the whole pipeline to be stuck, waiting for the AV
The AV database is updated once or twice a day, so we want to know quickly whether an image has a virus
For contributions and MRs, we want to know quickly whether the new job's image has a virus
Solutions
There are several solutions that solve most of these problems; we will discuss them at the weekly technical meeting of 17/03/2021.
The following explains a cache solution that runs the job only if a marker file does not exist in the artifact (it is one part of the solution):
Using an artifact that contains a single file and has an expiry of X (to be defined, e.g. 1 day), we can tell whether the AV needs to run: it runs only if the artifact file does not exist. This means that every time the AV runs, it must create the artifact. Following this issue, we can do that using rules.
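The marker-file idea can be sketched as follows. This is a minimal local illustration, assuming the marker is a plain file whose modification time stands in for the artifact expiry; the file name and function names are hypothetical, and the real check would be done through GitLab artifacts and rules.

```python
import os
import time
from datetime import timedelta
from typing import Optional

# The "X" from the text; 1 day is only the example value given there.
MAX_AGE = timedelta(days=1)


def av_needs_to_run(
    marker_path: str, max_age: timedelta = MAX_AGE, now: Optional[float] = None
) -> bool:
    """Return True when the marker file is missing or older than max_age."""
    now = time.time() if now is None else now
    try:
        modified = os.path.getmtime(marker_path)
    except OSError:
        return True  # no marker: the artifact expired or was never created
    return (now - modified) > max_age.total_seconds()


def record_scan(marker_path: str) -> None:
    """Create or refresh the marker file after a completed AV run."""
    with open(marker_path, "w", encoding="utf-8") as marker:
        marker.write(str(time.time()))
```

A job would call av_needs_to_run first, run the scan only when it returns True, and call record_scan afterwards so that the next pipelines skip the AV until the marker expires.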
Scheduled pipelines
The first solution is to use the current scheduled pipelines together with rules. The AV job is launched for each scheduled pipeline (up to a limit of X runs per day, following the cache explanation above).
With this solution, people don't have to wait for their pipeline in an MR or on a merge. The drawback is that an image coming from an MR isn't scanned, as the pipeline would run on master.
To solve this problem, I think we could implement another rule to make the AV job run at least once on an MR. This pipeline would need to succeed before the MR can be merged.
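The decision these rules would encode can be sketched as below. It is an assumption-laden illustration: pipeline_source mirrors the values of GitLab's CI_PIPELINE_SOURCE variable ("schedule", "merge_request_event"), while mr_already_scanned is a hypothetical flag standing for "the AV already ran once on this MR".

```python
def av_job_should_run(pipeline_source: str, mr_already_scanned: bool = False) -> bool:
    """Decide whether the AV job is added to a pipeline.

    pipeline_source: value of CI_PIPELINE_SOURCE for the pipeline.
    mr_already_scanned: hypothetical flag, True once the AV ran on the MR.
    """
    if pipeline_source == "schedule":
        return True  # scheduled pipelines always carry the AV job
    if pipeline_source == "merge_request_event":
        return not mr_already_scanned  # at least one scan per MR
    return False  # any other pipeline skips the AV


if __name__ == "__main__":
    for source in ("schedule", "merge_request_event", "push"):
        print(source, av_job_should_run(source))
```

In the real configuration this logic would live in the job's rules rather than in a script; the function only makes the intended behaviour explicit.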
Triggers
The problem with the above solution is that we have to manually check whether the new image is safe. The good point is that we are all emailed every time a pipeline fails, so we know when there is a problem!
But I still have to work on using a cache for each job, so that we only scan jobs that haven't been scanned in the last 24 hours. On top of that, we should have a two-step process: a first check on an MR that is merged into a separate branch, and a second check after our pipeline has run with the antivirus job.
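The per-job cache could work along these lines. This is a sketch only, assuming the last scan time of each image is available as a dictionary; how that information is actually stored (cache, artifact, or something else) is still to be worked out, and all names here are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, List


def images_to_scan(
    images: Iterable[str],
    last_scanned: Dict[str, datetime],
    now: datetime,
    max_age: timedelta = timedelta(hours=24),
) -> List[str]:
    """Return the images never scanned, or last scanned at least max_age ago."""
    due = []
    for image in images:
        scanned_at = last_scanned.get(image)
        if scanned_at is None or now - scanned_at >= max_age:
            due.append(image)
    return due


if __name__ == "__main__":
    now = datetime(2021, 3, 17, 12, 0)
    # Hypothetical scan history: job-a is fresh, job-b is stale, job-c is new.
    history = {
        "job-a": now - timedelta(hours=2),
        "job-b": now - timedelta(hours=30),
    }
    print(images_to_scan(["job-a", "job-b", "job-c"], history, now))
```

Only the stale and the new images are returned, so a pipeline that runs several times a day scans each image at most once per 24 hours.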