Right now the Jobs API is pretty limited in the queries it will accept. For example, if I want to fetch project jobs I can filter by scope, but that's about it. I work on a project where we create hundreds of jobs every day. I want to fetch all artifacts created in the past month from all jobs that have a certain name. Right now I have to download thousands of records to find the IDs of a handful that match the name I'm looking for.
Proposal
Add a name attribute to the query string for fetching project jobs so I could do:
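As a rough illustration of the proposal, here is a minimal sketch of how such a request URL might be built. It assumes the new query parameter would simply be called `name` and would sit next to the existing `scope` filter on the project jobs endpoint; the helper `project_jobs_url` and the host name are hypothetical, and the `name` parameter is the thing being requested, not something the API is confirmed to accept.

```python
from urllib.parse import quote, urlencode


def project_jobs_url(base_url, project_id, scope=(), name=None):
    """Build the URL for listing a project's jobs.

    `scope` is the existing filter; `name` is the proposed (hypothetical) one.
    """
    # Project IDs may be numeric or a path such as "group/project",
    # which has to be URL-encoded.
    encoded_id = quote(str(project_id), safe="")
    params = [("scope[]", s) for s in scope]
    if name is not None:
        params.append(("name", name))  # assumption: the parameter is called "name"
    url = f"{base_url.rstrip('/')}/api/v4/projects/{encoded_id}/jobs"
    return f"{url}?{urlencode(params)}" if params else url
```

With this helper, `project_jobs_url("https://gitlab.example.com", 42, name="deploy")` would produce a URL ending in `/projects/42/jobs?name=deploy`.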
@jlenny What do you think about this one please? I'm seeing a need for this to find the status of certain manual jobs within a project that are not executed for every pipeline.
This would be a really nice feature. There are a lot of times I want to grab one job out of a pipeline in order to start/replay it with the API. Looks like this is where the change would occur:
This seems to have changed since the time @brucehubbard wrote this comment. Here's what it looks like right now (September 2021). I changed master to 848932da in the links below to make them as permanent as possible:
My rough estimate (not having done any code contributions to GitLab before) would be that this is something like 4-8h of work for an experienced developer with no prior GitLab development experience, adding the parameter and testing it locally. (I presume that some unit/integration tests should be added as well.)
Noting this down in the hope that someone else is eager enough to pick it up. I would really like to see this feature being added; the current lack of ability to filter CI jobs is quite limiting to me.
@jlenny So I wanted to know: should we filter only based on name, or could we add some other parameters as well?
Like user, stage, etc. Or should we confine it to name only for now?
We like to keep iterations as small as possible, so I think following the issue description and doing name first makes sense. We have a demonstrated use case here, and we can follow up on the others when a use case for them comes up.
Though this could be considered a separate issue, it would be nice to also allow filtering on started_at or finished_at. Currently if I want to get info about jobs that have run in the past week, I need to get data about all jobs from the server and then filter them on the client side. Right?
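The client-side filtering described above might look roughly like the sketch below. It assumes each job record is a dict with `name` and `finished_at` fields (an ISO 8601 timestamp, or `None` for unfinished jobs), and that `fetch_page` is a caller-supplied function returning one page of jobs. Both helper names are hypothetical.

```python
from datetime import datetime, timezone


def parse_ts(value):
    """Parse an ISO 8601 timestamp such as '2021-09-01T10:00:00.000Z'."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def iter_all_jobs(fetch_page):
    """Yield every job, calling fetch_page(page_number) until a page is empty."""
    page = 1
    while True:
        batch = fetch_page(page)
        if not batch:
            return
        yield from batch
        page += 1


def jobs_finished_since(jobs, since, name=None):
    """Keep jobs that finished at or after `since`, optionally matching `name`.

    `since` must be a timezone-aware datetime.
    """
    matches = []
    for job in jobs:
        finished = job.get("finished_at")  # None while a job has not finished
        if finished is None or parse_ts(finished) < since:
            continue
        if name is not None and job.get("name") != name:
            continue
        matches.append(job)
    return matches
```

The cost is the one raised in this thread: every page has to be downloaded before anything can be discarded.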
I desperately need this feature. Is there a workaround?
We are using an orchestrator (XL Release) that has an approval flow before doing deployments, but I don't have an easy way to trigger the manual pipeline. So I currently have to parse the jobs from the pipeline and select the right one. It's not a major issue, but it's a bit painful that I can't get a job by name and trigger it, only by ID.
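The workaround described here (list the pipeline's jobs, pick one by name, then trigger it by ID) can be sketched as below. It assumes the job list was already fetched from the pipeline jobs endpoint (`GET /projects/:id/pipelines/:pipeline_id/jobs`) and that a job waiting for a manual start reports the status `manual`. The helper names are hypothetical.

```python
def find_manual_job(pipeline_jobs, name):
    """Return the first job called `name` that is waiting for a manual start."""
    for job in pipeline_jobs:
        if job.get("name") == name and job.get("status") == "manual":
            return job
    return None


def play_job_path(project_id, job_id):
    """Path of the endpoint that starts a manual job (send it a POST)."""
    return f"/api/v4/projects/{project_id}/jobs/{job_id}/play"
```

For example, `play_job_path(42, find_manual_job(jobs, "deploy-prod")["id"])` gives the path to POST to, provided a matching job was found.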
Hello, I would like to know if it's possible to get a job by name in a pipeline. We have many manual deployments, but our deployment orchestrator is an external tool.
Why interested: Wants to see the last time a specific job name succeeded. Some of their repos have dozens of job names so this becomes difficult. Also, looking at #19872.
Thanks @cbazan1. Here are my current notes on how I believe this could be implemented, with a rough guesstimate on the amount of work required: #22027 (comment 668361001)
@tnir - I think we need to get @marknuzzo and @samdbeckham to weigh in here as well on what kind of parameters we want this endpoint to accept and how the response looks if it differs at all from the existing standards.
> what kind of parameters we want this endpoint to accept and how the response looks if it differs at all from the existing standards.
@allison.browne - by adding a name parameter for job name to the Jobs API, is there anything else that we should be mindful of that would impact this endpoint? Should ref be considered too based on below? Would the response remain the same but just be a filtered list based on those attributes?
@marknuzzo, it's not trivial from a performance perspective at GitLab.com scale, since the query takes a long time (partitioning would likely help, if we only allow reads on un-archived builds when searching by name).
@mbobin, I think we still can't make changes that increase the total size of indices on ci_builds?
We have an index on (project_id, name, ref) but it won't be used by the queries because of the condition on success and not retried.
"index_ci_builds_on_project_id_and_name_and_ref" btree (project_id, name, ref) WHERE type::text = 'Ci::Build'::text AND status::text = 'success'::text AND (retried = false OR retried IS NULL)
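To illustrate why a partial index like this does not help a general name search, here is a small self-contained sketch. It uses SQLite through Python's `sqlite3` module as a stand-in, not PostgreSQL, and a simplified, hypothetical table and index (only the `status = 'success'` condition is kept). The general principle is the same in both databases: a partial index is only considered when the query's WHERE clause implies the index's WHERE clause.

```python
import sqlite3


def query_plan(conn, sql):
    """Return SQLite's query plan for `sql` as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the plan text


conn = sqlite3.connect(":memory:")
# Simplified stand-in for ci_builds; the real table has many more columns.
conn.execute(
    "CREATE TABLE ci_builds ("
    "id INTEGER PRIMARY KEY, project_id INTEGER, name TEXT, ref TEXT, status TEXT)"
)
# Partial index: only rows with status = 'success' are stored in it.
conn.execute(
    "CREATE INDEX idx_success_builds ON ci_builds (project_id, name, ref) "
    "WHERE status = 'success'"
)

# This query repeats the index predicate, so the planner may use the index.
with_predicate = query_plan(
    conn,
    "SELECT id FROM ci_builds "
    "WHERE project_id = 1 AND name = 'deploy' AND status = 'success'",
)
# This query wants jobs of any status; rows outside the predicate are not
# in the index, so the index cannot answer it.
any_status = query_plan(
    conn,
    "SELECT id FROM ci_builds WHERE project_id = 1 AND name = 'deploy'",
)

print(with_predicate)
print(any_status)
```

A name filter that has to return jobs in any status falls into the second case, which is why the existing index would not be used.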
> I think we still can't make changes that increase the total size of indices on ci_builds?
Yes, that's correct. I think we would need to add more than one index because the finder starts filtering from different relations: project, pipeline, runner. And I assume we would need to use gin indices because we might want to support partial matches and those should work better for text search.
But this endpoint is already behind performance-wise because there are a lot of bots that scrape it, going over hundreds of pages: #362172 (closed)
I would block this issue until we get the table size under control.
I'm not sure where we actually discuss the freeze on increasing index size now. Do you know? Because I believe it's mostly the partitioning epic that un-blocks index adjustments but I wonder if the de-composition will un-lock anything.
> I think we would need to add more than one index because the finder starts filtering from different relations
@allison.browne there's &6203, which doesn't seem to be active, and I think there's some RuboCop rule that would activate if you add something to the table (I can't find it right now).
+1 on that. Filtering on name and ref would be very useful. My scenario is (#353168 (closed)) that I want to know "which deploy jobs are currently running, or have finished recently".
@tnir IMHO it's not good to use the Deployments API for searching jobs; these are two different areas. Deployments are based on jobs, but jobs do not necessarily execute deployments. You can have only build or QA jobs in your CI/CD and still want to search for specific jobs (by name).
@Wirone I did not talk about searching jobs by name in this thread. @jakub-g wants to search jobs by ref in this issue, which focuses on searching by name. So I just suggested using the Deployments API instead (of an upcoming Jobs API with ref).
Bump please! My team is also a premium subscriber and would love this feature! Filtering by name would be extremely useful in finding slow jobs/runners.
Thanks for your work on this, @pburdette! I really hope to have this capability this year. For now, I'm having to make an API script to scrape and log the jobs to a separate database. For each test job, we need to be able to quickly search:
When was the last successful run
What were the changes in the first failure
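The two lookups listed above could be sketched on the client side roughly as follows. It assumes the job records for the project have already been fetched, that each is a dict with `id`, `name` and `status` fields, and that job IDs increase over time so that sorting by ID gives run order. The function name is hypothetical.

```python
def last_success_and_first_failure(jobs, name):
    """Return (last successful job, first failed job after it) for one job name.

    Either element is None when no such job exists.
    """
    # Assumption: job IDs grow over time, so sorting by ID gives run order.
    runs = sorted((j for j in jobs if j.get("name") == name), key=lambda j: j["id"])
    last_success = None
    first_failure = None
    for job in runs:
        if job.get("status") == "success":
            last_success = job
            first_failure = None  # a later success ends the failure streak
        elif job.get("status") == "failed" and first_failure is None:
            first_failure = job
    return last_success, first_failure
```

The commit attached to the returned failed job would then be the starting point for looking at what changed.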
The junit status is not sufficient since it fails to account for different environments. As of right now, we would have to do a lot of clicking through individual pipelines to get an idea of what is happening with a single job across changes and different environments.
And yes, of course we have tests that may fail for an extended period of time. When prototyping, the test must be created first before the feature.
How has this issue been open for 5+ years!? GitLab markets itself as the superior platform for CI/CD in a major way and we can't search for jobs based on anything more than status? This is insane.
@rutshah thanks for the ping! Yes, the issue I'm working on in %17.0 should solve filtering on name for the backend. Then I'll ping the frontend folks once they're unblocked. Compound filters involving names and anything else will be done later.
We can use this issue as a tracker for the API changes once the foundations are set in #446304 (closed)