After spending hours trying to understand what was going wrong, I concluded that I had found a specific edge case where GitLab does not throw any error message and simply fails silently.
When these conditions are met:
There is a needs relation between two jobs
The parent job has a matrix
The total number of characters in the matrix variable values is greater than 114
...no pipelines are created, without any notification of what could have gone wrong.
Steps to reproduce
Create a project with this minimal .gitlab-ci.yml:
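A minimal sketch of such a configuration (not necessarily the exact file from the original report; the job names and the long value are illustrative, and any setup meeting the three conditions above should reproduce it):

```yaml
stages:
  - build
  - test

# Parent job with a matrix whose combined variable values exceed 114 characters.
Parent job:
  stage: build
  parallel:
    matrix:
      - LONG_VALUE: "foooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo"
  script:
    - echo "parent"

# Child job with a needs relation to the matrixed parent. The need is expanded
# to the generated name, e.g. 'Parent job: [foo...]', which is what hits the limit.
Child job:
  stage: test
  needs: ["Parent job"]
  script:
    - echo "child"
```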
I don't have a fix for this issue. The best workaround I found is to remove the needs keyword while waiting for a fix.
Surface the error when no pipeline is created when matrix reaches limit
Match job name chars limit
Note
We are a 150-seat GitLab Premium customer. Support on this issue would be much appreciated.
Proposal
| Iteration | Description | Limitations |
| --- | --- | --- |
| Step 1: Surface the error | Ensure that an error message is surfaced when character limits are exceeded. | Does not solve the problem of allowing long matrix sections. Users still need to use the identified workaround of variable indirection. |
| Step 2: Make the character limits match | | Does not solve the problem of allowing long matrix sections. Users still need to use the identified workaround of variable indirection. |
| Step 3: Update UI to show variables | Add UI elements to expose the variables used, rather than relying on the name. This may require updating the API to include the information if it is not already available. | Does not solve the problem of allowing long matrix sections. Users still need to use the identified workaround of variable indirection. |
| Step 4: Allow Custom Matrix Names | Exact details to be determined. Allow the user to assign names to the various configurations through the config. This allows for names that are meaningful. When names aren't provided, default to names based on the variables (same behaviour as today). | Requires the UI update from step 3 to maintain parity with the existing experience of viewing variables, but does fix the issue of being limited in matrix variables. |
Thanks @splattael for the ping here. I agree that having notifications to create that visibility will, in turn, provide a better way to see when these scenarios occur.
While I agree pipeline error notifications could be a nice-to-have feature, I would rather not wait for #36806 to be resolved before treating this ticket.
This ticket is about a bug that is affecting us, premium customers, in our work. I am pretty sure this bug, which looks like a buffer overflow, can be resolved by looking at GitLab logs. There is no need for a visual notification of the bug here.
Should I reach out to the premium support helpdesk to escalate the issue?
thanks for the ping @vdsbenoit. Very odd that this particular set of circumstances (variable length + needs) creates the bug!!
A first fix for this might be throwing an error in the pipeline editor when the total variable length exceeds 114 but I don't think that's the outcome you want.
Heads up for @marknuzzo - we might investigate if there's a hard limit we should document here. In the meantime we can document the limit when mixing usage of parallel:matrix and needs here, maybe? cc @marcel.amirault
Let me emphasize that the 114-character limit is not per variable but for all the variables of a matrix item together. The issue becomes very cumbersome when we have multiple variables, each with few characters, but whose sum exceeds 114. For instance:
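(The original example isn't preserved here; the made-up values below illustrate the point: each value is short on its own, but together they add up to roughly 120 characters, which is over the limit.)

```yaml
Parent job:
  parallel:
    matrix:
      # No single value is long, but together these values total ~120
      # characters, which is over the 114-character budget.
      - REGION: "europe-west1-production-cluster"
        ACCOUNT: "platform-engineering-shared-services"
        ENVIRONMENT: "pre-production-integration-testing"
        COMPONENT: "payments-gateway"
  script:
    - echo "deploy"

Child job:
  needs: ["Parent job"]
  script:
    - echo "verify"
```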
@vdsbenoit - thanks for clarifying, I saw the same thing.
@marknuzzo that works for me, can you create the investigation issue? Note this behavior only shows up when both a long combined matrix variable length and needs are present.
Hi @jheimbuck_gl - For now, I marked this issue as ~"workflow::blocked" only to enforce that the #369894 (closed) research should take place first to help inform the direction here. Please let me know if I interpreted that flow correctly.
Huh, this is very strange. I'd like to understand this a bit more to know how to write up the docs for it. Let me ping @mbobin, who I think knows this code well.
@marcel.amirault it's related to how we compute the names for matrix jobs and needs limits. The name for Parent job job becomes Parent job: [fooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo] and because it is a need for Child job, the need name is expanded to use the variable, like needs: ["Parent job: [foooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo]"], but we do have a name length validation for needs, which is 128 characters:
```ruby
[14] pry(#<Gitlab::Ci::Pipeline::Chain::Populate>)> pipeline.stages.last.statuses.last.errors.messages
=> {:"needs.name"=>["is too long (maximum is 128 characters)"]}
```
But it doesn't bubble up as a pipeline error:
And it's interesting that it doesn't match the 255-character limit on job names:
```
gitlabhq_dblab=# \d ci_builds
                              Table "public.ci_builds"
         Column          |            Type             | Collation | Nullable | Default
-------------------------+-----------------------------+-----------+----------+---------------------------------------
 name                    | character varying(255)      |           |          |
```
Was it a bad idea to use the variable values in the job's name?
@mbobin @lauraX Sorry, I'm not following how this limitation works... I tried taking the details in the reply above and documenting it, but each time I wrote it out it was clear I have no idea what I'm talking about.
If you had to add it to the docs, how would you explain it?
Hi @marcel.amirault - I think I would say something along the lines of:
Aggregated matrix variable values cannot be longer than 114 characters. - Do you think we should mention why this is the case?
@marknuzzo I think we could potentially unblock this issue by breaking up a solution in three issues:
Make the error message actually say what the error is
Figure out why the job name doesn't match the chars limit and fix that (perhaps the validation too)
As a longer term solution - use something else as a job name that is not the variable value <- if we do this, we might not need to do any of the above, but everything depends on timing
So we can either truncate this job name OR use something else as a job name ...? Truncating seems like the best solution.
Truncating is not a good option because we can end up with jobs that have the same name in case of multiple variables. We could use the x/x format from the parallel: int use case, but in this case the user will have no idea what variables the job is using without doing some echo statements in the script. Should we add a description field to jobs to hold the information about parallelization?
Figure out why the job name doesn't match the chars limit and fix that (perhaps the validation too)
Truncating is not a good option because we can end up with jobs that have the same name in case of multiple variables.
ah, yes, this makes sense.
Should we add a description field to jobs to hold the information about parallelization?
We could do this, and additionally fix the limit to make them match. Maybe this will be enough for now.
@marknuzzo - I think this is ready to be worked on, with the proposal being to fix the limits to make them match. Marius suggested also adding a description field to jobs, which can be done either as part of this issue or another one.
@marknuzzo - I think this is ready to be worked on, with the proposal being to fix the limits to make them match. Marius suggested also adding a description field to jobs, which can be done either as part of this issue or another one.
I think we can leave the description addition up to whoever picks up this issue - it can be done in the same MR or as a follow-up, depending on how the MR goes.
Thanks @lauraX - @dhershkovitch - with this weighted now and being ready, I think we need to compare this issue against our next prioritization ~"type::bug" board to determine the appropriate timing in an upcoming iteration/milestone.
/cc @treagitlab for awareness due to the ~"type::bug" discussion and priority.
This ~"group::pipeline execution" bug has at most 25% of the SLO duration remaining and is an ~"approaching-SLO" breach. Please consider taking action before this becomes a ~"missed-SLO" in 14 days (2022-09-03).
Hi @jheimbuck_gl @richard.chong @carolinesimpson - though this will become a ~"missed-SLO", looking at the next prioritization ~"type::bug" board, it would seem that the best timing for this, based on other prioritized bugs right now, is %15.6 at the earliest. WIP limits may change, but I just wanted to start a conversation here to start thinking about where it can best be slotted in. WDYT?
@jheimbuck_gl may we have an explanation of why this missed-SLO issue suddenly went to the backlog? In my experience, GitLab backlog == won't do...
GitLab developers identified the bug in GitLab and multiple Premium customers mentioned they are impacted by this issue. This bug prevents us from using the parallel:matrix feature and drove us crazy because no errors are displayed in the GitLab GUI. It will likely impact more people because the threshold is quite low.
It is a little frustrating to see how such an issue is handled after I spent hours preparing and writing the report.
@vdsbenoit sure thing. First off, thanks for the writeup; having those really makes these easier to replicate and fix.
The team is currently focused on a number of issues that are high severity and priority (you can see the bugs scheduled in 15.6 here) for the next couple milestones. I update what the team is planning in our letter from the editor and direction pages if you'd like to take a look.
Instead of pushing issues out from milestone to milestone as we plan, we are reviewing what's already ~"workflow::ready for development" and pulling in issues for the capacity we have within a milestone. I'll be reviewing that list (this issue included) against other issues in the upcoming milestones.
Just chiming in here that we're hitting this bug too, and it's pretty annoying. It really limits how effective the matrix keyword is, because you can't really have any meaningful matrix setup without hitting this variable limit, and if you have any long value you're in for a bad time.
It'd be far better to suffer some information loss in the job name (i.e., truncate the characters in the name to the 255-character max limit) than to just have the job die because of the naming convention. It's also not great from a UX perspective, because it manifests as just a generic pipeline error, so our engineers will often reach out to our platform team unsure of how to actually resolve it, and the platform team then has to research what changes were made recently to help the team understand.
Hey @jheimbuck_gl - do you know if there is a clear direction for how to resolve this issue? It's obviously something we'd love to see fixed, and I'm willing to look into contributing an MR to fix it since it keeps getting pushed out; there doesn't seem to be a consensus on how it would be resolved, though.
I suspect we have a couple requirements here:
Limiting the matrix keywords to < 128 characters is not a great solution, as it:
is not intuitive
severely limits the use of parallel and matrix together
Increasing the database limit is not likely something we want to do. 255 is a really large limit for a manually created job, but really easy to hit with generated jobs, and increasing it to > 500 would not really fix the issue, as much as it would hide it.
Maybe this is something that would be entertained, since the initial issue that talks about the name deprecation in 15.0 doesn't note the why behind it other than standardization. It still doesn't feel wise to me from a performance/cost perspective, though.
We need to avoid duplicate names existing within the same pipeline.
To that end, I'd suggest maybe adding some logic to the matrix naming strategy (which I think(?) is here) where it performs the following actions:
If the current length of "#{job_name}: [#{vars}]" is < 128 characters, then use the value as it currently exists
If the current length of "#{job_name}: [#{vars}]" is > 128 characters, then run an md5 hash of #{vars}, and use that in place of the vars.
While md5 has some risk of collision, it's fast, the risk of a collision in this space should be very minimal, and it results in a nice small value (32-character hash + 4 for the ": []" characters = 36 characters). That leaves 92 characters for the manually entered job name, even with an unlimited number of variables within the job. Yes, there is some loss of information about which job is running which variables, but this seems better than the whole pipeline dying. This could also be mitigated in the short term by users printing out variable values in the before_script if they need to.
@PatrickRice it looks like @lauraX and @mbobin had a discussion about how to fix this and applied a weight for what it would entail, but the proposal did not get updated with that outcome.
Would one of you please update the proposal to what you settled on for a fix, so that if a community contributor picks it up the implementation steps are clear? Thanks!
@jheimbuck_gl - I saw that discussion, but it seems like the only proposal is to make the lengths match (change the needs validation to 255). While that does make sense, it doesn't really fix the issue that the pipeline fails without an error (I assume that'd be a separate issue) or that the variable-based job naming causes parallel:matrix jobs to easily exceed that limit (which is what I assume is being fixed here). It feels like there is another fix needed here too, otherwise the parallel:matrix approach is severely limited. For example, we use parallel:matrix jobs for copying docker images between ECR repositories, with two variables: SOURCE_IMAGE and DESTINATION_IMAGE, and one ECR URL is 48 characters long (not counting the image name), which means our job name is a minimum of ~200 characters with just two variables plus the image name. Does it fit? Yeah, but only with 2 variables. Adding a third or fourth basically instantly breaks our pipeline. Setting the limit to 255 is basically doing what I mentioned in the "increasing limit" bullet - it hides the issue, but it doesn't really fix it.
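To make that concrete, here is a sketch of the ECR copy use case (the account ID, regions, repository, and tag are placeholders; the point is that two full image URLs per matrix entry push the generated job name past the limit):

```yaml
copy-image:
  parallel:
    matrix:
      # Each value is ~65 characters, so the generated name
      # "copy-image: [<source>, <destination>]" is already ~145 characters.
      - SOURCE_IMAGE: "123456789012.dkr.ecr.us-east-1.amazonaws.com/team-a/service:1.2.3"
        DESTINATION_IMAGE: "123456789012.dkr.ecr.us-west-2.amazonaws.com/team-a/service:1.2.3"
  script:
    - echo "copy ${SOURCE_IMAGE} to ${DESTINATION_IMAGE}"  # actual copy tooling omitted
```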
Maybe I'm reading something wrong and @lauraX or @mbobin could help me out!
I have updated the proposal above with some slight fixes that Marius and I discussed, which can probably be done in separate MRs.
@PatrickRice You are correct about several different issues needing to be fixed :)
The proposal that Marius and I discussed was to fix the current situation of silent erroring, and then make the character limits match. This would probably be a good idea since this is a bug anyway. This is what I updated the proposal with.
This does not fix the broader issue that the variable job naming causes parallel:matrix to easily exceed the limit, though.
I wonder if bumping up the limit to 512 would be reasonable? I don't really have data to corroborate that this would indeed fix the problem, but I assume fewer users would be affected with a higher limit. This would still have to be validated though, since I'm just brainstorming here.
We could also do naming the same way we do for parallel: int, and add a description field so that on the job show page the user would know the information about the parallelization.
It seems like we need a more friendly way to name the jobs that doesn't rely on the variable content, but where the user is still able to easily determine which variable values were used in a particular job run. Is that correct? It seems to me we could just use a randomly generated string if we had a way to tie it back to the variables and display them on the UI.
Perhaps @v_mishra might have some ideas since it crosses the line into UX/design?
I wonder if bumping up the limit to 512 would be reasonable? I don't really have data to corroborate that this would indeed fix the problem, but I assume fewer users would be affected with a higher limit. This would still have to be validated though, since I'm just brainstorming here.
I agree that this likely helps, but in the above example it allows maybe another 3-4 long variables? It also causes a weird developer experience where developers may want to give their variables non-descriptive names in order to reduce the number of characters in the job names. I like your thought about the description field, and I wondered if a good boring solution would be to have a JOB_NAME special variable that overrode what showed up in the brackets. The only issue I've thought of with that is that it really only works when you have a 1xN matrix (this works with parallel:int because it's always 1xN). For example, in the following setup, how would we know where to put the JOB_NAME (or description keyword):
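(The configuration that originally accompanied this comment isn't preserved; the sketch below is a reconstruction consistent with the diagram further down: a pre-job followed by my_job with a two-dimensional matrix. Variable names are illustrative.)

```yaml
pre-job:
  script:
    - echo "prepare"

my_job:
  needs: ["pre-job"]
  parallel:
    matrix:
      # A 2x2 matrix: four jobs, one per combination of the two dimensions.
      - DIMENSION_ONE: ["1", "2"]
        DIMENSION_TWO: ["3", "4"]
  script:
    - echo "run ${DIMENSION_ONE} x ${DIMENSION_TWO}"
    # Where would a single JOB_NAME (or description) go, given that each
    # generated job is a different combination of the two dimensions?
```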
It seems like we need a more friendly way to name the jobs that doesn't rely on the variable content, but that the user is still able to easily determine which variables values were used in a particular job run.
I agree completely. The md5 string I proposed is essentially that; it's a pseudo-random string from a user perspective when the name is too long, and I figured there would need to be some iteration on how to make the presentation back to the users friendly.
@lauraX Can you have a look at the Proposal I added for steps to address this based on this thread and ensure that I have captured it all correctly? I'm a little unsure if we can actually proceed with the truncation because of technical requirements to have unique names, or if it is just an inconvenience to the user to have duplicate names. Perhaps jumping straight to a unique random name without the truncation step is the way to go? WDYT?
@PatrickRice I thought as much regarding upping the limit.
a good boring solution would be to have a JOB_NAME special variable that overrode what showed up in the brackets.
This is a VERY interesting solution; I wonder if we could work with this. Although it's a more complex solution, it feels like the best long-term thing to do, since it allows infinite variables and even more flexibility on the job name that is meaningful to the user. @mbobin - what do you think of this?
@carolinesimpson we do not have a technical restriction on unique names. I'm not sure about using a unique random name, as it would be almost the same as truncation but without any information. WDYT about Patrick's suggestion above?
@carolinesimpson we can't truncate the names because we need unique job names in a pipeline (i.e. for needs).
Perhaps jumping straight to a unique random name without the truncation step is the way to go?
This could be an option, but as a user I would not know what I'd be looking at on the pipeline page and even in the job page. This would also make debugging harder when the YAML is invalid (i.e. missing needs/dependencies jobs).
a good boring solution would be to have a JOB_NAME special variable that overrode what showed up in the brackets.
maybe with a different name, but I think it will still need to be based on something meaningful.
I think the steps needed here are:
surface the error from Ci::BuildNeed
make the limits match
explore different solutions to bypass the limit. I think this will require UX.
I wonder if bumping up the limit to 512 would be reasonable
That's hard to do because the database type for this column on .com is varchar(255) and we'd need to convert it to text to bump the limit, which would require careful planning (and around 5 releases to execute). There are also a few indexes on this column which would have to be recreated. And having longer job names will definitely increase the database size.
For example, we use parallel:matrix jobs for copying docker images between ECR repositories, with two variables: SOURCE_IMAGE and DESTINATION_IMAGE, and one ECR URL is 48 characters long
@PatrickRice as a workaround until we figure this out, introducing some indirection might work:
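(The snippet that accompanied this suggestion isn't preserved; the sketch below shows the kind of indirection meant: keep the matrix values short so the generated job names stay small, and resolve them to the long values inside the job. The registry URLs, variable names, and the bash-specific indirect expansion are illustrative assumptions.)

```yaml
copy-image:
  variables:
    REGISTRY_US_EAST: "123456789012.dkr.ecr.us-east-1.amazonaws.com"
    REGISTRY_US_WEST: "123456789012.dkr.ecr.us-west-2.amazonaws.com"
  parallel:
    matrix:
      # Only these short values end up in the generated job name.
      - SOURCE: US_EAST
        DESTINATION: US_WEST
  script:
    # Resolve the short keys back to the full registry URLs (bash indirect expansion).
    - SOURCE_REGISTRY_VAR="REGISTRY_${SOURCE}"
    - DESTINATION_REGISTRY_VAR="REGISTRY_${DESTINATION}"
    - SOURCE_IMAGE="${!SOURCE_REGISTRY_VAR}/team-a/service:1.2.3"
    - DESTINATION_IMAGE="${!DESTINATION_REGISTRY_VAR}/team-a/service:1.2.3"
    - echo "copy ${SOURCE_IMAGE} to ${DESTINATION_IMAGE}"
```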
maybe with a different name, but I think it will still need to be based on something meaningful.
Could you describe what you mean when you say it would need to be based on something meaningful? My thought was that we'd let the devs name the individual jobs explicitly via a keyword/variable, thus letting them assign their own meaning. If they want to use "A", at least that has meaning to them. I.e., in the example above, it would result in an execution that looked and was named like this:
```mermaid
graph LR
  pre("pre-job")
  pre --> A("my_job [1x3]")
  pre --> B("my_job [1x4]")
  pre --> C("my_job [2x3]")
  pre --> D("my_job [2x4]")
```
Honestly, while I like this because it allows devs to assign their own meaning, I'm still not sure how it works well with a non-1xN matrix, and that's what causes me pause with it right now.
This could be an option, but as a user I would not know what I'd be looking at on the pipeline page and even in the job page.
This is why I'm a bit of a fan of the md5 approach; it's not truly random. It doesn't have as great a UX as the current solution because you can't see, just with a hover, which variables are in play, but importantly, as long as the variable names/values are the same, the md5 sum would be the same every time and the pipelines would work for effectively infinite variables. Applying this only when the job name length is > 255 (after the needs fix; 128 right now) would mean that the UX would continue to work as-is for most users of GitLab, and this solution would only apply when using large jobs. If a given job failed, it would be very simple to add an echo ${variable} setup to the before_script, like you noted in your workaround, to determine exactly which combo was causing issues, and again - the job name would be the same with every invocation.
as a workaround until we figure this out, introducing some indirection might work:
We actually do something slightly different from this as a workaround: we store the variables in a dotenv file in source, in a distinct folder per matrix job, and retrieve them from there based on names in a before_script to minimize the length of the variables. It's a pretty bad dev experience, and it tends to break the pipeline when people try to add new variables or matrix permutations; that happening again is what caused me to come back here and offer to help solve the issue, because I don't like losing velocity to broken pipelines and confusion. Your approach is more DRY than ours so I like it more, but if I'm going to refactor I'd rather go to a more permanent solution.
@v_mishra I haven't reviewed the proposed solutions; do they all require UX changes? I think once we settle on one path forward we can move this to your epic.
cc @carolinesimpson for awareness. I changed this to ~"workflow::planning breakdown" since there are multiple proposals and this could need design. If we have a preferred path forward, can we say as much in the issue description and then work with Veethika on design?
@jheimbuck_gl It's more that there are multiple steps to get to the full solution, rather than multiple proposals.
Displaying the variables in a matrix pipeline run once we stop having them in the name is the part that will require some UI/UX work.
The final step of having the user assign custom names to the matrix jobs could also use some UX support, although a developer could potentially propose an updated YAML structure. Ideally I'd like some input on the UX of the structure, though, to make sure it makes sense.
Thanks for clarifying @carolinesimpson. Can you or an engineer update the proposal to reflect that those are steps and not possible solutions, and add any specific questions for @v_mishra to answer before we move forward?
Sure @jheimbuck_gl, I have attempted to make it clearer.
@lauraX or @mbobin Would you mind filling in a little more detail in the proposal (particularly any details about the character limits matching) to make it more obvious to a developer picking this up what that refers to exactly? Thanks!
We've encountered this issue too, both in a FOSS namespace and in our SaaS Ultimate namespace.
Adding FOO: bar to this matrix causes the error (it can be reproduced easily using CI Lint by checking "validate").
We've limited the number of variables as a workaround, but it will become blocking soon.
We're a Premium [SaaS & Self-Hosted] customer, and we've run into this issue too. I've reported it to support, who linked this ticket as I did not find it myself. We have spent a significant amount of time trying to diagnose this issue. Seeing better error messages and/or any of the proposed/suggested fixes outlined above when the name length exceeds the limit would be awesome!
Looking forward to seeing the solution on this one - apologies if this is not the right area to add this comment.
Just want to note that we have also come across this when we want to use different docker images for each matrix element and specify the images via their hashes (which are typically quite long), e.g.
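(The original snippet isn't preserved here; the image paths and digests below are fabricated, but a digest-pinned reference is easily 110+ characters on its own, so a single matrix value already exceeds the 114-character budget.)

```yaml
integration-test:
  parallel:
    matrix:
      # "sha256:" plus 64 hex characters is 71 characters before the image path.
      - IMAGE: "registry.example.com/platform/tooling/python@sha256:0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef"
      - IMAGE: "registry.example.com/platform/tooling/golang@sha256:fedcba9876543210fedcba9876543210fedcba9876543210fedcba9876543210"
  image: $IMAGE
  script:
    - echo "running against ${IMAGE}"
```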
Is the error message surfaced in the pipeline yet? IMO that's the minimum critical/immediate fix if this won't be closed soon. (see "Step 1: Surface the error" above).
Why? Twice our small team (within a larger org that I assume has also faced this) has lost more than a full day trying to isolate this (despite making an effort to understand, document, and share knowledge on the problem when it was first encountered over a year ago). The stated "255 character variable limit" is misleading when debugging, the surprising effect of needs makes it difficult to A/B test when attempting to root-cause, and the lack of any failure-cause output is obviously problematic.
I do appreciate that the issue is linked within the parallel:matrix documentation now.
Mark Nuzzo changed title from No pipelines created when matrix var is longer than 114 chars to Backend: No pipelines created when matrix var is longer than 114 chars
This issue is very critical for us (Premium customer).
I don't understand why we cannot set a custom job name, which should be unique (this could be verified with an error message).
The custom job name could use some matrix variables.
This is very simple and pragmatic.
@mgibsongl The concern in this issue was addressed. A validation error appears when the parallel job name is too long. The next step in #420669 (closed) will allow the needed parallel job name to be longer. The ask to set a custom name for a parallel job should be proposed as a separate feature request issue instead of re-opening this one.
I like the solution for the array with the JOB_NAME example.
Here is a use case to put this issue in perspective.
We are running GitLab Ultimate and implementing Renovate Bot in our GitLab instance.
Background:
Renovate is an alternative to GitHub's Dependabot.
Our GitLab instance holds over 2,000 repositories.
We have a repository that runs the Renovate pipeline.
Due to the number of repositories, we have to split the Renovate job; otherwise, if it fails on a single repository, all following repositories do not get their dependency PRs created.
Issues:
By default, if you take all the repositories discovered by Renovate and give them to a child pipeline, you run into the first problem: only a maximum of 200 jobs are allowed to be created. So by default, we could only have 200 repositories enabled in Renovate.
The solution was to split all the discovered repositories into groups of 5 and then give them to the parallel matrix. As a result, we now have a maximum of 1000 repositories enabled with Renovate, because each Renovate child job processes 5 repositories.
Next problem (unsolved): the matrix job name!
We need to be able to use a distributed cache. However, the parallel matrix does not work well with a good cache key that can be reused between pipelines. We hoped to use ${CI_PROJECT_NAME}-${CI_JOB_NAME}. The problem here is that the name of the job is not only long but also contains spaces, due to the arguments given to Renovate to process only 5 repositories.
Furthermore, it is essential that these jobs are generated and hard-coded into the .gitlab-ci.yml.
In my opinion, a JOB_NAME variable would solve many of our problems. We could then have the cache key `renovate-${JOB_NAME}`, which means the cache between Renovate jobs could be successfully loaded from the distributed cache. Yes, I know these names and jobs can shift around when a new repository is created and then added to Renovate; however, that is an acceptable tradeoff.
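For illustration only, assuming a hypothetical JOB_NAME (it is not an existing GitLab keyword today), the idea would look roughly like this; the repository batch and cache paths are placeholders:

```yaml
renovate:
  parallel:
    matrix:
      - RENOVATE_REPOS: "group-1/repo-a group-1/repo-b group-1/repo-c group-1/repo-d group-1/repo-e"
        JOB_NAME: "batch-01"  # hypothetical: a short, stable name for this matrix entry
  cache:
    # A short, user-assigned name keeps the cache key reusable across pipelines.
    key: "renovate-${JOB_NAME}"
    paths:
      - renovate-cache/
  script:
    - renovate $RENOVATE_REPOS  # pass the batch of repositories as arguments
```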
Hi @gjrtimmer, thank you for sharing your context and thoughts here! We currently have an epic to address the concerns around matrix job names here: &11791. It looks like the issue most related to your concern would be #285853.