Rework logic on has_element? to avoid excessive waits (!118405)

What does this MR do and why?

The current logic doesn't allow for quickly checking whether an element is present on the page. A caller may pass wait=0 to signal that they don't want to wait, but the current logic forces a 1sec wait regardless.
In isolation this isn't a huge problem, but repeated throughout our suite it adds up.

This was noted in Add measure function to identity slow qa selectors (!118385 - merged), where the login process was identified as having an extra 4s of overhead because the existing logic adds an extra 1s to each of 4 individual checks. Applied to every test in the suite, this is a noticeable amount of overhead that should be better handled.

Instead, let's return early if the element is found before the page has fully loaded. Otherwise, wait for the network requests to finish and then use the wait time the caller proposed, without the forced additional 1sec wait.
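The flow described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from this MR: `FakePage`, `element_present?`, `requests_finished?`, `has_element_sketch?`, and the `poll` and `network_timeout` parameters are all hypothetical names standing in for the real page object and its lookups.

```ruby
# Monotonic clock helper so timings are not affected by wall-clock changes.
def monotonic
  Process.clock_gettime(Process::CLOCK_MONOTONIC)
end

# Hypothetical stand-in for a page object; not the real QA framework API.
class FakePage
  def initialize(appears_at:, network_idle_at:)
    @started = monotonic
    @appears_at = appears_at            # seconds until the element renders
    @network_idle_at = network_idle_at  # seconds until requests settle
  end

  def element_present?
    monotonic - @started >= @appears_at
  end

  def requests_finished?
    monotonic - @started >= @network_idle_at
  end
end

# Assumed, simplified lookup order:
# 1. return early if the element is already present,
# 2. otherwise wait for network requests to finish,
# 3. then poll for at most the caller's `wait`, with no forced extra second.
def has_element_sketch?(page, wait:, poll: 0.01, network_timeout: 1.0)
  return true if page.element_present?

  network_deadline = monotonic + network_timeout
  sleep(poll) until page.requests_finished? || monotonic >= network_deadline

  deadline = monotonic + wait
  loop do
    return true if page.element_present?
    return false if monotonic >= deadline

    sleep(poll)
  end
end

# An element that never shows up in time, checked with wait: 0.
missing = FakePage.new(appears_at: 10, network_idle_at: 0)
puts has_element_sketch?(missing, wait: 0) # false, with no forced 1sec delay
```

In this sketch, a caller that sees flakiness with `wait: 0` would pass a longer `wait` for that specific lookup instead of paying a fixed delay on every negative result.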

Note - if this change introduces flakiness, I would expect it to come from calls made with wait=0. I would suggest updating those lookups to use a longer wait rather than restoring the 1sec here, which impacts every call throughout the suite that returns false from this method.

How to set up and validate locally

full suite should continue to run
there should be a reduction in the overall test duration due to the removal of the extra 1s waits
compare some individual jobs to observe reduction in duration
use allure report total duration as a comparison tool to show total suite duration

Comparing similar jobs (focusing on instance-parallel jobs)

Before https://gitlab.com/gitlab-org/gitlab/-/pipelines/846286839 (5 jobs took 171 min)
After https://gitlab.com/gitlab-org/gitlab/-/pipelines/845134669 (5 jobs took 138 min)
This shows a reduction of 33 mins total == ~6.6 mins per job
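As a quick check of the arithmetic, the per-job figure is the total saving divided by the number of jobs. The variable names below are illustrative only; the durations are the ones quoted for the two pipelines.

```ruby
# Durations quoted for the 5 instance-parallel jobs in each pipeline.
before_minutes = 171
after_minutes  = 138
jobs           = 5

saved_minutes   = before_minutes - after_minutes
per_job_minutes = saved_minutes.to_f / jobs

puts "saved #{saved_minutes} min total, #{per_job_minutes} min per job"
# prints "saved 33 min total, 6.6 min per job"
```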

Allure Reports Compare

We can see a reduction in the suite duration of approximately 5mins using the allure report as a measure

Reasoning

Add measure function to identity slow qa selectors (!118385 - merged) added logging highlighting places where the tests may pause for excessive lengths of time.
Searching the logs for the warning WARN -- Executed method has_element? gives a total of 1817 cases where the has_element? method is slow. Assuming this MR saves 1sec in each of these cases, that equates to 1817sec / 60 == ~30mins, or approximately 6 minutes per job when split in parallel across 5 jobs. This is roughly in line with the results above.
Admittedly, this logic is a little naive, as there will of course be some cases where an element is just generally slow to be found, but I think they'll be the exception rather than the rule.
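The estimate above can be reproduced with a few lines of arithmetic. The 1sec saving per slow call is the same simplifying assumption made in the text, and the variable names are illustrative.

```ruby
slow_calls           = 1817 # logged slow has_element? executions
assumed_saving_sec   = 1    # assumption: each slow call gets 1sec faster
parallel_jobs        = 5

total_minutes   = slow_calls * assumed_saving_sec / 60.0
per_job_minutes = total_minutes / parallel_jobs

puts format("estimated saving: %.1f min total, %.1f min per job",
            total_minutes, per_job_minutes)
# prints "estimated saving: 30.3 min total, 6.1 min per job"
```

This sits close to the ~6.6 mins per job observed when comparing the two pipelines, though the real saving per call will vary.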

Data Summary

NB - There may be some variability due to environment, but on the whole the results in this pipeline are in line with what I expected to see, so I think they should be repeatable in general.

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

I have evaluated the MR acceptance checklist for this MR.

Edited Apr 25, 2023 by John McDonnell
