Improve test coverage around file paths

Extracted from gitlab-com/gl-infra/production#2318 (comment 368192082)

@DylanGriffith I think it would be worth thinking about some tests for these patterns. We could look for a file path dataset or use fuzzing to create our own; either way, these patterns seem like a significant point of failure.

Regexes are hard to reason about. — @mbergeron
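A minimal sketch of the fuzzing idea above, assuming Python and a stand-in pattern. `PATH_TOKEN`, `random_path`, `fuzz_paths` and `slowest_match` are hypothetical names for illustration; the regex is not the pattern discussed in this issue. It generates seeded (and therefore reproducible) random file paths and reports which one took longest to match.

```python
import random
import re
import string
import time

# Hypothetical stand-in for the pattern under test (NOT the real one):
# splits a path into runs of characters between '/', '\' and '.'.
PATH_TOKEN = re.compile(r"[^/\\.]+")

SEGMENT_CHARS = string.ascii_letters + string.digits + "_-"


def random_path(rng, max_depth=8, max_segment_len=20):
    """Build one random relative path such as 'aB3/x_y/file.rb'."""
    depth = rng.randint(1, max_depth)
    segments = [
        "".join(
            rng.choice(SEGMENT_CHARS)
            for _ in range(rng.randint(1, max_segment_len))
        )
        for _ in range(depth)
    ]
    extension = rng.choice(["", ".rb", ".md", ".tar.gz"])
    return "/".join(segments) + extension


def fuzz_paths(count, seed=0):
    """Return `count` random paths; the same seed gives the same paths."""
    rng = random.Random(seed)
    return [random_path(rng) for _ in range(count)]


def slowest_match(paths, pattern=PATH_TOKEN):
    """Return (path, seconds) for the input that took longest to match."""
    worst_path, worst_seconds = "", -1.0
    for path in paths:
        start = time.perf_counter()
        pattern.findall(path)
        elapsed = time.perf_counter() - start
        if elapsed > worst_seconds:
            worst_path, worst_seconds = path, elapsed
    return worst_path, worst_seconds
```

Seeding the generator means any slow input found this way can be reproduced and then added to a fixed regression set.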

@mbergeron It would be great if we had the Keyword reporting to get real scenarios to base these test patterns on. — @JohnMcGuire

@JohnMcGuire for sure, having real-world scenarios to test any changes against would be cool, although the privacy implications of scraping search data are worth considering. Is there a way to only collect data from GitLab employees? Also, if I was reading Dylan's comments correctly, it is hard to correlate a spike in search CPU use with a specific problematic search. Is that right, @DylanGriffith?

As far as testing changes to regexes goes, I've not had any experience here. My first thought is to have a set of known problematic search strings (like the ALL CAPS storage problem #224472 (closed)). We probably also want a test suite that can detect the various kinds of problems that can arise, be that search time, storage efficiency, memory usage, etc. But that's beyond the scope of this ticket, which is only concerned with CPU usage. I'll make an issue to collect ideas about how we can and should test these various aspects in the future, and stop rambling here. — @ebanks
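One way the "set of known problematic strings" idea could look, sketched in Python under stated assumptions: the pattern is a hypothetical stand-in, and the inputs in `CASES` are illustrative placeholders rather than strings taken from #224472 or any real incident. Each case is checked against a CPU-time budget, since CPU usage is the concern of this ticket.

```python
import re
import time

# Hypothetical stand-in for the pattern under test (NOT the real one).
PATH_TOKEN = re.compile(r"[^/\\.]+")

# Illustrative placeholder inputs; a real suite would use strings
# collected from actual problem reports.
CASES = {
    "all_caps": "SRC/MAIN/JAVA/COM/EXAMPLE/APPLICATION.JAVA",
    "deep_nesting": "/".join(["dir"] * 500) + "/file.txt",
    "long_segment": "a" * 10_000 + ".log",
    "many_dots": "archive" + ".tar" * 1_000 + ".gz",
}


def cpu_seconds(func, *args):
    """CPU time (not wall-clock time) spent in one call to func(*args)."""
    start = time.process_time()
    func(*args)
    return time.process_time() - start


def check_cases(pattern, cases, budget_seconds):
    """Return the names of the cases that exceeded the CPU budget."""
    return [
        name
        for name, path in cases.items()
        if cpu_seconds(pattern.findall, path) > budget_seconds
    ]
```

A test would then assert that `check_cases(PATH_TOKEN, CASES, budget_seconds)` returns an empty list. The budget itself is an assumption that would need tuning against the machines the suite runs on.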

@ebanks Thanks, I will look into limiting the data to GitLab employees. Engineering uses some reports that have tracking set up by ID, along with a list of active employee IDs. It's an interesting consideration that the Keywords used may have privacy implications. — @JohnMcGuire

The CPU spike was not related to search requests. It was on the indexing side. My thought is that we would want use cases for returning results on searches for paths, and we will want to check for regressions if possible.

Edited by 🤖 GitLab Bot 🤖