Add number of unique unit tests parsed by JUnit feature to usage ping

Problem to solve

As a GitLab Engineer, I want to know how many unique unit tests are parsed each week on a GitLab instance, reported as part of the usage ping payload, so that I can plan how to structure data for the unit test features. Establishing an average number of unique unit tests per instance will be enormously helpful in planning the technical implementation of future testing features, including test history.

As a GitLab Product Manager, I want to know how many unique unit tests are parsed each week on a GitLab instance, reported as part of the usage ping payload, so that I can measure usage of the JUnit testing report feature and monitor it over time.

Intended users

@rickywiens @jheimbuck_gl

User experience goal

No visible change to UX

Proposal

When we parse a JUnit report in the background, we should use the HLL (HyperLogLog) counter in Usage Ping to count the key created for each test. An HLL counter's estimate increases only when a key it has not seen before is added, so repeated keys are not counted twice. We should think about how to make the key more meaningful; one idea is to concatenate the job name to the existing key. Whatever key we decide on should be hashed before it is passed to the HLL counter, and keys should be passed in batches of 1,000.
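To make the hashing and batching steps concrete, here is a minimal sketch in Python (not GitLab's actual Ruby implementation). The function names, the `job_name:test_key` format, the choice of SHA-256, and the `add_to_counter` callback are all assumptions made for illustration; the callback stands in for whatever the HLL wrapper exposes.

```python
import hashlib
from itertools import islice
from typing import Callable, Iterable, Iterator, List

BATCH_SIZE = 1000  # batch size named in this issue


def hashed_key(job_name: str, test_key: str) -> str:
    """Hash the job name plus the test key (hypothetical key format)."""
    raw = f"{job_name}:{test_key}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


def hashed_batches(
    job_name: str, test_keys: Iterable[str], batch_size: int = BATCH_SIZE
) -> Iterator[List[str]]:
    """Yield lists of at most batch_size hashed keys, in input order."""
    iterator = iter(test_keys)
    while True:
        batch = [hashed_key(job_name, key) for key in islice(iterator, batch_size)]
        if not batch:
            return
        yield batch


def track_unique_tests(
    job_name: str,
    test_keys: Iterable[str],
    add_to_counter: Callable[[List[str]], None],
) -> None:
    """Send every hashed key to the counter, one batch per call."""
    for batch in hashed_batches(job_name, test_keys):
        add_to_counter(batch)


if __name__ == "__main__":
    # A plain set stands in for the HLL counter here. A set counts exactly,
    # whereas a real HLL counter returns an approximate cardinality.
    seen = set()
    keys = [f"test_{i}" for i in range(2500)]
    track_unique_tests("rspec", keys + keys, seen.update)
    print(len(seen))
```

Because the job name is part of the hashed value, the same test key reported by two different jobs yields two distinct entries, while re-parsing the same job's report adds nothing new.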

Concerns

We likely want to pass as many unit tests as possible to the HLL function per call, in batches. I'm not sure how the HLL wrapper class works, but we likely only want to pass a few hundred or a few thousand keys at a time into Redis directly. Keep in mind that when we parse a JUnit report, there can be hundreds of thousands of keys that we will want to insert into the HLL.

We benchmarked the different batch sizes (internal) and decided to go with batches of 1k, with hashed values.

Further details

This data will help inform architecture for test history so the feature is performant and storage is reasonable.

Permissions and Security

N/A

Documentation

N/A - temporary internal counter, doesn't need docs.

Availability & Testing

What does success look like, and how can we measure that?

Measures of success

There's a counter for tests parsed that can be accessed on a Sisense dashboard.
Each test is counted only once (there's some thinking to be done here about the existing issue of duplicate test names).

Links / references

Edited Aug 26, 2020 by Ricky Wiens