Spike: Benchmark regex performance between Ruby and Go
Context
As part of discovery work for #422574 (closed), we should benchmark regex matching performance between Ruby and Go. This will inform whether Ruby is performant enough to match secrets in commit blobs on the critical request path.
Note that prior work was done to benchmark Go's standard library regex functions: https://gitlab.com/gitlab-org/secure/pocs/secret-detection-go-poc#benchmarking
The outcome of this spike is to determine whether secret matching should be implemented in Ruby or in Go.
Proposal
- Collect metrics on commit sizes. We have a pre-receive check that ensures commits are below a certain size, but we are unsure whether these data are persisted anywhere. Collecting these metrics will help us choose a realistic data set for benchmarking.
- Perform regex benchmarking between a simple Go and Ruby implementation. An easy optimisation to include is a substring match prior to invoking the regex functions, particularly as the secrets we're looking for all have a common prefix (e.g. `glpat`). You can refer to this example, which skips the regex processing if the prefix of the secret wasn't first found by a substring search.
- Summarise results and decide on an approach.
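The substring prefilter described in the proposal could be sketched in Ruby roughly as follows. This is a minimal illustration, not the implementation from the linked example: the token pattern, the method names (`regex_only?`, `prefiltered?`), and the synthetic blobs are all assumptions made for the sketch, and only the `glpat` prefix comes from the text above. Real benchmarking should use blobs sized according to the collected commit metrics.

```ruby
require "benchmark"

# Prefix taken from the issue text. The token pattern below is an assumed,
# simplified format used purely for illustration.
PREFIX = "glpat"
TOKEN_REGEX = /glpat-[0-9A-Za-z_\-]{20}/

# Baseline: always invoke the regex on the blob.
def regex_only?(blob)
  TOKEN_REGEX.match?(blob)
end

# Optimisation: run a cheap substring search first and skip the regex
# entirely when the common prefix is absent from the blob.
def prefiltered?(blob)
  return false unless blob.include?(PREFIX)

  TOKEN_REGEX.match?(blob)
end

# Synthetic data (hypothetical): one blob without a secret, one with.
CLEAN_BLOB = "def hello; puts 'world'; end\n" * 10_000
SECRET_BLOB = CLEAN_BLOB + "token = glpat-#{'a' * 20}\n"

if __FILE__ == $PROGRAM_NAME
  iterations = 100
  Benchmark.bm(12) do |x|
    x.report("regex only")  { iterations.times { regex_only?(CLEAN_BLOB) } }
    x.report("prefiltered") { iterations.times { prefiltered?(CLEAN_BLOB) } }
  end
end
```

Both methods should return the same answer for any blob, so the comparison isolates the cost of the prefilter. Whether the prefilter helps in practice depends on how the regex engine already handles literal prefixes, which is one of the things the benchmark would need to show.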
Additional Considerations
- After we complete these two spikes, we'd like to be able to give a high-confidence estimate of when we can deliver Perform secret detection for highest risk conte... (#422574 - closed)
- We'll look to set up a stage-wide brainstorming session in %16.5 to review/discuss the decisions we made as well as any lessons learned.
Edited by James Liu