Usage Ping for DAST Full scans
Problem to solve
Active Scan for DAST lacks the Usage Ping instrumentation due to the performance implications. Still, we need to collect information on the usage of this feature.
Further details
It's a follow-up after https://gitlab.com/gitlab-org/gitlab-ee/issues/7182.
Most likely, it will be feasible to implement this once efficient counters are implemented for the GitLab web app.
Proposal
Implement usage tracking for this feature: collect the number of pipelines executed with the dast job enabled and the DAST_FULL_SCAN_ENABLED environment variable set.
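As a rough illustration of what "a pipeline with the dast job and DAST_FULL_SCAN_ENABLED" could mean as a predicate, here is a minimal plain-Ruby sketch. The `Job` and `Pipeline` structs and both helper methods are hypothetical stand-ins, not GitLab models, and treating the string "true" as the enabled value is an assumption.

```ruby
# Hypothetical stand-ins for CI data; these are not GitLab's ActiveRecord models.
Job = Struct.new(:name, :variables)
Pipeline = Struct.new(:id, :jobs)

# Assumption: the full scan counts as enabled when the variable is the string "true".
def dast_full_scan?(pipeline)
  pipeline.jobs.any? do |job|
    job.name == 'dast' && job.variables['DAST_FULL_SCAN_ENABLED'] == 'true'
  end
end

# Number of pipelines that contain at least one matching dast job.
def count_dast_full_scan_pipelines(pipelines)
  pipelines.count { |pipeline| dast_full_scan?(pipeline) }
end

pipelines = [
  Pipeline.new(1, [Job.new('dast', { 'DAST_FULL_SCAN_ENABLED' => 'true' })]),
  Pipeline.new(2, [Job.new('dast', {})]),
  Pipeline.new(3, [Job.new('sast', { 'DAST_FULL_SCAN_ENABLED' => 'true' })])
]

puts count_dast_full_scan_pipelines(pipelines)
```

Only the first pipeline matches here: the second has a dast job without the variable, and the third sets the variable on a different job.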
What does success look like, and how can we measure that?
Usage data for DAST Full scan pipelines is collected without performance problems or downtime on GitLab.com and 10K self-managed instances.
Links / references
Current status
The current database layout and size of the ci_builds table make it impossible to have a working implementation of Usage Ping on instances the size of GitLab.com and larger.
Reasons:
- For now, there is no effective means to mark a build in the database as originating from a secure job and/or having a particular ENV var set (e.g. a flag column or a metadata attribute in a jsonb column). By "effective" I mean indexed and allowing a query to find such builds within the default query timeout.
- There is a partial index on the name column in the ci_builds table that is currently used to find security job builds. While it is pretty fragile by itself (think of job names other than sast, dast, etc.), it still doesn't help to fit into the query timeout.
  - Index definition
  - Query examples:
    - EXPLAIN SELECT COUNT(DISTINCT ci_builds.id) FROM ci_builds WHERE ci_builds.name = 'dast' (chatops command, internal, execution result, internal)
    - EXPLAIN SELECT COUNT(DISTINCT ci_builds.id) FROM ci_pipelines INNER JOIN ci_builds ON ci_builds.commit_id = ci_pipelines.id WHERE ci_builds.name = 'dast' (chatops command, internal, execution result, internal)
- These SQL queries are constructed as raw COUNT queries because that's how the UsageData is constructed and sent to version.gitlab.com; no time-framed queries (COUNT for the last month, etc.) or any other partitioning are supported.
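
One way to picture the "efficient counters" idea mentioned earlier is to record an event when a qualifying pipeline is created, instead of counting rows in ci_builds at report time. The sketch below is only an in-memory illustration under that assumption: `UsageCounter`, the event name, and the monthly bucketing are all hypothetical, and a real implementation would need a shared, persistent store.

```ruby
require 'date'

# Hypothetical in-memory counter; not an existing GitLab class.
class UsageCounter
  def initialize
    # Keys are [event, "YYYY-MM"] pairs; missing keys read as 0.
    @counts = Hash.new(0)
  end

  # Record one event in the bucket for the month it occurred in.
  def increment(event, at: Date.today)
    @counts[[event, at.strftime('%Y-%m')]] += 1
  end

  # All-time total, comparable to what a raw COUNT would report.
  def total(event)
    @counts.sum { |(name, _month), n| name == event ? n : 0 }
  end

  # Count for a single month, given as "YYYY-MM".
  def monthly(event, month)
    @counts[[event, month]]
  end
end

counter = UsageCounter.new
# Illustrative dates only.
counter.increment(:dast_full_scan_pipeline, at: Date.new(2019, 5, 30))
counter.increment(:dast_full_scan_pipeline, at: Date.new(2019, 6, 1))
counter.increment(:dast_full_scan_pipeline, at: Date.new(2019, 6, 2))

puts counter.total(:dast_full_scan_pipeline)
puts counter.monthly(:dast_full_scan_pipeline, '2019-06')
```

Reading a total or a single month from such a counter is a cheap lookup and does not scan the builds table, which is the property the queries above lack.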