Skip to content

Add user secure scan pings

Cameron Swords requested to merge add-user-secure-scan-pings into master

What does this MR do?

Adds a Usage Ping for Secure that tracks unique users for scan types. Specifically, the following usage metrics are added:

"secure": {
  "user_sast_scans": 0,
  "user_dependency_scanning_scans": 0,
  "user_container_scanning_scans": 0,
  "user_dast_scans": 0,
  "user_secret_detection_scans": 0,
  "user_coverage_fuzzing_scans": 0,
  "user_api_fuzzing_scans": 0
}

This metric is very similar to the usage_activity_by_stage.secure.user_[analyzer name]_jobs usage ping, however this new metric tracks based on a join with the security_scans table instead of the less stable search based on the job name. The old usage_activity_by_stage.secure.user_[analyzer name]_jobs metrics are required until the legacy way of running analyzers is removed.

This MR resolves issue #239118 (closed).

Database Queries

Distinct users for a Secure analyzer (last 28 to last 2 days)

  1. Find the minimum/maximum security_scans ID. UPDATE Uses implementation from !48671 (merged)
  2. In batches, find distinct users https://explain.depesz.com/s/YVnc
    WITH hashed_attributes AS (
           SELECT
               CAST('X' || md5(CAST(user_id AS text)) AS bit(32)) AS attr_hash_32_bits
           FROM
               "security_scans"
               INNER JOIN "ci_builds" ON "ci_builds"."id" = "security_scans"."build_id"
                   AND "ci_builds"."type" = 'Ci::Build'
           WHERE
               "ci_builds"."status" = 'success'
               AND ("ci_builds"."retried" = FALSE
                   OR "ci_builds"."retried" IS NULL)
               AND (security_scans.scan_type = 1)
               AND "security_scans"."created_at" BETWEEN '2020-11-15 06:07:01.288027' AND '2020-12-13 06:07:01.288104'
               AND "security_scans"."id" >= 4378000
               AND "security_scans"."id" < 4379000
               AND user_id IS NOT NULL
       )
       SELECT
           (attr_hash_32_bits & B'00000000000000000000000111111111')::int AS bucket_num,
           (31 - floor(log(2, min((attr_hash_32_bits & B'01111111111111111111111111111111')::int))))::int AS bucket_hash
       FROM
           hashed_attributes
       GROUP BY
           1

Batch size

Batch size is the same based on whether there is a timeframe constraint or not. A size of 1k has been chosen as it returns acceptable results for all situations.

Batch size No timeframe constraint last 28 days - last 2 days timeframe
10k ~82ms https://explain.depesz.com/s/hMxL ~8.8s https://explain.depesz.com/s/1Wp7
5k ~44ms https://explain.depesz.com/s/OLJt ~70ms https://explain.depesz.com/s/tbON
1k ~15ms https://explain.depesz.com/s/25l9 ~20ms https://explain.depesz.com/s/TxOF7

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team
Edited by Cameron Swords

Merge request reports