Skip to content

Identify users that are subject to PIPL

Eugie Limpin requested to merge el-log-user-access-from-china into master

Context

See https://gitlab.com/groups/gitlab-org/-/epics/12600.

What does this MR do and why?

For each user accessing GitLab.com, we track and count the access if it is coming from a PIPL-covered country (done by checking HTTP_CF_IPCOUNTRY request env var). If the user then accesses from a non PIPL-covered country, their existing access log record is updated to reset its access_count value. This tracking is limited to once per qualified user every 24 hours.

Further, if the user is determined to be exclusively accessing from PIPL-covered countries (met_pipl_access_threshold?), a background job is queued to check if the user belongs to any paid namespace. This check is necessary because users belonging to paid namespaces are exempt from actions taken to ensure PIPL compliance.

What is not in this MR?

This MR only includes changes necessary to determine if a user is subject to PIPL. Separate MRs will be opened using the functionality added in this MR to notify, block, and eventually delete target users (see https://gitlab.com/gitlab-org/gitlab/-/issues/455706#note_1988620480).

Database changes

  1. Add last_access_from_pipl_country_at column to users - see !157579 (comment 1984998142) for reason why this addition is necessary

Scopes

Users::CountryAccessLogs.from_country_code

Raw SQL
SELECT
    "country_access_logs".*
FROM
    "country_access_logs"
WHERE
    "country_access_logs"."user_id" = 1
    AND "country_access_logs"."country_code" IN (0, 1, 2);
Explain

https://console.postgres.ai/shared/a84bc891-5a5e-4ffd-b15c-d92729e49150

 Index Scan using index_country_access_logs_on_user_id_and_country_code on public.country_access_logs  (cost=0.15..4.99 rows=1 width=62) (actual time=0.045..0.048 rows=3 loops=1)
   Index Cond: ((country_access_logs.user_id = 1) AND (country_access_logs.country_code = ANY ('{0,1,2}'::integer[])))
   Buffers: shared hit=13
   I/O Timings: read=0.000 write=0.000

Users::CountryAccessLogs.with_access

Raw SQL
SELECT
    "country_access_logs".*
FROM
    "country_access_logs"
WHERE
    "country_access_logs"."user_id" = 1
    AND (access_count > 0)
Explain

https://console.postgres.ai/shared/4b1c4f90-6d58-413c-9be8-1e4dc348c4ee

 Index Scan using index_country_access_logs_on_user_id_and_country_code on public.country_access_logs  (cost=0.15..7.73 rows=2 width=62) (actual time=0.070..0.071 rows=1 loops=1)  
   Index Cond: (country_access_logs.user_id = 1)  
   Filter: (country_access_logs.access_count > 0)  
   Rows Removed by Filter: 2  
   Buffers: shared hit=5  
   I/O Timings: read=0.000 write=0.000

Users::CountryAccessLogs.first_access_before

Raw SQL
SELECT
    "country_access_logs".*
FROM
    "country_access_logs"
WHERE
    "country_access_logs"."user_id" = 1
    AND (first_access_at <= '2024-01-09 09:41:16.188546')
Explain

https://console.postgres.ai/shared/b5eaf040-1552-44dd-a62d-28d0f6c414eb

 Index Scan using index_country_access_logs_on_user_id_and_country_code on public.country_access_logs  (cost=0.15..7.73 rows=2 width=62) (actual time=0.053..0.054 rows=1 loops=1)
   Index Cond: (country_access_logs.user_id = 1)
   Filter: (country_access_logs.first_access_at <= '2024-01-09 09:41:16.188546+00'::timestamp with time zone)
   Rows Removed by Filter: 2
   Buffers: shared hit=5
   I/O Timings: read=0.000 write=0.000

user.country_access_logs.from_country_code(COVERED_COUNTRY_CODES).with_access

Raw SQL
EXPLAIN
SELECT
    "country_access_logs".*
FROM
    "country_access_logs"
WHERE
    "country_access_logs"."user_id" = 1
    AND "country_access_logs"."country_code" IN (0, 1, 2)
    AND (access_count > 0);
Explain

https://console.postgres.ai/shared/88473649-b4af-41bb-8a48-6bd04ee449e8

 Index Scan using index_country_access_logs_on_user_id_and_country_code on public.country_access_logs  (cost=0.15..4.99 rows=1 width=62) (actual time=0.068..0.069 rows=1 loops=1)
   Index Cond: ((country_access_logs.user_id = 1) AND (country_access_logs.country_code = ANY ('{0,1,2}'::integer[])))
   Filter: (country_access_logs.access_count > 0)
   Rows Removed by Filter: 2
   Buffers: shared hit=13
   I/O Timings: read=0.000 write=0.000

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Screenshots or screen recordings

Screenshots are required for UI changes, and strongly recommended for all other merge requests.

Before After

How to set up and validate locally

Note: the validation steps below only verifies one part of the functionality introduced in this MR. Integration tests (see ee/spec/requests/root_controller_spec.rb) were added to ensure all modules introduced in this MR are working correctly.

  1. Start GDK simulating SaaS

    $ export GITLAB_SIMULATE_SAAS=1
    $ gdk start
  2. Enable required FFs

    $ rails c
    > Feature.enable(:track_user_access_from_pipl_countries)
  3. Hard-code request.env['HTTP_CF_IPCOUNTRY'] value in ee/app/controllers/concerns/users/track_country_access.rb

    def track_country_access
       # Compliance::Pipl::TrackUserCountryAccessService.new(current_user, request.env['HTTP_CF_IPCOUNTRY']).execute
       Compliance::Pipl::TrackUserCountryAccessService.new(current_user, 'CN').execute
    end
  4. Login with root and load the root path (localhost:3000/)

  5. At this point an access should have been tracked for the root user. In Rails console, verify this by checking root user's country_access_logs

    $ rails c
    > User.first.country_access_logs
    => [#<Users::CountryAccessLog:0x000000016fd13b78
      id: 1,
      created_at: Tue, 02 Jul 2024 10:32:55.219161000 UTC +00:00,
      updated_at: Tue, 02 Jul 2024 10:55:24.476584000 UTC +00:00,
      access_count_reset_at: nil,
      first_access_at: Tue, 02 Jul 2024 10:32:55.231767000 UTC +00:00,
      last_access_at: Tue, 02 Jul 2024 10:55:24.453392000 UTC +00:00,
      user_id: 1,
      access_count: 1,
      country_code: "CN">]
Edited by Eugie Limpin

Merge request reports