Data to be included in pseudonymization service
Summary
In &6309 (closed) we are creating a pseudonymization service to protect personally identifiable information related to Users.
In this issue, we will define which collected data can be used to identify users and should therefore be handled by the pseudonymization service.
What data is in scope of pseudonymization?
Personally identifiable User data which is related to private profiles will be pseudonymized.
We will also pseudonymize some group, project, and namespace data that could lead to the identification of users. Group names are one example: I may call my group "Amanda Rueda's Cool Group", which would allow viewers of group activity data to know that the activity was mine.
However, we are not pseudonymizing data with the intention of preventing the data from being tied to an entity. With pseudonymization of user data in place, we will still be able to understand activity at a company level.
For example:
- We can know that Acme Company has 20 groups and 35 projects
- We can know that 4 users within Project X (de-identified) created Epics this week
- We can know that a single user (identity unknown) in Project Y (de-identified) created an MR then purchased additional CI minutes.
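The counts in the examples above remain possible because a pseudonym is stable: the same input always maps to the same token, so distinct users and projects can still be told apart without being named. The following is a minimal sketch in Python, assuming a keyed hash (HMAC-SHA256) as the pseudonymization function. This issue does not specify the algorithm the service uses, and `SECRET_KEY`, `pseudonymize`, and the sample events are hypothetical.

```python
import hashlib
import hmac
from collections import defaultdict

# Hypothetical secret; a real service would load this from secure storage.
SECRET_KEY = b"example-secret-not-for-production"


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable, keyed pseudonym for `value` (HMAC-SHA256, hex encoded)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical raw events: (user_id, project_path, action).
raw_events = [
    ("101", "acme/project-x", "create_epic"),
    ("102", "acme/project-x", "create_epic"),
    ("101", "acme/project-x", "create_epic"),
    ("103", "acme/project-y", "create_mr"),
    ("103", "acme/project-y", "purchase_ci_minutes"),
]

# Replace identifying values with pseudonyms before any analysis happens.
events = [(pseudonymize(u), pseudonymize(p), a) for u, p, a in raw_events]

# Count distinct (pseudonymous) users who created epics, per project.
epic_creators = defaultdict(set)
for user, project, action in events:
    if action == "create_epic":
        epic_creators[project].add(user)

distinct_counts = {project: len(users) for project, users in epic_creators.items()}
```

Here `distinct_counts` holds one de-identified project key with two distinct users, even though neither the project path nor the user IDs appear in the analyzed data.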
Considered data for pseudonymization
The information in the table below is valid as of 2021-09-15.
# | Metric | Example Data | Should be De-identified? | Should be Collected? | Currently Collected? | Comment |
---|---|---|---|---|---|---|
1 | user_ID | "2890431" | Yes | Yes | No | This is an indirect identifier which can be used to reveal directly identifiable data. With user_id anyone can access name and username of both public and private profiles. |
2 | username | "amandarueda" | Yes | No | No | This can be personally identifiable data |
3 | user_name | "Amanda Rueda" | Yes | No | No | This is personally identifiable data |
4 | user_email | "arueda@gitlab.com" | No | No | No | While this can be personally identifiable data, we should not collect it at all. |
5 | user_public_email | "arueda@gitlab.com" | No | No | No | While this can be personally identifiable data, we should not collect it at all. |
6 | Social media handles: skype, linkedin, twitter | "amandamrueda", "amandamrueda", "amandamrueda" | No | No | No | This is personally identifiable data; we should not collect it at all. |
7 | website_url | "https://gitlab.com/amandarueda" | No | No | No | While this can be personally identifiable data, we should not collect it at all. |
8 | organization | "GitLab" | No | No | No | While this can be personally identifiable data, we should not collect it at all. |
9 | group_ID | "12813618" | No | Yes | No | While the Group ID can be used to identify the group name via the API, this is only true for groups set to Public visibility or where you are a member. Given this, anonymization of Group ID is not necessary. |
10 | group_name | "Golden Path" | Yes | No | No | This can be personally identifiable data |
11 | group_description | "This is the group Golden Path, we're great!" | No | No | No | While this can be personally identifiable data, we should not collect it at all. |
12 | group_path | "golden_path" | Yes | Yes | No | This can be personally identifiable data |
13 | group_web_url | "https://gitlab.com/groups/golden_path" | Yes | Yes | Yes | This can be personally identifiable data |
14 | group_full_name | "Golden Path" | Yes | No | No | This can be personally identifiable data |
15 | group_full_path | "golden-path" | Yes | Yes | No | This can be personally identifiable data |
16 | project_ID | "27005757" | No | Yes | No | While the Project ID can be used to identify the project name via the API, this is only true for projects set to Public visibility or where you are a member. Given this, anonymization of Project ID is not necessary. |
17 | project_name | "Amanda Rueda Project" | Yes | No | No | This can be personally identifiable data |
18 | project_description | "This is Amanda Rueda's project covering all things that are cool." | No | No | No | While this can be personally identifiable data, we should not collect it at all. |
19 | project_name_with_namespace | "Golden Path / Amanda Rueda Project" | No | No | No | While this can be personally identifiable data, we should not collect it at all. |
20 | project_path | "amanda-rueda-project" | Yes | Yes | No | This can be personally identifiable data |
21 | project_path_with_namespace | "golden-path/amanda-rueda-project" | Yes | Yes | No | This can be personally identifiable data |
22 | project_ssh_url_to_repo | "git@gitlab.com:golden-path/amanda-rueda-project.git" | No | No | No | While this can be personally identifiable data, we should not collect it at all. |
23 | project_http_url_to_repo | "https://gitlab.com/golden-path/amanda-rueda-project.git" | No | No | No | While this can be personally identifiable data, we should not collect it at all. |
24 | project_web_url | "https://gitlab.com/golden-path/amanda-rueda-project" | No | No | No | While this can be personally identifiable data, we should not collect it at all. |
25 | project_readme_url | "https://gitlab.com/golden-path/amanda-rueda-project/-/blob/master/README.md" | No | No | No | While this can be personally identifiable data, we should not collect it at all. |
26 | namespace_ID | "12174719" | No | Yes | No | While the Namespace ID can be used to identify the namespace name via the API, one can only retrieve information for namespaces of which they are a member. |
27 | namespace_path | "amandarueda" | Yes | Yes | No | This can be personally identifiable data |
28 | namespace_name | "Amanda Rueda" | Yes | No | No | This can be personally identifiable data |
29 | namespace_full_path | "amandarueda" | Yes | Yes | No | This can be personally identifiable data |
30 | namespace_web_url | "https://gitlab.com/amandarueda" | Yes | Yes | No | This can be personally identifiable data |
31 | uuid | 3059d4a0-7f26-4edc-h989-545aff87da5x | No | Yes | Yes | This is not personally identifiable data |
32 | ip address | 192.158.1.38 | No | Yes | Yes | This is not personally identifiable data without other joined data |
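The two decision columns of the table ("Should be De-identified?" and "Should be Collected?") can be read as a per-field policy: drop, pseudonymize, or pass through. The sketch below encodes a subset of the rows that way. It is illustrative only: `FIELD_POLICY`, `apply_policy`, and `HYPOTHETICAL_KEY` are names introduced here, HMAC-SHA256 is an assumed pseudonymization function, and dropping unknown fields by default is a design choice of this sketch rather than something the issue states.

```python
import hashlib
import hmac

# Subset of the table rows as (should_be_deidentified, should_be_collected).
FIELD_POLICY = {
    "user_ID": (True, True),
    "username": (True, False),
    "user_email": (False, False),
    "group_ID": (False, True),
    "group_name": (True, False),
    "group_path": (True, True),
    "project_ID": (False, True),
    "project_path": (True, True),
    "namespace_ID": (False, True),
    "uuid": (False, True),
}

# Hypothetical key; a real service would not hard-code it.
HYPOTHETICAL_KEY = b"example-key"


def pseudonymize(value: str) -> str:
    """Assumed pseudonymization function: keyed HMAC-SHA256, hex encoded."""
    return hmac.new(HYPOTHETICAL_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


def apply_policy(record: dict) -> dict:
    """Drop, pseudonymize, or pass through each field according to FIELD_POLICY."""
    result = {}
    for field, value in record.items():
        # Fields missing from the policy are treated as "do not collect".
        deidentify, collect = FIELD_POLICY.get(field, (False, False))
        if not collect:
            continue
        result[field] = pseudonymize(str(value)) if deidentify else value
    return result


record = {
    "user_ID": "2890431",
    "username": "amandarueda",
    "user_email": "arueda@gitlab.com",
    "group_ID": "12813618",
    "group_path": "golden_path",
}
cleaned = apply_policy(record)
```

In this example `cleaned` keeps `group_ID` unchanged, carries `user_ID` and `group_path` only as pseudonyms, and omits `username` and `user_email` entirely.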
Example API Outputs
The outputs below were produced by a non-admin user who is not related to the queried object.
Example API Output - User
{
"id": 8956705,
"name": "Amanda Rueda",
"username": "arueda24",
"state": "active",
"avatar_url": "https://secure.gravatar.com/avatar/04cb48ea5a0b81467abc897dca331f61?s=80&d=identicon",
"web_url": "https://gitlab.com/arueda24",
"created_at": "2021-05-24T17:43:01.933Z",
"bio": "",
"bio_html": "",
"location": null,
"public_email": "",
"skype": "arueda24",
"linkedin": "arueda24",
"twitter": "arueda24",
"website_url": "www.arueda24.com",
"organization": null,
"job_title": "",
"pronouns": null,
"bot": false,
"work_information": null,
"followers": 0,
"following": 0
}
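The user output above illustrates why the table treats `user_ID` as an indirect identifier: given only the numeric ID, the Users API endpoint (`GET /users/:id`) returns the name and username. The sketch below builds that URL and lists which directly identifying fields a response exposes. It makes no network call; `user_lookup_url`, `exposed_identifiers`, and `DIRECT_IDENTIFIER_KEYS` are hypothetical helper names, and the sample payload is a trimmed copy of the output above.

```python
import json

# Base URL as it appears in the `_links` values of the project payloads in this issue.
GITLAB_API = "https://gitlab.com/api/v4"

# Response keys that correspond to user rows of the table.
DIRECT_IDENTIFIER_KEYS = (
    "name",
    "username",
    "public_email",
    "skype",
    "linkedin",
    "twitter",
    "website_url",
    "organization",
)


def user_lookup_url(user_id: int) -> str:
    """URL of the Users API endpoint for a single user (GET /users/:id)."""
    return f"{GITLAB_API}/users/{user_id}"


def exposed_identifiers(payload: dict) -> dict:
    """Return the non-empty directly identifying fields of a user payload."""
    return {k: payload[k] for k in DIRECT_IDENTIFIER_KEYS if payload.get(k)}


# Trimmed copy of the example output above.
sample = json.loads("""
{
  "id": 8956705,
  "name": "Amanda Rueda",
  "username": "arueda24",
  "public_email": "",
  "skype": "arueda24",
  "organization": null
}
""")

url = user_lookup_url(sample["id"])
exposed = exposed_identifiers(sample)
```

Empty strings and `null` values are skipped, so only `name`, `username`, and `skype` are reported for this trimmed sample.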
Example API Output - Group
{
"id": 12813618,
"web_url": "https://gitlab.com/groups/teste309",
"name": "teste",
"path": "teste309",
"description": "",
"visibility": "public",
"share_with_group_lock": false,
"require_two_factor_authentication": false,
"two_factor_grace_period": 48,
"project_creation_level": "developer",
"auto_devops_enabled": null,
"subgroup_creation_level": "maintainer",
"emails_disabled": null,
"mentions_disabled": null,
"lfs_enabled": true,
"default_branch_protection": 2,
"avatar_url": null,
"request_access_enabled": true,
"full_name": "teste",
"full_path": "teste309",
"created_at": "2021-07-23T13:16:10.678Z",
"parent_id": null,
"ldap_cn": null,
"ldap_access": null,
"shared_with_groups": [],
"prevent_sharing_groups_outside_hierarchy": false,
"projects": [
{
"id": 28300516,
"description": "",
"name": "sfi_ttt_ttt",
"name_with_namespace": "teste / sfi_ttt_ttt",
"path": "sfi_ttt_ttt",
"path_with_namespace": "teste309/sfi_ttt_ttt",
"created_at": "2021-07-21T17:26:57.957Z",
"default_branch": "main",
"tag_list": [],
"topics": [],
"ssh_url_to_repo": "git@gitlab.com:teste309/sfi_ttt_ttt.git",
"http_url_to_repo": "https://gitlab.com/teste309/sfi_ttt_ttt.git",
"web_url": "https://gitlab.com/teste309/sfi_ttt_ttt",
"readme_url": "https://gitlab.com/teste309/sfi_ttt_ttt/-/blob/main/README.md",
"avatar_url": null,
"forks_count": 0,
"star_count": 0,
"last_activity_at": "2021-07-23T14:44:11.063Z",
"namespace": {
"id": 12813618,
"name": "teste",
"path": "teste309",
"kind": "group",
"full_path": "teste309",
"parent_id": null,
"avatar_url": null,
"web_url": "https://gitlab.com/groups/teste309"
},
"container_registry_image_prefix": "registry.gitlab.com/teste309/sfi_ttt_ttt",
"_links": {
"self": "https://gitlab.com/api/v4/projects/28300516",
"issues": "https://gitlab.com/api/v4/projects/28300516/issues",
"merge_requests": "https://gitlab.com/api/v4/projects/28300516/merge_requests",
"repo_branches": "https://gitlab.com/api/v4/projects/28300516/repository/branches",
"labels": "https://gitlab.com/api/v4/projects/28300516/labels",
"events": "https://gitlab.com/api/v4/projects/28300516/events",
"members": "https://gitlab.com/api/v4/projects/28300516/members"
},
"packages_enabled": true,
"empty_repo": false,
"archived": false,
"visibility": "public",
"resolve_outdated_diff_discussions": false,
"container_expiration_policy": {
"cadence": "1d",
"enabled": false,
"keep_n": 10,
"older_than": "90d",
"name_regex": ".*",
"name_regex_keep": null,
"next_run_at": "2021-07-22T17:26:57.999Z"
},
"issues_enabled": true,
"merge_requests_enabled": true,
"wiki_enabled": true,
"jobs_enabled": true,
"snippets_enabled": true,
"container_registry_enabled": true,
"service_desk_enabled": true,
"service_desk_address": "incoming+teste309-sfi-ttt-ttt-28300516-issue-@incoming.gitlab.com",
"can_create_merge_request_in": true,
"issues_access_level": "enabled",
"repository_access_level": "enabled",
"merge_requests_access_level": "enabled",
"forking_access_level": "enabled",
"wiki_access_level": "enabled",
"builds_access_level": "enabled",
"snippets_access_level": "enabled",
"pages_access_level": "enabled",
"operations_access_level": "enabled",
"analytics_access_level": "enabled",
"emails_disabled": false,
"shared_runners_enabled": true,
"lfs_enabled": true,
"creator_id": 8737232,
"import_status": "none",
"open_issues_count": 0,
"ci_default_git_depth": 50,
"ci_forward_deployment_enabled": true,
"ci_job_token_scope_enabled": false,
"public_jobs": true,
"build_timeout": 3600,
"auto_cancel_pending_pipelines": "enabled",
"build_coverage_regex": null,
"ci_config_path": "",
"shared_with_groups": [],
"only_allow_merge_if_pipeline_succeeds": false,
"allow_merge_on_skipped_pipeline": null,
"restrict_user_defined_variables": false,
"request_access_enabled": true,
"only_allow_merge_if_all_discussions_are_resolved": false,
"remove_source_branch_after_merge": true,
"printing_merge_request_link_enabled": true,
"merge_method": "merge",
"squash_option": "default_off",
"suggestion_commit_message": null,
"auto_devops_enabled": false,
"auto_devops_deploy_strategy": "continuous",
"autoclose_referenced_issues": true,
"keep_latest_artifact": true,
"approvals_before_merge": 0,
"mirror": false,
"external_authorization_classification_label": "",
"marked_for_deletion_at": null,
"marked_for_deletion_on": null,
"requirements_enabled": true,
"security_and_compliance_enabled": false,
"compliance_frameworks": [],
"issues_template": null,
"merge_requests_template": null,
"merge_pipelines_enabled": false,
"merge_trains_enabled": false
},
{
"id": 27860586,
"description": "",
"name": "Template_Pipeline",
"name_with_namespace": "teste / Template_Pipeline",
"path": "template_pipeline",
"path_with_namespace": "teste309/template_pipeline",
"created_at": "2021-07-02T12:04:01.929Z",
"default_branch": "main",
"tag_list": [],
"topics": [],
"ssh_url_to_repo": "git@gitlab.com:teste309/template_pipeline.git",
"http_url_to_repo": "https://gitlab.com/teste309/template_pipeline.git",
"web_url": "https://gitlab.com/teste309/template_pipeline",
"readme_url": "https://gitlab.com/teste309/template_pipeline/-/blob/main/README.md",
"avatar_url": null,
"forks_count": 0,
"star_count": 0,
"last_activity_at": "2021-07-23T13:48:26.023Z",
"namespace": {
"id": 12813618,
"name": "teste",
"path": "teste309",
"kind": "group",
"full_path": "teste309",
"parent_id": null,
"avatar_url": null,
"web_url": "https://gitlab.com/groups/teste309"
},
"container_registry_image_prefix": "registry.gitlab.com/teste309/template_pipeline",
"_links": {
"self": "https://gitlab.com/api/v4/projects/27860586",
"issues": "https://gitlab.com/api/v4/projects/27860586/issues",
"merge_requests": "https://gitlab.com/api/v4/projects/27860586/merge_requests",
"repo_branches": "https://gitlab.com/api/v4/projects/27860586/repository/branches",
"labels": "https://gitlab.com/api/v4/projects/27860586/labels",
"events": "https://gitlab.com/api/v4/projects/27860586/events",
"members": "https://gitlab.com/api/v4/projects/27860586/members"
},
"packages_enabled": true,
"empty_repo": false,
"archived": false,
"visibility": "public",
"resolve_outdated_diff_discussions": false,
"container_expiration_policy": {
"cadence": "1d",
"enabled": false,
"keep_n": 10,
"older_than": "90d",
"name_regex": ".*",
"name_regex_keep": null,
"next_run_at": "2021-07-03T12:04:01.951Z"
},
"issues_enabled": true,
"merge_requests_enabled": true,
"wiki_enabled": true,
"jobs_enabled": true,
"snippets_enabled": true,
"container_registry_enabled": true,
"service_desk_enabled": true,
"service_desk_address": "incoming+teste309-template-pipeline-27860586-issue-@incoming.gitlab.com",
"can_create_merge_request_in": true,
"issues_access_level": "enabled",
"repository_access_level": "enabled",
"merge_requests_access_level": "enabled",
"forking_access_level": "enabled",
"wiki_access_level": "enabled",
"builds_access_level": "enabled",
"snippets_access_level": "enabled",
"pages_access_level": "enabled",
"operations_access_level": "enabled",
"analytics_access_level": "enabled",
"emails_disabled": true,
"shared_runners_enabled": true,
"lfs_enabled": true,
"creator_id": 8737232,
"import_status": "none",
"open_issues_count": 0,
"ci_default_git_depth": 50,
"ci_forward_deployment_enabled": true,
"ci_job_token_scope_enabled": false,
"public_jobs": true,
"build_timeout": 3600,
"auto_cancel_pending_pipelines": "enabled",
"build_coverage_regex": null,
"ci_config_path": "",
"shared_with_groups": [],
"only_allow_merge_if_pipeline_succeeds": false,
"allow_merge_on_skipped_pipeline": null,
"restrict_user_defined_variables": false,
"request_access_enabled": true,
"only_allow_merge_if_all_discussions_are_resolved": false,
"remove_source_branch_after_merge": true,
"printing_merge_request_link_enabled": true,
"merge_method": "merge",
"squash_option": "default_off",
"suggestion_commit_message": null,
"auto_devops_enabled": false,
"auto_devops_deploy_strategy": "continuous",
"autoclose_referenced_issues": true,
"keep_latest_artifact": true,
"approvals_before_merge": 0,
"mirror": false,
"external_authorization_classification_label": "",
"marked_for_deletion_at": null,
"marked_for_deletion_on": null,
"requirements_enabled": true,
"security_and_compliance_enabled": false,
"compliance_frameworks": [],
"issues_template": null,
"merge_requests_template": null,
"merge_pipelines_enabled": false,
"merge_trains_enabled": false
}
],
"shared_projects": [],
"shared_runners_minutes_limit": null,
"extra_shared_runners_minutes_limit": null,
"prevent_forking_outside_group": null
}
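The group output above repeats the group path in many nested places (project paths, repository URLs, the embedded `namespace` object), so de-identifying only the top-level fields would not be enough. The sketch below walks a payload recursively and reports where potentially identifying values occur. `SENSITIVE_KEYS` and `find_sensitive` are hypothetical names, the key list is an assumption loosely based on the table, and the sample is a heavily trimmed version of the output above.

```python
from typing import Any, Iterator, Tuple

# Assumed set of keys whose values may identify a user, group, or project.
SENSITIVE_KEYS = {
    "name",
    "path",
    "full_name",
    "full_path",
    "web_url",
    "name_with_namespace",
    "path_with_namespace",
    "ssh_url_to_repo",
    "http_url_to_repo",
    "readme_url",
}


def find_sensitive(node: Any, prefix: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (location, value) for every non-empty sensitive string in the payload."""
    if isinstance(node, dict):
        for key, value in node.items():
            location = f"{prefix}.{key}" if prefix else key
            if key in SENSITIVE_KEYS and isinstance(value, str) and value:
                yield location, value
            else:
                # Recurse into nested objects and lists; scalars yield nothing.
                yield from find_sensitive(value, location)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_sensitive(item, f"{prefix}[{index}]")


# Heavily trimmed version of the group output above.
group = {
    "id": 12813618,
    "name": "teste",
    "path": "teste309",
    "projects": [
        {
            "id": 28300516,
            "path_with_namespace": "teste309/sfi_ttt_ttt",
            "namespace": {"id": 12813618, "full_path": "teste309"},
        }
    ],
}

hits = list(find_sensitive(group))
```

For the trimmed sample, `hits` contains four entries, two of them nested under `projects[0]`, which shows that the same path surfaces at several depths of a single response.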
Example API Output - Project
{
"id": 27005757,
"description": "Gitaly is a Git RPC service for handling all the git calls made by GitLab",
"name": "gitaly",
"name_with_namespace": "Baodong Cao / gitaly",
"path": "gitaly",
"path_with_namespace": "icbd/gitaly",
"created_at": "2021-05-29T03:24:28.229Z",
"default_branch": "master",
"tag_list": [],
"topics": [],
"ssh_url_to_repo": "git@gitlab.com:icbd/gitaly.git",
"http_url_to_repo": "https://gitlab.com/icbd/gitaly.git",
"web_url": "https://gitlab.com/icbd/gitaly",
"readme_url": "https://gitlab.com/icbd/gitaly/-/blob/master/README.md",
"avatar_url": "https://gitlab.com/uploads/-/system/project/avatar/27005757/gitaly7.png",
"forks_count": 0,
"star_count": 0,
"last_activity_at": "2021-07-23T14:33:44.661Z",
"namespace": {
"id": 3198322,
"name": "Baodong Cao",
"path": "icbd",
"kind": "user",
"full_path": "icbd",
"parent_id": null,
"avatar_url": "/uploads/-/system/user/avatar/2556296/avatar.png",
"web_url": "https://gitlab.com/icbd"
},
"container_registry_image_prefix": "registry.gitlab.com/icbd/gitaly",
"_links": {
"self": "https://gitlab.com/api/v4/projects/27005757",
"issues": "https://gitlab.com/api/v4/projects/27005757/issues",
"merge_requests": "https://gitlab.com/api/v4/projects/27005757/merge_requests",
"repo_branches": "https://gitlab.com/api/v4/projects/27005757/repository/branches",
"labels": "https://gitlab.com/api/v4/projects/27005757/labels",
"events": "https://gitlab.com/api/v4/projects/27005757/events",
"members": "https://gitlab.com/api/v4/projects/27005757/members"
},
"packages_enabled": true,
"empty_repo": false,
"archived": false,
"visibility": "public",
"owner": {
"id": 2556296,
"name": "Baodong Cao",
"username": "icbd",
"state": "active",
"avatar_url": "https://gitlab.com/uploads/-/system/user/avatar/2556296/avatar.png",
"web_url": "https://gitlab.com/icbd"
},
"resolve_outdated_diff_discussions": false,
"container_expiration_policy": {
"cadence": "1d",
"enabled": false,
"keep_n": 10,
"older_than": "90d",
"name_regex": ".*",
"name_regex_keep": null,
"next_run_at": "2021-05-30T03:24:28.257Z"
},
"issues_enabled": true,
"merge_requests_enabled": true,
"wiki_enabled": true,
"jobs_enabled": true,
"snippets_enabled": true,
"container_registry_enabled": true,
"service_desk_enabled": true,
"service_desk_address": "incoming+icbd-gitaly-27005757-issue-@incoming.gitlab.com",
"can_create_merge_request_in": true,
"issues_access_level": "enabled",
"repository_access_level": "enabled",
"merge_requests_access_level": "enabled",
"forking_access_level": "enabled",
"wiki_access_level": "enabled",
"builds_access_level": "enabled",
"snippets_access_level": "enabled",
"pages_access_level": "enabled",
"operations_access_level": "enabled",
"analytics_access_level": "enabled",
"emails_disabled": null,
"shared_runners_enabled": true,
"lfs_enabled": true,
"creator_id": 2556296,
"forked_from_project": {
"id": 2009901,
"description": "Gitaly is a Git RPC service for handling all the git calls made by GitLab",
"name": "gitaly",
"name_with_namespace": "GitLab.org / gitaly",
"path": "gitaly",
"path_with_namespace": "gitlab-org/gitaly",
"created_at": "2016-11-14T21:07:35.543Z",
"default_branch": "master",
"tag_list": [
"git",
"gitlab",
"rpc"
],
"topics": [
"git",
"gitlab",
"rpc"
],
"ssh_url_to_repo": "git@gitlab.com:gitlab-org/gitaly.git",
"http_url_to_repo": "https://gitlab.com/gitlab-org/gitaly.git",
"web_url": "https://gitlab.com/gitlab-org/gitaly",
"readme_url": "https://gitlab.com/gitlab-org/gitaly/-/blob/master/README.md",
"avatar_url": "https://gitlab.com/uploads/-/system/project/avatar/2009901/gitaly7.png",
"forks_count": 138,
"star_count": 269,
"last_activity_at": "2021-07-23T13:41:07.002Z",
"namespace": {
"id": 9970,
"name": "GitLab.org",
"path": "gitlab-org",
"kind": "group",
"full_path": "gitlab-org",
"parent_id": null,
"avatar_url": "/uploads/-/system/group/avatar/9970/logo-extra-whitespace.png",
"web_url": "https://gitlab.com/groups/gitlab-org"
}
},
"import_status": "finished",
"open_issues_count": 0,
"ci_default_git_depth": 0,
"ci_forward_deployment_enabled": true,
"ci_job_token_scope_enabled": false,
"public_jobs": true,
"build_timeout": 3600,
"auto_cancel_pending_pipelines": "enabled",
"build_coverage_regex": null,
"ci_config_path": "",
"shared_with_groups": [],
"only_allow_merge_if_pipeline_succeeds": false,
"allow_merge_on_skipped_pipeline": null,
"restrict_user_defined_variables": false,
"request_access_enabled": true,
"only_allow_merge_if_all_discussions_are_resolved": false,
"remove_source_branch_after_merge": true,
"printing_merge_request_link_enabled": true,
"merge_method": "merge",
"squash_option": "default_off",
"suggestion_commit_message": null,
"auto_devops_enabled": false,
"auto_devops_deploy_strategy": "continuous",
"autoclose_referenced_issues": true,
"keep_latest_artifact": true,
"approvals_before_merge": 0,
"mirror": true,
"mirror_user_id": 2556296,
"mirror_trigger_builds": false,
"only_mirror_protected_branches": false,
"mirror_overwrites_diverged_branches": false,
"external_authorization_classification_label": "",
"marked_for_deletion_at": null,
"marked_for_deletion_on": null,
"requirements_enabled": true,
"security_and_compliance_enabled": false,
"compliance_frameworks": [],
"issues_template": null,
"merge_requests_template": null,
"merge_pipelines_enabled": false,
"merge_trains_enabled": false,
"permissions": {
"project_access": null,
"group_access": null
}
}
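In the project output above, the `namespace` object has `"kind": "user"`, and its name and path match the owner's name and username. This is the case the namespace rows of the table guard against: for a personal namespace, namespace fields are user identifiers. The following sketch flags those fields; `personal_namespace_leaks` is a hypothetical helper and the sample values are taken from the table's example data.

```python
from typing import Dict


def personal_namespace_leaks(project: dict) -> Dict[str, str]:
    """Return namespace fields that expose the owner when the namespace is personal."""
    namespace = project.get("namespace") or {}
    if namespace.get("kind") != "user":
        return {}  # group namespaces are not tied to a single user's profile
    keys = ("name", "path", "full_path", "web_url")
    return {f"namespace_{k}": namespace[k] for k in keys if namespace.get(k)}


# Sample projects using the example values from the table.
personal_project = {
    "id": 27005757,
    "namespace": {
        "id": 12174719,
        "name": "Amanda Rueda",
        "path": "amandarueda",
        "kind": "user",
        "full_path": "amandarueda",
        "web_url": "https://gitlab.com/amandarueda",
    },
}
group_project = {
    "id": 28300516,
    "namespace": {"id": 12813618, "name": "teste", "path": "teste309", "kind": "group"},
}

personal_leaks = personal_namespace_leaks(personal_project)
group_leaks = personal_namespace_leaks(group_project)
```

Only the personal project yields results here. Group namespace names can still identify a person, as the group-name example earlier in this issue shows, so this check complements the table rather than replacing it.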