README.md 7.67 KB
Newer Older
damien clochard's avatar
damien clochard committed
1
![PostgreSQL Anonymizer](images/png_RVB/PostgreSQL-Anonymizer_H_couleur.png)
damien clochard's avatar
damien clochard committed
2

damien clochard's avatar
damien clochard committed
3
Anonymization & Data Masking for PostgreSQL
damien clochard's avatar
init  
damien clochard committed
4
5
===============================================================================

damien clochard's avatar
damien clochard committed
6
7
`postgresql_anonymizer` is an extension to mask or replace
[personally identifiable information] (PII) or commercially sensitive data from
8
a PostgreSQL database.
damien clochard's avatar
damien clochard committed
9

10
11
12
The project relies on a **declarative approach** of anonymization. This means
we're using the PostgreSQL Data Definition Language (DDL) in order to specify
the anonymization strategy inside the table definition itself.
13

damien clochard's avatar
damien clochard committed
14
Once the masking rules are defined, you can access the anonymized data in
damien clochard's avatar
damien clochard committed
15
16
17
different ways :

* [Anonymous Dumps] : Simply export the masked data into an SQL file
18
* [Static Masking] : Remove permanently the PII according to the rules
damien clochard's avatar
damien clochard committed
19
* [Dynamic Masking] : Hide PII only for the masked users
damien clochard's avatar
damien clochard committed
20
* [Generalization] : Reducing the accuracy of dates and numbers
damien clochard's avatar
damien clochard committed
21

Nikolay Samokhvalov's avatar
Nikolay Samokhvalov committed
22
23
In addition, various [Masking Functions] are available: randomization, faking,
partial scrambling, shuffling, noise, or even your own custom function!
24

damien clochard's avatar
damien clochard committed
25
Read the [Concepts] section for more details and [NEWS.md] for information
26
about the latest version.
damien clochard's avatar
damien clochard committed
27

28
[NEWS.md]: NEWS.md
damien clochard's avatar
damien clochard committed
29
[INSTALL]: https://postgresql-anonymizer.readthedocs.io/en/latest/INSTALL/
damien clochard's avatar
damien clochard committed
30
[Concepts]: https://postgresql-anonymizer.readthedocs.io/en/latest/concepts/
damien clochard's avatar
damien clochard committed
31
[personally identifiable information]: https://en.wikipedia.org/wiki/Personally_identifiable_information
damien clochard's avatar
damien clochard committed
32

damien clochard's avatar
damien clochard committed
33
[Anonymous Dumps]: https://postgresql-anonymizer.readthedocs.io/en/latest/anonymous_dumps/
34
[Static Masking]: https://postgresql-anonymizer.readthedocs.io/en/latest/static_masking/
damien clochard's avatar
damien clochard committed
35
36
[Dynamic Masking]: https://postgresql-anonymizer.readthedocs.io/en/latest/dynamic_masking/
[Masking Functions]: https://postgresql-anonymizer.readthedocs.io/en/latest/masking_functions/
damien clochard's avatar
damien clochard committed
37
[Generalization]: https://postgresql-anonymizer.readthedocs.io/en/latest/generalization/
damien clochard's avatar
damien clochard committed
38
39

Declaring The Masking Rules
damien clochard's avatar
damien clochard committed
40
41
------------------------------------------------------------------------------

damien clochard's avatar
damien clochard committed
42
The main idea of this extension is to offer **anonymization by design**.
damien clochard's avatar
damien clochard committed
43

damien clochard's avatar
damien clochard committed
44
The data masking rules should be written by the people who develop the
damien clochard's avatar
damien clochard committed
45
application because they have the best knowledge of how the data model works.
damien clochard's avatar
damien clochard committed
46
Therefore masking rules must be implemented directly inside the database schema.
47

damien clochard's avatar
damien clochard committed
48
This allows masking the data directly inside the PostgreSQL instance without
damien clochard's avatar
damien clochard committed
49
using an external tool and thus limiting the exposure and the risks of data leak.
damien clochard's avatar
init  
damien clochard committed
50

51
The data masking rules are declared simply by using [security labels] :
damien clochard's avatar
damien clochard committed
52
53

```sql
54
=# CREATE EXTENSION IF NOT EXISTS anon CASCADE;
damien clochard's avatar
damien clochard committed
55

damien clochard's avatar
damien clochard committed
56
=# SELECT anon.init();
damien clochard's avatar
damien clochard committed
57

damien clochard's avatar
damien clochard committed
58
=# CREATE TABLE player( id SERIAL, name TEXT, points INT);
damien clochard's avatar
damien clochard committed
59

damien clochard's avatar
damien clochard committed
60
=# SECURITY LABEL FOR anon ON COLUMN player.name
61
-# IS 'MASKED WITH FUNCTION anon.fake_last_name()';
damien clochard's avatar
damien clochard committed
62
63
64

=# SECURITY LABEL FOR anon ON COLUMN player.id
-# IS 'MASKED WITH VALUE NULL';
damien clochard's avatar
damien clochard committed
65
66
```

67
[security labels]: https://www.postgresql.org/docs/current/sql-security-label.html
damien clochard's avatar
damien clochard committed
68

69
Static Masking
damien clochard's avatar
damien clochard committed
70
71
------------------------------------------------------------------------------

72
73
You can permanently remove the PII from a database with
`anon.anonymize_database()`. This will destroy the original data. Use with care.
damien clochard's avatar
damien clochard committed
74
75

```sql
damien clochard's avatar
damien clochard committed
76
=# SELECT * FROM customer;
damien clochard's avatar
damien clochard committed
77
78
79
80
81
 id  |   full_name      |   birth    |    employer   | zipcode | fk_shop
-----+------------------+------------+---------------+---------+---------
 911 | Chuck Norris     | 1940-03-10 | Texas Rangers | 75001   | 12
 112 | David Hasselhoff | 1952-07-17 | Baywatch      | 90001   | 423

damien clochard's avatar
damien clochard committed
82
=# SECURITY LABEL FOR anon ON COLUMN customer.full_name
damien clochard's avatar
damien clochard committed
83
84
-# IS 'MASKED WITH FUNCTION anon.fake_first_name() || '' '' || anon.fake_last_name()';

damien clochard's avatar
damien clochard committed
85
=# SECURITY LABEL FOR anon ON COLUMN customer.birth
Olleg Samoylov's avatar
Olleg Samoylov committed
86
-# IS 'MASKED WITH FUNCTION anon.random_date_between(''1920-01-01''::DATE,now())';
damien clochard's avatar
damien clochard committed
87

88
=# SECURITY LABEL FOR anon ON COLUMN customer.employer
damien clochard's avatar
damien clochard committed
89
90
-# IS 'MASKED WITH FUNCTION anon.fake_company()';

91
=# SECURITY LABEL FOR anon ON COLUMN customer.zipcode
damien clochard's avatar
damien clochard committed
92
93
94
-# IS 'MASKED WITH FUNCTION anon.random_zip()';

=# SELECT anon.anonymize_database();
damien clochard's avatar
damien clochard committed
95
96
97
98
99

=# SELECT * FROM customer;
 id  |     full_name     |   birth    |     employer     | zipcode | fk_shop
-----+-------------------+------------+------------------+---------+---------
 911 | michel Duffus     | 1970-03-24 | Body Expressions | 63824   | 12
damien clochard's avatar
damien clochard committed
100
 112 | andromach Tulip   | 1921-03-24 | Dot Darcy        | 38199   | 423
damien clochard's avatar
damien clochard committed
101

damien clochard's avatar
damien clochard committed
102
103
```

104
105
106
You can also use `anonymize_table()` and `anonymize_column()` to remove data
from a subset of the database.

107

damien clochard's avatar
damien clochard committed
108

damien clochard's avatar
damien clochard committed
109
110
111
112
113
114


Dynamic Masking
------------------------------------------------------------------------------

You can hide the PII from a role by declaring it as a "MASKED". Other roles
damien clochard's avatar
damien clochard committed
115
will still access the original data.
damien clochard's avatar
damien clochard committed
116
117
118

**Example**:

Damien Clochard's avatar
Damien Clochard committed
119
```sql
damien clochard's avatar
damien clochard committed
120
=# SELECT * FROM people;
121
 id | firstname | lastname |   phone
damien clochard's avatar
damien clochard committed
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
----+----------+----------+------------
 T1 | Sarah    | Conor    | 0609110911
(1 row)
```

Step 1 : Activate the dynamic masking engine

```sql
=# CREATE EXTENSION IF NOT EXISTS anon CASCADE;
=# SELECT anon.start_dynamic_masking();
```

Step 2 : Declare a masked user

```sql
=# CREATE ROLE skynet LOGIN;
138
=# SECURITY LABEL FOR anon ON ROLE skynet IS 'MASKED';
damien clochard's avatar
damien clochard committed
139
140
141
142
143
```

Step 3 : Declare the masking rules

```sql
damien clochard's avatar
damien clochard committed
144
=# SECURITY LABEL FOR anon ON COLUMN people.lastname
145
-# IS 'MASKED WITH FUNCTION anon.fake_last_name()';
damien clochard's avatar
damien clochard committed
146

damien clochard's avatar
damien clochard committed
147
=# SECURITY LABEL FOR anon ON COLUMN people.phone
148
-# IS 'MASKED WITH FUNCTION anon.partial(phone,2,$$******$$,2)';
damien clochard's avatar
damien clochard committed
149
150
151
152
153
154
```

Step 4 : Connect with the masked user

```sql
=# \! psql peopledb -U skynet -c 'SELECT * FROM people;'
155
 id | firstname | lastname  |   phone
damien clochard's avatar
damien clochard committed
156
157
158
159
160
----+----------+-----------+------------
 T1 | Sarah    | Stranahan | 06******11
(1 row)
```

damien clochard's avatar
damien clochard committed
161

damien clochard's avatar
damien clochard committed
162
163
Anonymous Dumps
------------------------------------------------------------------------------
damien clochard's avatar
damien clochard committed
164

damien clochard's avatar
damien clochard committed
165
166
Due to the core design of this extension, you cannot use `pg_dump` with a masked
user. If you want to export the entire database with the anonymized data, you
damien clochard's avatar
damien clochard committed
167
must use the `pg_dump_anon` command line. For example
damien clochard's avatar
damien clochard committed
168
169

```console
damien clochard's avatar
damien clochard committed
170
pg_dump_anon.sh -h localhost -p 5432 -U bob bob_db > dump.sql
damien clochard's avatar
damien clochard committed
171
172
```

damien clochard's avatar
damien clochard committed
173
For more details, read the [Anonymous Dumps] section.
damien clochard's avatar
damien clochard committed
174
175


damien clochard's avatar
damien clochard committed
176
Support
damien clochard's avatar
damien clochard committed
177
178
------------------------------------------------------------------------------

Nikolay Samokhvalov's avatar
Nikolay Samokhvalov committed
179
We need your feedback and ideas! Let us know what you think of this tool, how it
damien clochard's avatar
damien clochard committed
180
181
182
183
fits your needs and what features are missing.

You can either [open an issue] or send a message at <contact@dalibo.com>.

damien clochard's avatar
damien clochard committed
184
[open an issue]: https://gitlab.com/dalibo/postgresql_anonymizer/issues
damien clochard's avatar
damien clochard committed
185

damien clochard's avatar
damien clochard committed
186

damien clochard's avatar
damien clochard committed
187
Requirements
188
--------------------------------------------------------------------------------
damien clochard's avatar
damien clochard committed
189

damien clochard's avatar
damien clochard committed
190
This extension works with all [supported versions of PostgreSQL].
damien clochard's avatar
damien clochard committed
191

damien clochard's avatar
damien clochard committed
192
[supported versions of PostgreSQL]: https://www.postgresql.org/support/versioning/
damien clochard's avatar
damien clochard committed
193

damien clochard's avatar
damien clochard committed
194
195
It requires an extension called [pgcrypto] which is delivered by the
`postgresql-contrib` package of the main linux distributions.
damien clochard's avatar
damien clochard committed
196

damien clochard's avatar
damien clochard committed
197
[pgcrypto]: https://www.postgresql.org/docs/current/pgcrypto.html
damien clochard's avatar
damien clochard committed
198

199

damien clochard's avatar
damien clochard committed
200
201
202
Install
-------------------------------------------------------------------------------

damien clochard's avatar
damien clochard committed
203
See the [INSTALL] section
damien clochard's avatar
damien clochard committed
204

205

206
207
208
Limitations
------------------------------------------------------------------------------

damien clochard's avatar
damien clochard committed
209
210
* The dynamic masking system only works with one schema (by default `public`).
  When you start the masking engine with `start_dynamic_masking()`, you can
damien clochard's avatar
damien clochard committed
211
  specify the schema that will be masked with.
212
213
  **However** static masking with `anon.anonymize()`and [Anonymous Dumps] will
  work fine with multiple schemas.
214

215
216
217
* The [Anonymous Dumps] may not be consistent. Use [Static Masking] combined
  with `pg_dump` if you can't fence off your database from `DML` or `DDL`
  commands during the export.
218
219


damien clochard's avatar
damien clochard committed
220
221
222
Performance
------------------------------------------------------------------------------

damien clochard's avatar
damien clochard committed
223
See [docs/performances.md]
damien clochard's avatar
damien clochard committed
224

damien clochard's avatar
damien clochard committed
225
[docs/performances.md]: https://postgresql-anonymizer.readthedocs.io/en/latest/performances/
damien clochard's avatar
damien clochard committed
226
227


228

damien clochard's avatar
damien clochard committed
229