**Do not use on a PostgreSQL production cluster. That's not what this tool is meant for!**

# pglog

`pglog` is a simple way to perform SQL queries on your log files.  Depending on
your needs, you can create foreign tables that read your csv logfiles, or you
can create real tables and load the data from the logs into them. The former is
quicker to set up (O(1)); the latter takes O(n) time in the number of log lines
to load, but is faster to query.

The `materialize` option copies the data into a database.  As such, it is a
snapshot of the logs as they existed when imported, i.e. does not contain
entries created later.  This is particularly helpful when analyzing log files
from another instance.  By default, `pglog` does not materialize the data.

## Prerequisites
### Configuring logs
Your logfiles need to be in the `csv` format. See the [PostgreSQL documentation
about logging](https://www.postgresql.org/docs/current/runtime-config-logging.html) for more information.

Analyzing PostgreSQL logfiles is great, but it's better if you have enough
information to play with. Therefore, I recommend setting the log parameters
to the following values (a sample configuration excerpt follows the table):

| *Parameter*                 | *Setting* |
|-----------------------------|-----------|
| log_min_duration_statement  | 250 ms    |
| log_autovacuum_min_duration | 0         |
| log_checkpoints             | on        |
| log_connections             | on        |
| log_disconnections          | on        |
| log_lock_waits              | on        |
| log_temp_files              | 0         |
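
For reference, here's a minimal `postgresql.conf` sketch combining the csv
prerequisite with the recommendations above (`log_destination` and
`logging_collector` are the standard parameters needed to produce csv logs):

```
    # produce csv logfiles (required by pglog)
    log_destination = 'csvlog'
    logging_collector = on

    # recommended verbosity for meaningful analysis
    log_min_duration_statement = 250ms
    log_autovacuum_min_duration = 0
    log_checkpoints = on
    log_connections = on
    log_disconnections = on
    log_lock_waits = on
    log_temp_files = 0
```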

### Dependencies and compliance
The extension uses stored procedures, so you need at least PostgreSQL 11.
It also uses the `pg_ls_dir` function, so the database user reading the
log directory needs `execute` permission on that administrative function.
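
For example, assuming a hypothetical analysis role named `loguser`:

```sql
    -- loguser is an illustrative role name, not something pglog creates
    grant execute on function pg_ls_dir(text) to loguser;
```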

You can install this extension on your laptop to analyze logfiles from
PostgreSQL versions 9.0 and later, 9.0 being when the `application_name`
column was added.

## Installation
### Prerequisites
You need to have PostgreSQL already installed.
You need to get the extension and install it:

```sh
    git clone <pglog_project>
    cd pglog
    make install
```

### Creating the extension
Then you can create the extension:

```sql
    create extension pglog cascade;
```

This installs the extension `file_fdw` if it's not already installed.  The
extension will create a schema named `pglog`.
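
A quick sanity check from `psql` (optional):

```sql
    \dx file_fdw
    \dn pglog
```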

## How to create the `pglog` partitioned table
The `pglog` schema contains the extension's stored procedures and functions.
You provide a file or a directory on which to create the foreign tables; the
chosen procedure then creates a `pglog` partitioned table with foreign
partitions.
Lætitia AVROT's avatar
Lætitia AVROT committed
69

You can choose from among the following procedures:

* call pglog.create_logtable_from_file(file, directory, version[, schema][, materialized]);
* call pglog.create_logtable_from_dir(directory, version[, schema][, materialized]);
* call pglog.mylogs(schema[, materialized]); **-- Only for test purposes, not on production, please!**
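
For example, a sketch with illustrative values (the log directory path and
server version below are assumptions, not defaults):

```sql
    -- build the pglog table from a directory of csv logs written by a
    -- PostgreSQL 14 instance, materializing the data into the pglog schema
    call pglog.create_logtable_from_dir('/var/log/postgresql', 14, 'pglog', true);
```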

## Dashboards
Based on the `pglog` partitioned table, you can generate dashboards.

### Error report
This view counts errors in `pglog` by severity.

Example:

~~~
pglog=# select * from error_report;
 error_severity |   nb    | percentage
----------------+---------+------------
 FATAL          |       8 |       0.00
 ERROR          |     106 |       0.00
 WARNING        |       3 |       0.00
 LOG            | 6055776 |     100.00
(4 rows)
~~~

### Detailed error analysis
The `error_analyze_(warning|error|fatal|panic)` views display a more detailed
analysis of the given log level. Lower severity levels tend to produce large
numbers of rows, so I recommend appending `FETCH FIRST n ROWS ONLY` to, or
otherwise limiting, your query.

Example:

~~~
pglog=# select * from error_analyze_fatal;
                       message                       | count |            min             |            max
-----------------------------------------------------+-------+----------------------------+----------------------------
 terminating connection due to administrator command |     7 | 2021-08-25 13:01:32.992+00 | 2021-08-27 09:12:44.884+00
 connection to client lost                           |     2 | 2021-08-25 12:20:24.044+00 | 2021-08-25 12:20:24.044+00
(2 rows)
~~~

### Autovacuum report
This view displays the counts of autovacuum and autoanalyze runs by table. You
might want to add a `limit n` clause to your query so you don't display all
your tables in the result.

Example:

~~~
pglog=# select * from autovacuum_report order by vacuum_count desc fetch first 5 rows only;
 schema_name |  table_name   | vacuum_count | analyze_count
-------------+---------------+--------------+---------------
 pg_catalog  | pg_statistic  |           24 |             0
 pg_catalog  | pg_attribute  |           15 |            14
 pglog       | postgresqlwed |            8 |             7
 pg_catalog  | pg_class      |            7 |             7
 pg_catalog  | pg_type       |            6 |             6
(5 rows)
~~~

### Checkpoints
This view contains statistics about your checkpoints: how often they're
launched, and how long they take to complete.

Example:

~~~
pglog=# select * from checkpoints_stats;
 avg_checkpoint_freq | median_checkpoint_freq | avg_checkpoint_duration | median_checkpoint_duration
---------------------+------------------------+-------------------------+----------------------------
 06:01:55.491981     | 00:15:00.809           | 00:00:32.447288         | 00:00:06.6675
(1 row)
~~~

### Checkpoints by hour
Database load normally has peaks and valleys. This view shows checkpoint
statistics by hour. Once you identify the peak hours, I strongly suggest that
you craft a query to get the same stats per minute during the peak times (a
rough sketch follows the example below).

Example:

~~~
pglog=# select * from checkpoints_stats_by_hour;
 date_part | avg_checkpoint_freq | median_checkpoint_freq | avg_checkpoint_duration | median_checkpoint_duration
-----------+---------------------+------------------------+-------------------------+----------------------------
         6 | 03:31:43.989333     | 00:04:59.941           | 00:00:06.375            | 00:00:04.431
         7 | 22:36:05.281917     | 00:15:00.809           | 00:00:17.00225          | 00:00:03.916
         8 | 00:21:22.5615       | 00:15:01.661           | 00:00:17.9445           | 00:00:09.3765
         9 | 00:39:21.674167     | 00:30:17.2415          | 00:00:33.430833         | 00:00:11.2735
        10 | 00:16:09.9175       | 00:16:09.9175          | 00:00:06.032            | 00:00:06.032
        11 | 00:37:25.941667     | 00:12:22.328           | 00:01:40.044667         | 00:02:29.37
        12 | 01:11:27.7846       | 00:40:00.957           | 00:01:05.7842           | 00:00:16.016
        13 | 00:15:19.55725      | 00:07:56.6295          | 00:01:18.24225          | 00:01:21.253
        15 | 04:13:58.037        | 04:13:58.037           | 00:02:30.109            | 00:02:30.109
        16 | 02:58:23.038333     | 00:25:00.427           | 00:00:05.165667         | 00:00:05.012
        17 | 00:15:00.34025      | 00:07:30.346           | 00:00:04.27425          | 00:00:03.195
        19 | 01:02:31.026        | 01:02:31.026           | 00:00:06.4115           | 00:00:06.4115
        20 | 00:09:59.441        | 00:09:59.441           | 00:00:06.241            | 00:00:06.241
(13 rows)
~~~
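
If you need the per-minute version, here's a rough sketch against the raw
`pglog.pglog` table (it assumes the standard csvlog columns `log_time` and
`message`; the hour filter is an illustrative value):

```sql
    -- count completed checkpoints per minute during a suspected peak hour
    select date_trunc('minute', log_time) as minute,
           count(*) as checkpoints_completed
    from pglog.pglog
    where message like 'checkpoint complete%'
      and extract(hour from log_time) = 9  -- hypothetical peak hour
    group by 1
    order by 1;
```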

### Temp file report
Temp files are written when PostgreSQL spills to disk, which is often a signal
that `work_mem` is set too low and needs to be raised, at least for the
queries causing the spills.

Example:

~~~
pglog=# select * from tempfile_report;
   min   |  max   | global_average |   10%   |   20%   |   30%   |   40%   |  50%  |  60%  |  70%  |  80%   |  90%   |  100%
---------+--------+----------------+---------+---------+---------+---------+-------+-------+-------+--------+--------+--------
 8192 kB | 374 GB | 57 GB          | 5720 MB | 5720 MB | 5720 MB | 6000 MB | 32 GB | 33 GB | 36 GB | 142 GB | 144 GB | 374 GB
(1 row)
~~~

### Temp file queries
It often happens that most temp files come from one or two specific queries.
In that case, you can set `work_mem` to a higher value just for those queries,
in the narrow context of a session rather than globally, where the change
could have unpleasant impacts elsewhere.
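
A minimal sketch (the `work_mem` value is illustrative; tune it to your workload):

```sql
    set work_mem = '256MB';  -- affects the current session only
    -- ... run the query that spills to disk ...
    reset work_mem;
```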

Example:
~~~
pglog=# select * from tempfile_queries order by frequency fetch first 2 rows only\gx
-[ RECORD 1 ]--------------+----------------------------------------------------------------------
frequency                  | 33
query_tempfilesize_median  | 142 GB
query_tempfilesize_average | 142 GB
total_size                 | 4689 GB
query                      | create materialized view pglog.top_n_queries as (                    +
                           |       with queries as (                                              +
                           |       select                                                         +
                           |         split_part(message, ?)                                       +
                           |     )                                                                +
                           |     select query,                                                    +
                           |       count(1),                                                      +
                           |       avg(duration) as average,                                      +
                           |       percentile_disc(0.5) within group (order by duration) as median+
                           |     from queries                                                     +
                           |     group by query                                                   +
                           |     order by average desc                                            +
                           |     )
-[ RECORD 2 ]--------------+----------------------------------------------------------------------
frequency                  | 154
query_tempfilesize_median  | 6000 MB
query_tempfilesize_average | 30 GB
total_size                 | 4655 GB
query                      | call create_mylogs (?,true);
~~~

### Top n slow queries
This view shows the slowest n queries in `pglog`. Narrow this down
unless you want to see all your slow queries.

Example:

~~~
pglog=# select *
from top_n_slowest_queries
order by "average" desc
fetch first 5 rows only;
-[ RECORD 1 ]----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
query   | with queries as ( select split_part(message, ? ) ) select query, count ? , avg(duration) as average, percentile_disc ? within group (order by duration) as median from queries group by query order by average desc limit ?
count   | 1
average | 26384.688000000000
median  | 26384.688
-[ RECORD 2 ]----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
query   | with total as ( select count ? as total from pglog.pglog ) select error_severity, count ? as nb, round(count ? /total::numeric ? 2) as percentage from pglog.pglog inner join (values ? ? )) as severity(level,name) on pglog.pglog.error_severity = severity.name, total group by error_severity, severity.level, total order by severity.level;
count   | 1
average | 24090.557000000000
median  | 24090.557
-[ RECORD 3 ]----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
query   | with total as ( select count ? as total from pglog.pglog ) select error_severity, count ? as nb, round(count ? /total::numeric ? 2) as percentage from pglog.pglog, total group by error_severity, total order by percentage desc;
count   | 2
average | 23530.949500000000
median  | 23210.202
-[ RECORD 4 ]----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
query   | UPDATE pgbench_branches SET bbalance = bbalance + ? WHERE bid = ?
count   | 173009
average | 2.5784063256824789
median  | 1.666
-[ RECORD 5 ]---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
query   | UPDATE pgbench_tellers SET tbalance = tbalance + ? WHERE tid = ?
count   | 173009
average | 1.9884613054812177
median  | 0.593
~~~

### Session statistics
This view shows some statistics about session durations.

Example:

~~~
pglog=# select * from sessions_stats ;
 min_duration | max_duration  |  avg_duration   | user_name | database_name 
--------------+---------------+-----------------+-----------+---------------
 00:00:00.033 | 00:41:05.38   | 00:05:32.636933 | postgres  | mybench
 00:02:41.311 | 156:33:45.673 | 43:09:49.903    | postgres  | pglog
 00:00:03.538 | 00:00:06.231  | 00:00:04.8845   | postgres  | postgres
 00:00:00.003 | 00:00:00.013  | 00:00:00.007333 | postgres  | template1
(4 rows)
~~~

### Connections report
This view shows connection frequency per user and database.

Example:

~~~
pglog=# select * from connections_report ;
 user_name | database_name |        frequency        | connection_number 
-----------+---------------+-------------------------+-------------------
 postgres  | mybench       | 00:23:57.488429         |                14
 postgres  | pglog         | 10 days 12:05:08.038667 |                 3
 postgres  | postgres      | 00:04:20.521            |                 1
 postgres  | template1     | 00:00:00.519            |                 2
(4 rows)
~~~

## Uninstalling
Simply use the `drop extension` statement (with or without the `cascade` option,
depending on what you want to achieve).
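
For example, to remove the extension and everything that depends on it:

```sql
    drop extension pglog cascade;
```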

## License
This project is under the PostgreSQL license.