Collect 15-30min of Postgres logs with log_min_duration_statement = 0 for analysis and DB experiments
These actions are part of issue infrastructure#4921
As discussed with @abrandl and @dawsmith, I'm going to collect a sample (30-60 minutes) of all-queries log on Wednesday, Sep 19, starting at 12:00 UTC (the busiest time is ~13:00 UTC).
Brief description of actions
---
To do that, I'll set `log_min_duration_statement` manually with an `alter system set ..` command on the master, followed by `select pg_reload_conf();`. The plan (see details below) is to reach 0 iteratively, descending from the current value of 1 second in steps ~15 minutes apart, controlling IO and the log size.
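The descent can be sketched as a sequence of `alter system` commands (each step is run manually in psql, with ~15 minutes of IO observation in between, never as an actual loop; the thresholds match the detailed plan below):

```shell
# Print the command issued at each descent step
# (illustration only -- each step is run manually, with observation in between)
for threshold in 100ms 10ms 5ms 1ms 0; do
  echo "alter system set log_min_duration_statement = '$threshold'; select pg_reload_conf();"
done
```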
Projected IO impact
---
The projected IO impact is based on the `pg_stat_statements` view over the last 10 hours of observation (it lacks query parameters, but this adds only a small error to the forecast – there are no huge Geo queries on the master node):
```sql
gitlabhq_production=# select now(), :'TS_PGSS_RESET' as pg_stat_statements_last_reset, now() - :'TS_PGSS_RESET' since_reset, sum(calls * length(query)) as total_bytes, sum(calls * length(query)) / extract(epoch from now() - :'TS_PGSS_RESET') as bytes_per_sec from pg_stat_statements;
-[ RECORD 1 ]-----------------+------------------------------
now | 2018-09-18 07:51:16.052095+00
pg_stat_statements_last_reset | 2018-09-17 21:19:59.557029+00
since_reset | 10:31:16.495066
total_bytes | 37407827046
bytes_per_sec | 987626.41529573
```
– this gives an estimate of ~1MB/s for writing the Postgres log. This is not based on the busiest hours though, so it's subject to re-checking during Tuesday mid-day. This is an estimate for `log_min_duration_statement = 0`; for higher values, including the lowest possible positive one (`1ms`), the expected IO will be much lower.
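As a sanity check on these numbers: at the measured rate (~987626 bytes/sec from the query above), a 30-minute collection window would produce roughly:

```shell
# Projected log volume for a 30-minute window at the measured rate
bytes_per_sec=987626     # from the pg_stat_statements estimate above
seconds=$((30 * 60))
total=$((bytes_per_sec * seconds))
awk -v b="$total" 'BEGIN { printf "%.2f GB\n", b / 1024 / 1024 / 1024 }'   # prints 1.66 GB
```

At busier hours the rate will be higher, which is where the ~2-4GB estimate in the risks section comes from.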
Additional observation: a lot of queries under 1ms
---
If we take the Top-15 (by `total_time`) queries among those whose `max_time` is < 1 ms, we'll see that they account for >50% of the time:
```sql
select
  total_time,
  sum(total_time) over () as total_time_registered_in_pss,
  round(100::real * (sum(total_time) over (order by total_time desc)) / sum(total_time) over ())::text
    || '%' as accum_percentage,
  min_time,
  round(mean_time::numeric, 2) as mean_time,
  max_time,
  query
from pg_stat_statements
where max_time < 1
order by total_time desc
limit 15;
```
The detailed plan of actions
---
- [x] 0) Tuesday, 2018-09-18 12:00 UTC: reset pg_stat_statements and double check the write IO estimate.
In psql on Postgres master:
```sql
select now() as "TS_PGSS_RESET" from pg_stat_statements_reset() \gset
select
now(),
:'TS_PGSS_RESET' as pg_stat_statements_last_reset,
now() - :'TS_PGSS_RESET' since_reset,
sum(calls * length(query)) as total_bytes,
sum(calls * length(query)) / extract(epoch from now() - :'TS_PGSS_RESET') as bytes_per_sec
from pg_stat_statements;
\watch 300
```
- [x] 1) Wednesday, 2018-09-19 11:xx Preparations.
Set millisecond precision in log timestamps – use `%m` instead of `%t` (currently we have second-level precision, `%t`):
```sql
alter system set log_line_prefix = '%m [%p]: [%l-1] db=%d,user=%u ';
select pg_reload_conf();
```
- [x] 2) Wednesday, 2018-09-19 12:00 UTC: start descending.
In psql on Postgres master:
```sql
alter system set log_min_duration_statement = '100ms';
select pg_reload_conf();
```
- [x] 3) Observe the current size of the log and avg speed of its growth:
```shell
sudo ls -lah /var/log/gitlab/postgresql/current
# this will print avg bytes/sec based on the full "current" log file
sudo head -n1 /var/log/gitlab/postgresql/current | head -c 22 | sed 's/_/T/g' | xargs date +"%s" --date \
| awk \
-v date="$(date +"%s")" \
'{print (date - $1)}' \
| awk \
-v bytes="$(sudo wc -c /var/log/gitlab/postgresql/current | awk '{print $1}')" \
'{print (bytes / $1)}'
```
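A worked example of the same bytes/age arithmetic with made-up numbers (a ~900 MB `current` file whose first timestamp is 15 minutes old):

```shell
# Hypothetical numbers: "now" = 1537358400 (epoch), first log line 900s
# earlier, file size 900 MB; avg rate = bytes / age_in_seconds
awk -v date=1537358400 -v first=1537357500 -v bytes=943718400 \
  'BEGIN { print bytes / (date - first) }'   # prints 1048576 (~1 MB/s)
```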
And check IO with:
- `sudo iotop -o`
- monitoring
- [x] 4) Wednesday, 2018-09-19 12:15 UTC: continue descending, set to '10ms'
In psql on Postgres master:
```sql
alter system set log_min_duration_statement = '10ms';
select pg_reload_conf();
```
+ continue observing as described in step 3
- [x] 5) Wednesday, 2018-09-19 12:30 UTC: continue descending, set to '5ms'
In psql on Postgres master:
```sql
alter system set log_min_duration_statement = '5ms';
select pg_reload_conf();
```
+ continue observing as described in step 3
+ check IO with `sudo iotop -o` and monitoring (https://dashboards.gitlab.net/d/pEfSMUhmz/postgresql-disk-io?orgId=1 + GCP graphs)
- [x] 6) Wednesday, 2018-09-19 12:45 UTC: continue descending, set to '1ms'
In psql on Postgres master:
```sql
alter system set log_min_duration_statement = '1ms';
select pg_reload_conf();
```
+ continue observing as described in step 3
+ check IO with `sudo iotop -o` and monitoring (https://dashboards.gitlab.net/d/pEfSMUhmz/postgresql-disk-io?orgId=1 + GCP graphs)
- [x] 7) DECISION TO BE MADE BASED ON OBSERVED NUMBERS: if the current IO caused by logging is > 1MB/s, stop at `1ms` and collect only partial logs with this threshold.
If it's below, continue with the last descent step to collect the all-queries log:
```sql
alter system set log_min_duration_statement = 0;
select pg_reload_conf();
```
- [ ] 8) Collect logs (minimum 15 minutes, max 30 minutes):
```shell
sudo cp /var/log/gitlab/postgresql/current ~nikolays/postgres_0_20180919.log
```
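To verify the copied file actually covers at least 15 minutes, the first and last timestamps can be compared. A sketch, assuming the svlogd-style `YYYY-MM-DD_HH:MM:SS` timestamps used by the script in step 3:

```shell
# Minutes between two svlogd-style timestamps (the underscore is replaced
# with "T" so GNU date can parse them)
log_minutes() {
  local t1 t2
  t1=$(date +%s --date "$(echo "$1" | sed 's/_/T/')")
  t2=$(date +%s --date "$(echo "$2" | sed 's/_/T/')")
  echo $(( (t2 - t1) / 60 ))
}
# e.g. on the copied file:
#   log_minutes "$(head -c 19 postgres_0_20180919.log)" \
#               "$(tail -n1 postgres_0_20180919.log | head -c 19)"
log_minutes 2018-09-19_12:05:00 2018-09-19_12:35:00   # prints 30
```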
- [x] 9) Return to the initial state - in psql on the master node:
```sql
alter system reset log_min_duration_statement;
alter system reset log_line_prefix;
select pg_reload_conf();
\c
show log_min_duration_statement;
show log_line_prefix;
```
Notes on possible risks
---
No performance degradation is expected. In case of any unpredictable IO, step 9 (return to the initial state) will be applied immediately.
It is expected that ~2-4GB of logs will be generated during the collection phase (with a projected generation speed of ~1-2MB/s), which is equivalent to ~10 days of logging with the current `log_min_duration_statement` threshold (1s).
This might cause some delays in Kibana processing.
TODO after this work is done
---
After this is complete, TODO (separate issues):
- process, analyze logs with pgBadger
- PoC usage of logs with pgreplay on experimental DB nodes
- `set log_min_duration_statement to 100ms;` permanently
- set better `log_line_prefix`
- stop using "one for all" DB roles, use separate ones for different apps and for administration.
/cc @gitlab-com/gl-infra