title: PgDay Paris 2017
date: 2017-03-23
tags: conference, pgday, Paris, postgresql
skip_to_link: { talks: "Skip directly to the talks transcripts 📝" }
PostgreSQL is known to be a very good relational database system: it is robust, and it has all the features that modern database systems are expected to have. On top of that it is free and open source software with a very wide community behind it. This community is filled with great people trying to make the PostgreSQL ecosystem as good as possible.
Today I had the chance to attend the [PgDay Paris 2017]() event and it was a very nice day. I tried to take some notes about the talks and wanted to share with you what I saw. Most of the talks have [slides available online](), but if you want more detail about what was said and what I captured during this event, please read on.
## Small conference for better encounters
This is a great [one day conference](). It is usually a cheap event, with great speakers from the PostgreSQL community around the world, and it is held entirely in English. If you have the chance to attend you should come! The 2017 edition was my second one, and I was very happy to attend as a spectator.
Huge thanks to the staff organising the event: [@shaysler](), [@pg_krol](), [@pg_xocolatl](), and Katharine Atwood!
## Talks
* [Kaarel Moppel](#a-pg-dbas-toolbelt)
* [Christophe Pettus](#corruption-war-stories)
* [Sylvie Aune](#cant-imagine-what-a-newbie-can-do-with-pg-on-linux-in-a-month)
* [Fabien Coelho](#measuring-and-reducing-postgres-transaction-latency)
* [Magnus Hagander](#a-look-at-the-elephants-trunk---pg-10)
* [Steven Simpson](#infrastructure-monitoring)
* [Joe Conway](#mls-postgres-and-selinux)
* [Simon Riggs](#replication--recovery-in-pg-10)
# A PG DBA's Toolbelt
* Kaarel Moppel (Cybertec)
## Setup
* **PG documentation is great**. Always try to know the concepts behind the tools.
* **PGtune**: outputs a postgresql.conf file from an existing one depending on your hardware. Good resources are also available on the Postgres wiki, on the [Performance Optimization]() page and the [Category:Benchmarking]() one.
* `pgbench`: Benchmark your PG installation. It measures throughput in **TPS**. Bundled into pgserver binary. Custom workloads + probabilities (of writes / reads).
* pgbench-tools: wrapper around pgbench with nice graphs and summaries.
* Backups!
* Wal-E (cloud oriented). Provide your cloud access keys and it will do the rest.
* **PGbarman**. Incremental backup, WAL archiving...
* PGbackrest
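A minimal sketch of the `pgbench` workflow mentioned above, assuming a running local server and an existing database named `bench` (a made-up name):

```shell
# Populate the benchmark schema (scale factor 10 ~= 1M rows in pgbench_accounts),
# then run the default workload: 8 clients, 2 threads, 60 seconds.
pgbench -i -s 10 bench
pgbench -c 8 -j 2 -T 60 bench   # reports throughput in TPS
```

Custom workloads can be supplied as SQL scripts with `-f`.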
## Monitoring
* Linux basics: top/**htop** and iotop/dstat.
* **pg_activity**. Extra python binary. Top-like app for pg. Very useful to see blocking queries and quickly kill them.
* **pg_view**. Extra binary. Activity view with very detailed info. Really nice.
* pgcenter. Extra binary. Similar but **lots** of features. Top-like but also, config update, I/Os, etc
* `pg_stat_statements` is a pg contrib extension with lots of metrics.
## Monitoring frameworks
* Any monitoring solution usually has a PG extension (New Relic, Zabbix, Nagios, whatever...)
* integrates with Nagios:
* `check_postgres`
* `check_pgactivity`
* **PgObserver**: remote data collector (stores in extra db) and has graphs tables views. No need for extensions on the server.
* **pghero**: health checks. Suggested indexes.
## Common data tools
* Text snippets: the []() page has lots of query snippets to help you find duplicate rows, data dependencies, index problems...
* **pgBadger**: log analyser. Produces html static pages: slowest queries, sessions, locking, tps, temp files, parallel...
* supports pgbouncer log files
* Misc solutions:
* Cloud provider plugins: loggly, splunk
* redislog for ELK stack
* Push logs into extra Postgres tables! Then just query with PG :).
## Indexing
* **hypopg**: builds "hypothetical" indexes to see whether an index would be useful (via `EXPLAIN`). Good to check if Postgres would use the index before even building it.
* **pghero**: index suggestions
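For instance, a hypothetical-index check with hypopg might look like this (a sketch: the `bench` database, `orders` table and `customer_id` column are illustrative names, and the extension must be installed on the server):

```shell
psql -d bench <<'SQL'
CREATE EXTENSION IF NOT EXISTS hypopg;
-- Declare the index hypothetically (no build, no disk usage)...
SELECT * FROM hypopg_create_index('CREATE INDEX ON orders (customer_id)');
-- ...then check whether the planner would pick it up.
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
SQL
```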
## Bloat
* `pgstattuple` pg extension.
* `check_pgactivity`
* [**pg_repack**]() extension. Used for years. It cleans bloated data in tables without locks. A "better vacuum full".
* **pg_squeeze**: similar to pg_repack but without triggers (uses replication slots and logical decoding).
## Navigating
* `psql` of course. Don't forget to check `.psqlrc` config file (for shortcuts, custom display improvements)
* pgcli (python bin - slower than psql).
* Graphical:
* pgadmin
* DataGrip (commercial product)
* SqlPad
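A few `.psqlrc` tweaks of the kind mentioned above, as a config sketch (adjust to taste):

```shell
cat >> ~/.psqlrc <<'EOF'
\set QUIET 1
\pset null '(null)'   -- make NULLs visible
\x auto               -- expanded display for wide rows
\timing on            -- print each query's execution time
\set QUIET 0
EOF
```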
## Data integrations
* Foreign Data Wrappers FTW.
* **pgloader**: "1-click" MySQL migrations, or CSV
* postgrest
* **ngx_postgres**: nginx extension to interact with postgres
## For developers
* pgTap: unit testing for postgres
* sqitch: migration management
## Scaling
* pgbouncer (connection pooling)
* pgpool-II
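A minimal pgbouncer configuration sketch (all names and paths here are hypothetical):

```shell
cat > /etc/pgbouncer/pgbouncer.ini <<'EOF'
[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction   ; reuse server connections between transactions
default_pool_size = 20
EOF
```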
[↑ Back to the list of talks](#talks)
# Corruption War Stories
* Christophe Pettus (PostgreSQL Experts, Inc)
* [@xof]()
## it will happen /o\
* backup
* even good backup can have hidden long-term corruption
* can be too old
* or hit by a PG bug
## Preventing corruption
* how to fix? Don't have one in the first place.
* PG assumes FS is perfect, but it's not:
* can not recover from silent bad data write
* with 9.3 checksums, you at least have warnings
## Hardware is cheap. But data is expensive
* Use good-quality hardware.
* HW needs to honor fsync END TO END. (Manufacturers do hide info)
* avoid network devices for `$PGDATA` and backups
## Technique and process
* be sure to follow the right protocol
* test this protocol.
* test your backups. **Obviously**. backups are not backups if they are not tested.
## Find corruptions
* Try out a **pg_dump to /dev/null**. You'll be sure it will read every single row.
* beware about I/Os of course
* great for finding corrupt data
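In shell form, that corruption scan is simply (with `mydb` a placeholder database name):

```shell
# Reads and decodes every row; any corrupt tuple will make pg_dump fail.
pg_dump mydb > /dev/null && echo "all rows readable"
```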
## HW failures
* bad disk, bad controller
* garbage writes during power loss
* bad ram (**non-ECC ram**)
## HW features
* deferred or missing fsync behavior. HW vendors lie...
* network storage that does not handle detach gracefully.
* soft-RAID edge conditions
## PG bugs
* 9.2 and 9.3 had unfortunate replication bugs.
* 9.6.1 index bug
## Operator error
* forget external tablespaces
* `rm -rf` wrong directory. (💚 for Gitlab)
* Special 👏 👏 to Gitlab for **their transparency** and for the **gold-mine** info in [their postmortem]().
## What to do?
* Buy good hardware. Ask your HW provider what your disks really are.
## Basics
* Save all parts:
* stop pg
* do a **full filesystem-level backup**.
* keep it safe
* document each step
## Again: use pg_dump
## Bad data page
* checksum failures
* can you pg_dump the table?
## Really bad data pages
* Can you SELECT those pages?
## Desperate cures
* all corruptions are by nature "one-off" situations.
* No recipes to recover. You need to analyse.
## Check system logs
* mdadm, **dmesg**...
* **oom killer** terminating backends?
## WAL files corrupt or missing
* disk space problems
* remove "logs"
* what?
* you mean "pg_xlogs"
* :facepalm:
## pg_clog
* pg status of transactions
* pg_clog corruption is rarely subtle.
* usually a missing file
* easy to patch: just add a zeros file. (00 means "in progress")
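A sketch of that patch (last-resort surgery: stop the server and back up `$PGDATA` first; `0001` is a hypothetical missing segment name):

```shell
# Each pg_clog segment is 256 KB; zero bits read as "transaction in progress".
CLOG_DIR="${PGDATA:-/tmp/demo-pgdata}/pg_clog"   # adjust to your data directory
mkdir -p "$CLOG_DIR"
dd if=/dev/zero of="$CLOG_DIR/0001" bs=262144 count=1
```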
## system catalog corruption
* horrible. Nightmare
* very hard to recover from
## Real life corruptions stories
### pg_clog corruption
* network connectivity issue
* secondary promoted
* secondary not as powerful as primary. Couldn't handle the load
* back to the primary
* network issues again
### upgrade
* secondary promoted to primary
* existing rows are missing
* no errors
* PG bug
* 9.3 bug
* clog values were not properly being transmitted from primary to secondary under HIGH-write-load conditions.
#### the fix
* enough info to delete the bad rows
#### Moral
* do **thorough sanity checks** on promoted primaries
### Other moral
* DON'T use desktop hardware
* even old backups can be useful
### Too many autovacuums going on
* because of configuration (`autovacuum_freeze_max_age` very high)
* wraparound warnings in the logs
* **no monitoring**: didn't notice.
* needed to manually vacuum oldest tables.
#### Moral
* don't crank up the autovacuum freeze conf
* **MONITOR errors and warnings**.
## TL;DR
* Good hardware.
* **Test your backups**
* Stay up to date on **release notes of PG** and apply upgrades promptly.
* **MONITOR your log** output
* Get plenty of rest
## Q/A
* Can we rely on FS checksums? I'd say yes, but I don't have arguments for specific filesystems.
* tools to check fsync? pg_test_fsync included in PG can check fsync for you.
* How to check backups? I use barman, backups + restore and pg_dump the restored database. Is it enough?
[↑ Back to the list of talks](#talks)
# Can't imagine what a newbie can do with PG on Linux in a month
* Sylvie Aune (teacher, working at SA Infoconsulting)
## IPrec
* a project to help newbies get training from very technical professionals.
## Beginnings (our first test)
* install and deploy a FREE inventory management tool
* present the project after ONE month.
* a project manager (with no time to help)
* 4 newbies (with two very motivated by the challenge)
* One IT trainer with good linux and db skills.
## Good for newbies
* easy enough to get 4 people that didn't know anything about PG started
## Newbies got something out of the training program
* Now happy and continuing to learn about it
* Useful knowledge in their daily jobs
## TL;DR
> It is Not Because Things are Difficult that We Do Not Dare; It Is Because We Do Not Dare that They are Difficult.
> ~Seneca~
[↑ Back to the list of talks](#talks)
# Measuring and Reducing Postgres Transaction Latency
* Fabien Coelho (Teacher at École des Mines de Paris)
## OLTP - Online Transaction Processing
* CRUD queries
* data fit in shared buffers
* Focus on performance, with **emphasis on LATENCY** (not only TPS)
## Typical 3-tier web app
* User ↔ Client ↔ **App Server ↔ Database**
* focused on last part (**app server ↔ db**)
* connection
* request-response cycles
## Transaction perf measures
* throughput **(tx/sec)**
* Latency. For one transaction, how long did it take? **(ms/tx)**
* Both are **correlated** (and contradictory!). Better throughput usually leads to worse latency.
* **max vs enough**. Think about a MAX latency you don't want to bypass before thinking about huge TPS.
* Bottleneck (network, CPU, I/Os, mem...) for throughput. For latency it depends on the **base transaction time**. If your transaction is 10 ms and you improve network connectivity by 1 ms you get a 20 % improvement. If the transaction is 2 ms you get a 100 % improvement!
## PG perf swiss army knife
* input, options parallelism, output
* **pgbench**
* tests needs to be long enough (warmup, checkpoint, vacuums...)
* **multiple times**. To be able to reproduce.
## Connection costs
* connection AAA (authentication): **1/3rd** of the time
* SSL handshake: **1/3rd**
* Transaction **1/3rd**
→ of course avoid reconnections for each transaction.
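pgbench can demonstrate this cost directly with its `-C` flag, which opens a new connection for every transaction (a sketch, assuming a `bench` database initialized with `pgbench -i`):

```shell
pgbench -C -c 4 -T 30 bench   # new connection per transaction: low TPS
pgbench    -c 4 -T 30 bench   # persistent connections: much higher TPS
```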
## Comparing 9.5 vs 9.6
* `pgbench -j 4 -c 8`
* 329.4 tps vs 326 tps
* 24.3ms vs 24.4ms
* latency std deviation: 79.5ms vs 20ms
* Instant TPS has completely different behavior. Graphed each second:
* 9.5 has lots of seconds where it has to wait (~0 tps).
* 9.6 is smoother (but has overall lower tps)
## Why?
* 9.5 checkpoints:
* data writes spread over time == **random I/O**
* OS chooses when to actually write (30 s delay on Linux)
* until... fsync is called == **I/O storm**
* very expensive hardware
* use PG 9.6! \o/
* sorted data writes == **sequential I/O**
* flush instructions sent regularly (256kb) `checkpoint_flush_after`
* when fsync is called == **everything is already written**, great!
## Rate vs Limit
* in pgbench you have two options, `-R` (target rate) and `-L` (latency limit).
* 9.5
* slow & skipped 24%
* latency 15.6 ± 158ms
* 9.6 with checkpoint
* slow & skipped 2.7%
* latency 3.6 ± 24ms
* 9.6 with sorted data
* Even better!
## Play with FILLFACTOR storage param
* PG: UPDATE = DELETE + INSERT. This can actually happen on multiple different pages and **at worst up to 3 changes**.
* doesn't change perf that much changing the fillfactor.
## Hardware
* Hard Disk Drive **HDD** vs **SSD**
* mech vs elec
* fast seq I/O for both
* **slow random** I/O vs **fast random** I/O
* 406 tps vs **4700 tps**
* 19.7 ±12.3ms vs **1.7 ± 2.4ms**
## UNLOGGED tables (mysql style - data at risk)
* 10x perf improvement
* ⚠️️ ⚠️️ but **don't do it**, you can lose data.
## RO test (in-cache)
* queries on 3 tables
* transfers (network)
* parse query (syntax analysis)
* plan query
* exec query
* SSL cost. Time & €
* nego and re-nego
* crypto functions
* certificate
* Benefits: **CIA (Confidentiality, Integrity and Authentication)**
* **10% overhead with SSL on**
* simple vs prepared statements
* temp one-cmd function
* factor out `parse` cost
* keep `plan` and `execute`
* pgbench `-M prepared`
* 20% improvement with `prepared`. Good idea if things are repeated.
## Query combination trick
* Update, then SELECT. Can be replaced by `UPDATE ... RETURNING`
* No better perf.. :( sometimes it works.
* combined queries (with `\;` `BEGIN; multiple SELECT \; COMMIT;`)
* **x2 perf**! on both tps and latency
* **x2 perf** (roughly same as combined queries)
* PL/pgSQL
* **x3 perf**!!!
* why?
* with PL/pgSQL the **planning time is cached**. and done only once at first function call!
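A sketch of the PL/pgSQL trick, combining the UPDATE and the read into one server-side call (database, table and column names are illustrative):

```shell
psql -d bench <<'SQL'
CREATE OR REPLACE FUNCTION transfer(acct int, amount int) RETURNS int AS $$
DECLARE new_balance int;
BEGIN
  -- One round trip; the plan is cached after the first call.
  UPDATE accounts SET balance = balance + amount
   WHERE id = acct
   RETURNING balance INTO new_balance;
  RETURN new_balance;
END;
$$ LANGUAGE plpgsql;
SELECT transfer(1, 100);
SQL
```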
## Client-server distance
* LAN local network
* LO loopback
* IPC inter-process coms
* 400 vs 1133 vs 1243 tps
* 2.4 vs 0.9 vs 0.8 ms
## Scalability
* base: RO
* remote SSL, simple queries
* best throughput **37 639** tps (for 156 clients connected)
* best latency 1.000 ms (for 6 clients)
* compromise: **31 000 tps 1.837ms** (for 58 clients)
* remote, noSSL, prepared, PL calls
* Best **181 000** tps (140 clients)
* Best 0.254ms (10 clients)
* Compromise **156 945 tps 0.3ms** (60 clients)
## Misc settings
* App level
* connection persistence
* no ssl
* OS level
* Filesystem: **XFS**, ext4, ~ZFS~ (he's not a fan: PG already handles those semantics itself); care about mount options
* IO
* HW level
* better hardware (so more €)
* cache units...
## TL; DR
* no TPS only. **Latency matters**.
* DO MEASURE perf. **pgbench is there and it is very easy** to write custom scripts, so use it please :). It's being improved constantly.
* use PG 9.6.
* Costs. A lot of time is spent on network, `parse` & `plan`.
* RW load: go for SSD for sure.
* RO load: HDD == SSD
## Q/A
* Checkpoint Completion target (pgbench param) have you changed it?
* Yes I did change it... :p.
* Parallel queries? What is the impact?
* Good for aggregation. For multiple clients in parallel it doesn't change much.
[↑ Back to the list of talks](#talks)
# A look at the Elephant's Trunk - PG 10
* Magnus Hagander (redpill linpro - sweden)
## New era of versioning
* major used to be x.4, x.5...
* new majors will be 10.x, 11.x... Full digits!
## Schedule of PG 10
* Master is always next major version
* August 2016 branch 9.6
* CF (Commit Fest) Sept CF1, Nov. CF2, Jan 2017 CF3, March 2017 CF4 in progress
* ~ 1300 commits by now == a lot of new features
## Go go go, New features
## Small things
* drop support for protocol 1.0
* drop support for floating point timestamps. (compile flag)
## Bigger things
* `pg_xlog` → `pg_wal` renaming /o/
* `pg_clog` → `pg_xact` renaming /o/
* all functions that had `xlog` in the name are renamed to `wal` at the SQL level. E.g. `pg_receivexlog` is now `pg_receivewal`
* same thing for all CLI parameters. E.g. `pg_basebackup --xlog-method` → `--wal-method`
## good news (client)
* `\gx` in psql shows extended display for single queries.
## SCRAM authentication
* Salted Challenge Response Authentication
* standardized way to auth
* more secure than md5 (easy :p)
* switch WHEN your clients support it.
* easy upgrade as soon as you control your clients.
## libpq enhancements (client)
* multiple hosts can be specified `host=pg1,pg2,pg3`
* `host=pg1,pg2,pg3 target_session_attrs=read-write` ensures your read-write host will receive the queries
* `.pgpass` overridden
## Monitoring
* `pg_stat_activity` will now include walsender activity. With `application_name: walreceiver` and `query: walsender`
* new wait events
* latches
* I/O events
* reads
* writes
* individually identified
## Developer and SQL features
* `regexp_match()`
* just like `regexp_matches()` but not a set-returning function: easier AND faster if you only need one match.
## PL/python
* now supports multi-dimensional arrays
## Data ingestion
* `file_fdw` can now use a program. _Ouch, shell injection?_
* COPY view FROM is now supported as long as an INSTEAD OF trigger is used
* (almost) per SQL standard
* `XMLTABLE`: converts an XML document into a resultset.
* XPath mapping: turns a whole XML document (with XPath filters) into columns.
* much faster than individual queries
## Backup and replication
## new defaults
* new postgresql.conf defaults
* `wal_level = replica` /o/ :clap: :clap:
* `max_wal_senders = 10`
* and `max_replication_slots = 10`
* new pg_hba.conf defaults
* allow replication connections by default
## Replication slots
* temporary replication slots. VERY good for backup operations, for example:
* because they are automatically dropped at the end of the session
* prevents the risk of a forgotten slot falling behind and filling the disk
## pg_basebackup
* WAL streaming supported in tar mode (`-Ft`)
* better excludes
* new defaults
* WAL streaming now default (`-X`)
* uses temporary replication slot by default
## pg_receivewal
* supports compression
## quorum based sync repli
* `synchronous_standby_names = ANY 2 (pg1, pg2, pg3, pg4)`
## logical replication
* based on WAL
* and logical decoding
* replicate individual objects
* new `SUBSCRIPTION`. `CREATE SUBSCRIPTION testsub CONNECTION 'host=pg1 port=...' PUBLICATION testpub;`
* ⚠️️ still work in progress
* ⚠️️ still no initial data copy yet.
* ⚠️️ and no sequences
* ⚠️️ not suitable for fail-over.
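The new syntax looks roughly like this (hosts, databases and table names are hypothetical; remember the caveats above):

```shell
# On the publishing server:
psql -h pg1 -d appdb -c "CREATE PUBLICATION testpub FOR TABLE orders;"
# On the subscribing server:
psql -h pg2 -d appdb -c "CREATE SUBSCRIPTION testsub \
  CONNECTION 'host=pg1 dbname=appdb user=repl' PUBLICATION testpub;"
```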
## postgres_fdw
* push down aggregations
## Hash indexes
* YOU CAN START using them
* now WAL-logged
* now useful!
* many perf enhancements
* better caching
* page level vacuuming
* sometimes better than btree!
* ⚠️️ sometimes they don't help
## Partitioning
* based on existing inheritance
* **MUCH easier to work with.**
* automatic tuple routing
* more limitations → more optim (soon)
* range partitioning (single column only)
* list partitioning
* ⚠️️ still many limitations
* no row-movement
* no cross-partition indexes
* no cross-partition keys
* no partition-wise processing
* no tuple routing for foreign partitions
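Declarative range partitioning in PG 10, as a sketch with made-up names:

```shell
psql -d appdb <<'SQL'
CREATE TABLE measurements (logdate date NOT NULL, reading int)
  PARTITION BY RANGE (logdate);
CREATE TABLE measurements_2017 PARTITION OF measurements
  FOR VALUES FROM ('2017-01-01') TO ('2018-01-01');
-- Tuples are routed to the right partition automatically:
INSERT INTO measurements VALUES ('2017-03-23', 42);
SQL
```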
## More Parallelism
* 9.6 introduced parallelism
* seq scans
* aggregates
* hash and loop joins
### Usability
* new param `max_parallel_workers`
* query string now in workers
* shows in pg_stat_activity
### Subqueries
### index scans
### merge joins
* already a lot
* but we're not done yet
* ⚠️️ extended statistics
* ⚠️️ hash support for grouping sets
## Still lots to do
* lots of smaller fixes
* perf improvements
* etc
## Please help! Please
* **we need more people**
* if you can **review C code**, please come to Commit Fest
* if not, please **run postgresql 10**! Test it NOW!
* apt, rpm packages available, often updated
[↑ Back to the list of talks](#talks)
# Infrastructure Monitoring
* Steve Simpson (Stack HPC)
* from Bristol, UK
* Systems Software Engineer
* currently working for Stack HPC
* multi tenant
* Handles Cambridge University's 100-node HPC OpenStack cluster