Commit 6438195c authored by MrMan

Condense the talk even more

parent 31f5b6e3
---
title: Just use Postgres
author: Victor Adossi
theme: Madrid
fonttheme: professionalfonts
date: June, 2019
---
# Roadmap #
- What is Postgres?
- Why/Why not Postgres?
- Key/Value Data
- Document Storage
- Geospatial Data
- Message Queue
- Time Series Data
# What is Postgres? #
**Postgres is the most advanced open source database that's ever existed**. It's developed in the open, driven and maintained by the community.
There are a few large contributors in the space like **2ndQuadrant** and **Citus Data** (acquired by Microsoft in January 2019).
Some of the features that set Postgres apart:
- Multi Version Concurrency Control (MVCC)
- Plugin system (indices, functionality)
- Process-per-connection model
- Elephant mascot
Reddit got to 1 billion page views a month on a master-slave Postgres setup (scaling up rather than out, mostly)[^1]
[^1]: http://highscalability.com/blog/2013/8/26/reddit-lessons-learned-from-mistakes-made-scaling-to-1-billi.html
# Why/Why not Postgres? #
\small
**Reliability** - Postgres is rock solid
**Performance** - "fast enough" to *pretty darn fast*
**Cloud vendor support** - AWS RDS, Azure Database, GCP Cloud SQL
**Open Source** - You can see how it works
\normalsize
Why *not* Postgres?
\small
**No Vendor** - No vendor to call[^2]
**Scaling Out** - No official scale out story[^3]
**Learning Curve** - Structured Query Language (SQL) can be difficult
**Rigor** - Transactional guarantees can eat into performance
[^2]: Lots of consultancies though (like 2ndQuadrant) that can help out
[^3]: PostgresXL does exist
# Key/Value Data #
Postgres makes a surprisingly good simple key value store. You're not going to beat Redis, but it's *probably* going to be fast enough!
```sql
CREATE UNLOGGED TABLE kv (
id bigint GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
key text NOT NULL,
value jsonb,
created_at timestamptz NOT NULL DEFAULT NOW()
);
```
 
Pluggable storage engines (the table access method interface)[^4] have landed in the development branch, so you could *actually* put Redis in your Postgres.
[^4]: https://www.postgresql.org/docs/devel/tableam.html
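
Here is a rough sketch of what SET/GET/DEL could look like. It assumes a hypothetical `kv_store` table (not the `kv` table above) where the key itself is the primary key, because `ON CONFLICT` needs a unique constraint to target. Remember that `UNLOGGED` tables are emptied after a crash.

\small
```sql
-- Hypothetical table for illustration: the key is the primary key,
-- so ON CONFLICT (key) has a unique constraint to target.
CREATE UNLOGGED TABLE IF NOT EXISTS kv_store (
  key text PRIMARY KEY,
  value jsonb,
  updated_at timestamptz NOT NULL DEFAULT NOW()
);

-- SET: insert the key, or overwrite the value if it already exists
INSERT INTO kv_store (key, value)
VALUES ('session:42', '{"user_id": 7}')
ON CONFLICT (key) DO UPDATE
  SET value = EXCLUDED.value,
      updated_at = NOW();

-- GET
SELECT value FROM kv_store WHERE key = 'session:42';

-- DEL
DELETE FROM kv_store WHERE key = 'session:42';
```
\normalsize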
# Document Storage #
\small
```sql
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
CREATE TABLE docs (
id uuid PRIMARY KEY DEFAULT uuid_generate_v4(),
data jsonb,
updated_at timestamptz NOT NULL DEFAULT NOW(),
created_at timestamptz NOT NULL DEFAULT NOW()
);
-- GIN indexes massively speed up searches like:
-- SELECT * FROM docs WHERE data @> '{"some_key": "some_value"}';
CREATE INDEX docs_data_idx ON docs USING GIN (data);
```
\normalsize
Look into Postgres's full range of JSON operators[^5]. SQL/JSON (JSONPath for SQL) is coming in 12[^6].
[^5]: https://www.postgresql.org/docs/current/functions-json.html
[^6]: https://www.postgresql.org/docs/12/functions-json.html#FUNCTIONS-SQLJSON-PATH
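
A few of those operators in action against the `docs` table above, as a rough sketch. The key names (`some_key`, `reviewed`) are made up for illustration.

\small
```sql
-- Containment (@>): rows whose document includes this key/value pair.
-- This is the query shape the GIN index on docs.data can serve.
SELECT id, data
FROM docs
WHERE data @> '{"some_key": "some_value"}';

-- Field extraction (->>): pull one field out as text
SELECT data ->> 'some_key' AS some_key
FROM docs;

-- Concatenation (||): merge new keys into documents that have some_key (?)
UPDATE docs
SET data = data || '{"reviewed": true}'::jsonb,
    updated_at = NOW()
WHERE data ? 'some_key';
```
\normalsize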
# Geospatial Data #
Geographic Information System (GIS) data is the bread and butter of PostGIS[^8]:
 
```sql
SELECT superhero.name
FROM city, superhero
WHERE ST_Contains(city.geom, superhero.geom)
AND city.name = 'Gotham';
```
 
The feature set and documentation for PostGIS are *extensive*.
[^8]: https://postgis.net
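
The query above needs tables with geometry columns behind it. A minimal sketch of what they might look like, assuming cities are stored as polygons and superheroes as points in WGS 84 (SRID 4326); the schemas are hypothetical.

\small
```sql
CREATE EXTENSION IF NOT EXISTS postgis;

-- Hypothetical schemas: cities as polygons, superheroes as points
CREATE TABLE city (
  name text PRIMARY KEY,
  geom geometry(Polygon, 4326) NOT NULL
);
CREATE TABLE superhero (
  name text PRIMARY KEY,
  geom geometry(Point, 4326) NOT NULL
);

-- A GiST index lets spatial predicates like ST_Contains
-- narrow candidates by bounding box before the exact check
CREATE INDEX city_geom_idx ON city USING GIST (geom);
```
\normalsize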
# Message Queues #
If all your application instances are connected to the database, why not have them communicate?
\small
```sql
-- Listen on a channel named "virtual" (channels need no setup)
LISTEN virtual;
-- Notify with no payload
NOTIFY virtual;
-- Notify with payload
NOTIFY virtual, 'This is the payload';
```
 
\normalsize
Maybe you don't need a NATS/RabbitMQ/NSQ/Kafka cluster *just* yet.
Want to go deeper? Try combining this feature with some `UNLOGGED` and `TEMPORARY` tables and build some data pipelines.
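
One way that combination might look, as a sketch rather than a production design: a hypothetical `jobs` table, a trigger that announces new rows with `pg_notify`, and workers that claim rows with `FOR UPDATE SKIP LOCKED`. Notifications are not stored for sessions that are not listening, so the table is what actually holds the work.

\small
```sql
-- Hypothetical job table; UNLOGGED trades crash safety for write speed
CREATE UNLOGGED TABLE IF NOT EXISTS jobs (
  id bigint GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
  payload jsonb NOT NULL,
  created_at timestamptz NOT NULL DEFAULT NOW()
);

-- Announce each new job on the "jobs" channel, job id as the payload
CREATE OR REPLACE FUNCTION notify_new_job() RETURNS trigger AS $$
BEGIN
  PERFORM pg_notify('jobs', NEW.id::text);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER jobs_notify
AFTER INSERT ON jobs
FOR EACH ROW EXECUTE PROCEDURE notify_new_job();

-- A worker runs "LISTEN jobs;" and, when woken, claims one job.
-- SKIP LOCKED lets concurrent workers pass over rows another worker holds.
DELETE FROM jobs
WHERE id = (
  SELECT id FROM jobs
  ORDER BY id
  FOR UPDATE SKIP LOCKED
  LIMIT 1
)
RETURNING id, payload;
```
\normalsize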
# Time Series Data #
You could build your own solution by using `PARTITION`s, `UNLOGGED` tables, some `TRIGGER`s, but don't bother. Just use TimescaleDB[^9].
![TimescaleDB insert performance on 1B inserts](timescale-vs-postgres-insert-1B.jpg){ height=50% }
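
Getting started is mostly plain SQL. A minimal sketch, assuming the TimescaleDB extension is installed and using a hypothetical `conditions` table:

\small
```sql
CREATE EXTENSION IF NOT EXISTS timescaledb;

-- Hypothetical metrics table
CREATE TABLE conditions (
  time timestamptz NOT NULL,
  device_id text NOT NULL,
  temperature double precision
);

-- Turn the plain table into a hypertable partitioned on the time column
SELECT create_hypertable('conditions', 'time');

-- Query with ordinary SQL; time_bucket groups rows into fixed intervals
SELECT time_bucket('1 hour', time) AS bucket,
       device_id,
       avg(temperature) AS avg_temp
FROM conditions
WHERE time > NOW() - INTERVAL '1 day'
GROUP BY bucket, device_id
ORDER BY bucket;
```
\normalsize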
# Time Series Data (continued) #
TimescaleDB compares favorably to MongoDB[^10] and InfluxDB[^11].
![](timescale-vs-influx.png){ height=60% }
[^9]: https://docs.timescale.com/v1.3/introduction
[^10]: https://blog.timescale.com/how-to-store-time-series-data-mongodb-vs-timescaledb-postgresql-a73939734016
[^11]: https://blog.timescale.com/timescaledb-vs-influxdb-for-time-series-data-timescale-influx-sql-nosql-36489299877
# So What? #
Postgres may not be the best solution to your problem, but it's very often **good enough**.
 
Before introducing a new piece to your infrastructure, consider using your Postgres database to solve the problem.
# The End #
Thanks for listening
# whoami
If you've got any corrections, complaints, or comments, feel free to reach me using the information below:
Victor Adossi ([email protected], [email protected])
GPG: ED874DE957CFB552
I run a couple very small consultancies to support businesses in Japan and the USA:
- GAISMA G.K. (https://gaisma.co.jp)
- VADOSWARE LLC (https://vadosware.io)
Need help figuring out *how* you're going to use Postgres in your infrastructure? I can help with that.
# Bloopers: Hot takes and tips #
A bunch of things I think are probably right:
- Use GitLab
- Don't write ECMAScript (AKA JavaScript) without TypeScript
- Try Lisp & Haskell (separately?) at least once
- Try Rust more than once
- Never price by project*
- Don't build & deploy VMs on a greenfield project in 2019**
 
\scriptsize
\* Unless you've built the thing already and you are literally going to reskin it and the client has absolutely *no* new feature requests.
\** Unless your VM in production is basically Container Linux
......@@ -10,11 +10,9 @@ date: June, 2019
- What is Postgres?
- Why/Why not Postgres?
- Relational Data
- Key/Value Data
- Document Storage
- Geospatial Data
- Log Storage
- Message Queue
- Time Series Data
......@@ -65,23 +63,6 @@ Why *not* Postgres?
[^3]: PostgresXL does exist
# Relational Data #
99% of the data your organization needs to deal with is going to be relational -- most data isn't very useful without context.
 
```sql
SELECT company_name,amount,payment_status
FROM customers
JOIN invoices ON customers.id = invoices.customer_id
WHERE payment_status = 'not-paid';
```
 
Postgres has `ENUM`s, custom `TYPE`s, and advanced `CONSTRAINT`s and `TRIGGER`s: tools you can use to make sure your data is *valid* and *correct*.
# Key/Value Data #
Postgres makes a surprisingly good simple key value store. You're not going to beat Redis, but it's *probably* going to be fast enough!
......@@ -145,30 +126,6 @@ Feature set and documentation for PostGIS is *extensive*.
[^8]: https://postgis.net
# Log Storage #
Declarative partitioning means Postgres can take your gobs of structured logs (they *are* structured, right?).
\small
```sql
-- The partitioned table
CREATE TABLE logs (
data jsonb NOT NULL,
logged_at timestamptz NOT NULL DEFAULT NOW()
) PARTITION BY RANGE (logged_at);
-- A partition for the month of June
SET TIME ZONE 'Asia/Tokyo';
CREATE TABLE logs_2019_06
PARTITION OF logs
FOR VALUES FROM ('2019-06-01') TO ('2019-07-01');
```
\normalsize
Some assembly/maintenance *is* required, but faster queries on smaller data sets (constraint exclusion) have never been cheaper.
# Message Queues #
If all your application instances are connected to the database, why not have them communicate?
......
.PHONY: all \
2019-04-mercari-dev-meetup 2019-04-mercari-dev-meetup-pdf 2019-04-mercari-dev-meetup-condensed-pdf \
2019-06-tokyo-tech-meetup 2019-06-tokyo-tech-meetup-pdf 2019-06-tokyo-tech-meetup-condensed-pdf 2019-06-tokyo-tech-meetup-watch
2019-06-tokyo-tech-meetup 2019-06-tokyo-tech-meetup-pdf 2019-06-tokyo-tech-meetup-condensed-pdf 2019-06-tokyo-tech-meetup-condensed-more-pdf 2019-06-tokyo-tech-meetup-watch
all: 2019-04-mercari-dev-meetup 2019-06-tokyo-tech-meetup
......@@ -33,7 +33,7 @@ ENTR ?= entr
-s 2019/04/mercari-backend-meetup-condensed.md \
-o dist/2019/04/mercari-backend-meetup-condensed.pdf
2019-06-tokyo-tech-meetup: 2019-06-tokyo-tech-meetup-pdf 2019-06-tokyo-tech-meetup-condensed-pdf
2019-06-tokyo-tech-meetup: 2019-06-tokyo-tech-meetup-pdf 2019-06-tokyo-tech-meetup-condensed-pdf 2019-06-tokyo-tech-meetup-condensed-more-pdf
2019-06-tokyo-tech-meetup-watch:
find 2019/06/* | $(ENTR) -rc make 2019-06-tokyo-tech-meetup
......@@ -59,3 +59,14 @@ ENTR ?= entr
--self-contained \
-s 2019/06/tokyo-tech-meetup/just-use-postgres.condensed.md \
-o dist/2019/06/tokyo-tech-meetup/just-use-postgres.condensed.pdf
2019-06-tokyo-tech-meetup-condensed-more-pdf:
@echo "DATA_DIR = $(DATA_DIR)"
@mkdir -p dist/2019/06/tokyo-tech-meetup
pandoc \
-t $(PDF_FORMAT)+footnotes \
--resource-path $(RESOURCE_PATH) \
--data-dir $(DATA_DIR) \
--self-contained \
-s 2019/06/tokyo-tech-meetup/just-use-postgres.condensed-more.md \
-o dist/2019/06/tokyo-tech-meetup/just-use-postgres.condensed-more.pdf