Verified commit 8ddfe84f authored by MrMan

Add condensed version

parent f50622ff
---
title: Just use Postgres
author: Victor Adossi
theme: Madrid
fonttheme: professionalfonts
date: June, 2019
---
# Roadmap #
- What is Postgres?
- Why/Why not Postgres?
- Relational Data
- Key/Value Data
- Document Storage
- Geospatial Data
- Log Storage
- Message Queue
- Time Series Data
# What is Postgres? #
**Postgres is the most advanced open source database that's ever existed**. It's developed in the open, driven and maintained by the community.
There are a few large contributors in the space, like **2nd Quadrant** and **Citus Data** (acquired by Microsoft in January 2019).
A few features set postgres apart:
- Multi Version Concurrency Control (MVCC)
- Plugin system (indices, functionality)
- Process-per-connection model
- Elephant mascot
Reddit got to 1 billion users on a master-slave Postgres (scaling up rather than out, mostly)[^1]
# Why/Why not Postgres? #
**Reliability** - Postgres is rock solid
**Performance** - Postgres is generally "fast enough", and can even be *really fast*
**Cloud vendor support** - AWS RDS, Azure Database, GCP Cloud SQL
**Open Source** - You can see how it works
Why *not* Postgres?
**No Vendor** - No vendor to call[^2]
**Scaling Out** - No official scale out story[^3]
**Learning Curve** - Structured Query Language (SQL) can be difficult
**Rigor** - Transactional guarantees can eat into performance
[^2]: Lots of consultancies though (like 2nd Quadrant) which can help out
[^3]: PostgresXL does exist
# NOSQL vs SQL #
What people *normally* mean when they say "NOSQL" is a rejection of the structure and transactional guarantees normally included with a Relational Database Management System (RDBMS).
**Relational Structure** as in Relational Algebra (sets, projection, unions, intersections, joins)
**Transactional Guarantees** as in ACID (Atomicity, Consistency, Isolation, Durability)
Examples of NOSQL databases:
- RethinkDB / MongoDB (documents, usually JSON)
- Redis (key/value)
- Neo4J (graphs)
- Postgres (more on this later)
# Relational Data #
99% of the data your organization needs to deal with is going to be relational -- most data isn't very useful without context.
SELECT company_name, amount, payment_status
FROM customers
JOIN invoices ON customers.id = invoices.customer_id -- join key customers.id is assumed
WHERE payment_status = 'not-paid';
Postgres has `ENUM`s, custom `TYPE`s, advanced `CONSTRAINT`s, and `TRIGGER`s: tools you can use to make sure your data is *valid* and *correct*.
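As a minimal sketch of these tools working together, an `ENUM` can restrict a status column to known values while a `CHECK` constraint rejects nonsensical amounts. The table, column, and type names below are hypothetical and not taken from the talk:

```sql
-- Hypothetical schema, for illustration only
CREATE TYPE payment_status AS ENUM ('not-paid', 'paid', 'refunded');

CREATE TABLE customers (
  id bigserial PRIMARY KEY,
  company_name text NOT NULL
);

CREATE TABLE invoices (
  id bigserial PRIMARY KEY,
  -- Foreign key: an invoice cannot point at a customer that does not exist
  customer_id bigint NOT NULL REFERENCES customers (id),
  -- CHECK constraint: negative amounts are rejected at write time
  amount numeric(12, 2) NOT NULL CHECK (amount >= 0),
  payment_status payment_status NOT NULL DEFAULT 'not-paid'
);

-- This insert would fail, because 'maybe-paid' is not a member of the enum:
-- INSERT INTO invoices (customer_id, amount, payment_status)
-- VALUES (1, 10.00, 'maybe-paid');
```

The point of pushing these rules into the schema is that every application writing to the database gets the same validation for free.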
# Key/Value Data #
Postgres makes a surprisingly good simple key value store. You're not going to beat Redis, but it's *probably* going to be fast enough!
key text NOT NULL,
value jsonb,
created_at timestamptz NOT NULL DEFAULT NOW()
Pluggable storage engines (the table access method interface)[^4] have landed, so you could *actually* put Redis in your Postgres.
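A sketch of how the columns above could be rounded out into a working store. The table name `kv` and the primary key on `key` are assumptions (the slide only shows the column list), and the example keys are made up:

```sql
-- Hypothetical table name; the PRIMARY KEY is assumed and is what makes ON CONFLICT work
CREATE TABLE kv (
  key text PRIMARY KEY,
  value jsonb,
  created_at timestamptz NOT NULL DEFAULT NOW()
);

-- SET: insert the key, or overwrite its value if it already exists
INSERT INTO kv (key, value)
VALUES ('session:42', '{"user_id": 7}')
ON CONFLICT (key) DO UPDATE SET value = EXCLUDED.value;

-- GET
SELECT value FROM kv WHERE key = 'session:42';

-- DEL
DELETE FROM kv WHERE key = 'session:42';
```

If durability matters less than speed, the same table could be declared `UNLOGGED`, at the cost of losing its contents after a crash.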
# Document Storage #
-- uuid_generate_v4() comes from the uuid-ossp extension
CREATE TABLE docs (
  id uuid PRIMARY KEY DEFAULT uuid_generate_v4(),
  data jsonb,
  updated_at timestamptz NOT NULL DEFAULT NOW(),
  created_at timestamptz NOT NULL DEFAULT NOW()
);
-- GIN indexes massively speed up searches like:
-- SELECT * FROM docs WHERE data @> '{"some_key": "some_value"}'
CREATE INDEX docs_data_idx ON docs USING GIN (data);
Look into Postgres's full range of JSON operators[^5]. SQL/JSON (JSONPath for SQL) is coming in Postgres 12[^6].
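A few of those operators, sketched against the `docs` table and its `jsonb` column `data`. The JSON keys used here (`name`, `address`, `city`) are hypothetical:

```sql
-- -> returns jsonb, ->> returns text
SELECT data -> 'address' AS address_json,
       data ->> 'name'   AS name_text
FROM docs;

-- #>> follows a path into nested objects and returns text
SELECT data #>> '{address,city}' AS city
FROM docs;

-- @> (containment) and ? (top-level key exists) are both supported
-- by a default GIN index on the jsonb column
SELECT id FROM docs WHERE data @> '{"some_key": "some_value"}';
SELECT id FROM docs WHERE data ? 'some_key';
```

Note that the right-hand side of `@>` is a quoted JSON literal; Postgres casts it to `jsonb` for the comparison.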
# Geospatial Data #
Geographic Information System (GIS) data is the bread and butter of PostGIS[^8]:
SELECT superhero.name
FROM city, superhero
WHERE ST_Contains(city.geom, superhero.geom)
AND city.name = 'Gotham';
The feature set and documentation for PostGIS are *extensive*.
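For a query like the one above to work, the tables need geometry columns. A minimal sketch of what they might look like follows; the column definitions, the SRID (4326, i.e. WGS 84 longitude/latitude), and the sample coordinates are all assumptions:

```sql
-- Requires the PostGIS extension to be installed on the server
CREATE EXTENSION IF NOT EXISTS postgis;

CREATE TABLE city (
  name text PRIMARY KEY,
  geom geometry(Polygon, 4326)   -- city boundary
);

CREATE TABLE superhero (
  name text PRIMARY KEY,
  geom geometry(Point, 4326)     -- current location
);

-- GiST indexes let spatial predicates such as ST_Contains
-- skip rows whose bounding boxes cannot possibly match
CREATE INDEX city_geom_idx ON city USING GIST (geom);
CREATE INDEX superhero_geom_idx ON superhero USING GIST (geom);

-- ST_MakePoint takes (longitude, latitude); ST_SetSRID tags the point with its SRID
INSERT INTO superhero (name, geom)
VALUES ('Batman', ST_SetSRID(ST_MakePoint(139.76, 35.68), 4326));
```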
# Log Storage #
Declarative partitioning means Postgres can take your gobs of structured logs (they *are* structured, right?):
-- The partitioned table (parent name "logs" is assumed from the partition name)
CREATE TABLE logs (
  data jsonb NOT NULL,
  logged_at timestamptz NOT NULL DEFAULT NOW()
) PARTITION BY RANGE (logged_at);
-- A partition for the month of June
SET TIME ZONE 'Asia/Tokyo';
CREATE TABLE logs_2019_06 PARTITION OF logs
  FOR VALUES FROM ('2019-06-01') TO ('2019-07-01');
Some assembly/maintenance *is* required, but faster queries on smaller data sets (constraint exclusion) have never been cheaper.
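To make the "assembly/maintenance" part concrete, here is a sketch of the monthly routine. The table name `app_logs` is hypothetical, chosen so the example stands on its own:

```sql
-- Hypothetical partitioned log table
CREATE TABLE app_logs (
  data jsonb NOT NULL,
  logged_at timestamptz NOT NULL DEFAULT NOW()
) PARTITION BY RANGE (logged_at);

-- One partition per month; the upper bound is exclusive
CREATE TABLE app_logs_2019_06 PARTITION OF app_logs
  FOR VALUES FROM ('2019-06-01') TO ('2019-07-01');
CREATE TABLE app_logs_2019_07 PARTITION OF app_logs
  FOR VALUES FROM ('2019-07-01') TO ('2019-08-01');

-- With a filter on the partition key, the plan should only
-- touch app_logs_2019_06 and skip the other partitions
EXPLAIN SELECT * FROM app_logs
WHERE logged_at >= '2019-06-10' AND logged_at < '2019-06-11';

-- Retiring old data is a cheap metadata operation, not a giant DELETE
ALTER TABLE app_logs DETACH PARTITION app_logs_2019_06;
DROP TABLE app_logs_2019_06;
```

Rows that match no partition are rejected, so new partitions need to exist before their month starts (or a default partition can catch the stragglers on Postgres 11 and later).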
# Message Queues #
If all your application instances are connected to the database, why not have them communicate?
-- Listen on a channel named "virtual"
LISTEN virtual;
-- Notify with no payload
NOTIFY virtual;
-- Notify with a payload
NOTIFY virtual, 'This is the payload';
Maybe you don't need a NATS/RabbitMQ/NSQ/Kafka cluster *just* yet.
Want to go deeper? Try combining this feature with some `UNLOGGED` and `TEMPORARY` tables and build some data pipelines.
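One possible shape for such a pipeline, as a sketch: an `UNLOGGED` job table, a trigger that notifies listeners on insert, and a worker query that claims jobs without blocking other workers. The table, function, trigger, and channel names are all hypothetical:

```sql
-- UNLOGGED skips the write-ahead log: faster writes, but contents are lost after a crash
CREATE UNLOGGED TABLE jobs (
  id bigserial PRIMARY KEY,
  payload jsonb NOT NULL,
  created_at timestamptz NOT NULL DEFAULT NOW()
);

-- Tell listeners on the "jobs" channel whenever a job is inserted
CREATE FUNCTION notify_new_job() RETURNS trigger AS $$
BEGIN
  PERFORM pg_notify('jobs', NEW.id::text);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER jobs_notify
AFTER INSERT ON jobs
FOR EACH ROW EXECUTE PROCEDURE notify_new_job();

-- Worker: claim and remove one job inside a transaction.
-- SKIP LOCKED makes concurrent workers pass over rows another worker has already locked.
BEGIN;
DELETE FROM jobs
WHERE id = (
  SELECT id FROM jobs
  ORDER BY id
  LIMIT 1
  FOR UPDATE SKIP LOCKED
)
RETURNING id, payload;
COMMIT;
```

Notifications are only delivered to sessions that are connected and listening at the time, so the table, not the notification, should be treated as the source of truth for pending work.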
# Time Series Data #
You could build your own solution using `PARTITION`s, `UNLOGGED` tables, and some `TRIGGER`s, but don't bother. Just use TimescaleDB[^9].
![TimescaleDB insert performance on 1B inserts](timescale-vs-postgres-insert-1B.jpg){ height=50% }
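Getting started looks roughly like the sketch below. The `conditions` table and its columns are hypothetical; `create_hypertable` and `time_bucket` are functions provided by the TimescaleDB extension:

```sql
-- Requires the TimescaleDB extension to be installed on the server
CREATE EXTENSION IF NOT EXISTS timescaledb;

-- Hypothetical metrics table
CREATE TABLE conditions (
  time timestamptz NOT NULL,
  device_id text NOT NULL,
  temperature double precision
);

-- Turn the plain table into a hypertable, partitioned on the time column
SELECT create_hypertable('conditions', 'time');

-- Average temperature per device in 15-minute buckets over the last day
SELECT time_bucket('15 minutes', time) AS bucket,
       device_id,
       avg(temperature) AS avg_temp
FROM conditions
WHERE time > NOW() - INTERVAL '1 day'
GROUP BY bucket, device_id
ORDER BY bucket;
```

After the `create_hypertable` call the table is still queried and written to with ordinary SQL; the partitioning happens behind the scenes.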
# Time Series Data (continued) #
TimescaleDB also has excellent, well-reasoned technical deep dives on where and why it can beat databases like MongoDB[^10] and even purpose-built DBs like InfluxDB[^11].
![TimescaleDB vs influx](timescale-vs-influx.png){ height=60% }
# The End #
Thanks for listening
# whoami
If you've got any corrections, complaints, or comments, feel free to reach me using the information below:
Victor Adossi ([email protected], [email protected])
GPG: ED874DE957CFB552
I run a couple very small consultancies to support businesses in Japan and the USA:
Need help figuring out *how* you're going to use Postgres in your infrastructure? I can help with that.
# Bloopers: Hot takes and tips #
A bunch of things I think are probably right:
- Use GitLab
- Don't write ECMAScript (AKA JavaScript) without TypeScript
- Try Lisp & Haskell (separately?) at least once
- Try Rust more than once
- Never price by project*
- Don't build & deploy VMs on a greenfield project in 2019**
\* Unless you've built the thing already and you are literally going to reskin it and the client has absolutely *no* new feature requests.
\** Unless your VM in production is basically Container Linux
......@@ -148,7 +148,7 @@ CREATE TABLE docs (
data jsonb,
updated_at timestamptz NOT NULL DEFAULT NOW(),
created_at timestamptz NOT NULL DEFAULT NOW()
-- GIN indexes massively speed up searches like:
-- SELECT * FROM docs WHERE data @> {"some_key": "some_value"}
.PHONY: all \
2019-04-mercari-dev-meetup 2019-04-mercari-dev-meetup-pdf 2019-04-mercari-dev-meetup-condensed-pdf
2019-04-mercari-dev-meetup 2019-04-mercari-dev-meetup-pdf 2019-04-mercari-dev-meetup-condensed-pdf \
2019-06-tokyo-tech-meetup 2019-06-tokyo-tech-meetup-pdf 2019-06-tokyo-tech-meetup-condensed-pdf 2019-06-tokyo-tech-meetup-watch
all: 2019-04-mercari-dev-meetup 2019-06-tokyo-tech-meetup
......@@ -32,7 +33,7 @@ ENTR ?= entr
-s 2019/04/ \
-o dist/2019/04/mercari-backend-meetup-condensed.pdf
2019-06-tokyo-tech-meetup: 2019-06-tokyo-tech-meetup-pdf
2019-06-tokyo-tech-meetup: 2019-06-tokyo-tech-meetup-pdf 2019-06-tokyo-tech-meetup-condensed-pdf
find 2019/06/* | $(ENTR) -rc make 2019-06-tokyo-tech-meetup
......@@ -47,3 +48,14 @@ ENTR ?= entr
--self-contained \
-s 2019/06/tokyo-tech-meetup/ \
-o dist/2019/06/tokyo-tech-meetup/just-use-postgres.pdf
@echo "DATA_DIR = $(DATA_DIR)"
@mkdir -p dist/2019/06/tokyo-tech-meetup
pandoc \
-t $(PDF_FORMAT)+footnotes \
--resource-path $(RESOURCE_PATH) \
--data-dir $(DATA_DIR) \
--self-contained \
-s 2019/06/tokyo-tech-meetup/ \
-o dist/2019/06/tokyo-tech-meetup/just-use-postgres.condensed.pdf