Verified commit 6438195c authored by MrMan

Condense the talk even more

parent 31f5b6e3
title: Just use Postgres
author: Victor Adossi
theme: Madrid
fonttheme: professionalfonts
date: June, 2019
# Roadmap #
- What is Postgres?
- Why/Why not Postgres?
- Key/Value Data
- Document Storage
- Geospatial Data
- Message Queue
- Time Series Data
# What is Postgres? #
**Postgres is the most advanced open source database that's ever existed**. It's developed in the open, driven and maintained by the community.
There are a few large contributors in the space, like **2ndQuadrant** and **Citus Data** (acquired by Microsoft in January 2019).
Some of the features that set postgres apart:
- Multi Version Concurrency Control (MVCC)
- Plugin system (indices, functionality)
- Process-per-connection model
- Elephant mascot
Reddit got to 1 billion users on a master-slave Postgres (scaling up rather than out, mostly)[^1]
# Why/Why not Postgres? #
**Reliability** - Postgres is rock solid
**Performance** - "fast enough" to *pretty darn fast*
**Cloud vendor support** - AWS RDS, Azure Database, GCP Cloud SQL
**Open Source** - You can see how it works
Why *not* Postgres?
**No Vendor** - No vendor to call[^2]
**Scaling Out** - No official scale out story[^3]
**Learning Curve** - Structured Query Language (SQL) can be difficult
**Rigor** - Transactional guarantees can eat into performance
[^2]: There are lots of consultancies (like 2ndQuadrant) that can help out, though
[^3]: Postgres-XL does exist
# Key/Value Data #
Postgres makes a surprisingly good simple key value store. You're not going to beat Redis, but it's *probably* going to be fast enough!
-- "kv" is a hypothetical table name
CREATE TABLE kv (
key text NOT NULL,
value jsonb,
created_at timestamptz NOT NULL DEFAULT NOW()
);
Pluggable storage engines (the table access method interface)[^4] have landed, so you could *actually* put Redis in your Postgres.
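As a rough sketch of how the usual SET/GET/DEL operations could map onto such a table: the table name `kv_store`, the example key, and the `PRIMARY KEY` on `key` (needed for `ON CONFLICT`) are assumptions for illustration, not part of the original slide.

```sql
-- Hypothetical table: same columns as the slide, plus a PRIMARY KEY
-- on "key" so that ON CONFLICT has a unique constraint to target.
CREATE TABLE kv_store (
    key text PRIMARY KEY,
    value jsonb,
    created_at timestamptz NOT NULL DEFAULT NOW()
);

-- SET: insert the key, or overwrite the value if the key already exists
INSERT INTO kv_store (key, value)
VALUES ('session:42', '{"user": "alice"}')
ON CONFLICT (key) DO UPDATE SET value = EXCLUDED.value;

-- GET: look up a single key (served by the primary key index)
SELECT value FROM kv_store WHERE key = 'session:42';

-- DEL: remove the key
DELETE FROM kv_store WHERE key = 'session:42';
```

If durability matters less than write speed, declaring the table `UNLOGGED` is one knob worth trying.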
# Document Storage #
-- uuid_generate_v4() comes from the uuid-ossp extension
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
CREATE TABLE docs (
id uuid PRIMARY KEY DEFAULT uuid_generate_v4(),
data jsonb,
updated_at timestamptz NOT NULL DEFAULT NOW(),
created_at timestamptz NOT NULL DEFAULT NOW()
);
-- GIN indexes massively speed up searches like:
-- SELECT * FROM docs WHERE data @> '{"some_key": "some_value"}';
CREATE INDEX docs_data_idx ON docs USING GIN (data);
Look into Postgres's full range of JSON operators[^5]. SQL/JSON (JSONPath for SQL) is coming in Postgres 12[^6].
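A few of those operators in action, as a minimal sketch. The temporary table `docs_demo` and its sample documents are made up for illustration.

```sql
-- Hypothetical scratch table with two sample documents
CREATE TEMPORARY TABLE docs_demo (
    id serial PRIMARY KEY,
    data jsonb
);

INSERT INTO docs_demo (data) VALUES
    ('{"title": "Intro", "tags": ["postgres", "json"], "author": {"name": "Ann"}}'),
    ('{"title": "Other", "tags": ["redis"], "author": {"name": "Bob"}}');

-- -> returns jsonb, ->> returns text
SELECT data->'author'->>'name' AS author FROM docs_demo;

-- @> tests containment (this is the operator a GIN index can serve)
SELECT data->>'title' AS title
FROM docs_demo
WHERE data @> '{"tags": ["postgres"]}';

-- ? tests whether a top-level key exists
SELECT count(*) FROM docs_demo WHERE data ? 'title';

-- #>> extracts the value at a path as text
SELECT data #>> '{tags,0}' AS first_tag FROM docs_demo;
```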
# Geospatial Data #
Geographic Information System (GIS) data is the bread and butter of PostGIS[^8]:
SELECT superhero.name
FROM city, superhero
WHERE ST_Contains(city.geom, superhero.geom)
AND city.name = 'Gotham';
The feature set and documentation for PostGIS are *extensive*.
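To try a query like the one above, something along these lines should be enough. The column layout, SRID 4326, and the sample geometries are assumptions for illustration; only the table and column names echo the slide.

```sql
CREATE EXTENSION IF NOT EXISTS postgis;

-- Hypothetical schema: cities are polygons, superheroes are points
CREATE TABLE city (
    name text PRIMARY KEY,
    geom geometry(Polygon, 4326)
);
CREATE TABLE superhero (
    name text PRIMARY KEY,
    geom geometry(Point, 4326)
);

-- A spatial (GiST) index lets ST_Contains skip most candidates
CREATE INDEX city_geom_idx ON city USING GIST (geom);

-- Made-up sample data: a square "Gotham" with one hero inside it
INSERT INTO city (name, geom) VALUES
    ('Gotham', ST_GeomFromText('POLYGON((0 0, 0 10, 10 10, 10 0, 0 0))', 4326));
INSERT INTO superhero (name, geom) VALUES
    ('Batman', ST_SetSRID(ST_MakePoint(5, 5), 4326));

-- Who is currently inside Gotham?
SELECT superhero.name
FROM city, superhero
WHERE ST_Contains(city.geom, superhero.geom)
AND city.name = 'Gotham';
```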
# Message Queues #
If all your application instances are connected to the database, why not have them communicate?
-- Listen on a channel named "virtual" (channels are created implicitly)
LISTEN virtual;
-- Notify with no payload
NOTIFY virtual;
-- Notify with a payload
NOTIFY virtual, 'This is the payload';
Maybe you don't need a NATS/RabbitMQ/NSQ/Kafka cluster *just* yet.
Want to go deeper? Try combining this feature with some `UNLOGGED` and `TEMPORARY` tables and build some data pipelines.
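One possible shape for such a pipeline, sketched under assumptions: the `jobs` table, the `notify_new_job` function, and the channel name are all hypothetical. An `UNLOGGED` table trades crash safety for faster writes, so this only suits work you can afford to lose.

```sql
-- Hypothetical work queue; UNLOGGED skips the write-ahead log
CREATE UNLOGGED TABLE jobs (
    id bigserial PRIMARY KEY,
    payload jsonb NOT NULL,
    created_at timestamptz NOT NULL DEFAULT NOW()
);

-- Announce every new job on the "jobs" channel
CREATE FUNCTION notify_new_job() RETURNS trigger AS $$
BEGIN
    PERFORM pg_notify('jobs', NEW.id::text);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER jobs_notify
AFTER INSERT ON jobs
FOR EACH ROW EXECUTE PROCEDURE notify_new_job();

-- Consumer session: subscribe, then wait for notifications
LISTEN jobs;

-- Producer session: the notification is delivered on commit
INSERT INTO jobs (payload) VALUES ('{"task": "send_email"}');

-- Consumer: claim one job; SKIP LOCKED keeps workers from blocking each other
DELETE FROM jobs
WHERE id = (
    SELECT id FROM jobs
    ORDER BY id
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id, payload;
```

The notification only acts as a wake-up call here. The table remains the source of truth, so a consumer that was offline can still drain the queue when it reconnects.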
# Time Series Data #
You could build your own solution by using `PARTITION`s, `UNLOGGED` tables, some `TRIGGER`s, but don't bother. Just use TimescaleDB[^9].
![TimescaleDB insert performance on 1B inserts](timescale-vs-postgres-insert-1B.jpg){ height=50% }
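Getting started is mostly a matter of turning a regular table into a hypertable. A minimal sketch, assuming the TimescaleDB extension is installed; the `conditions` table and its columns are made up for illustration.

```sql
CREATE EXTENSION IF NOT EXISTS timescaledb;

-- Hypothetical sensor readings table
CREATE TABLE conditions (
    time timestamptz NOT NULL,
    device_id text NOT NULL,
    temperature double precision
);

-- Convert it into a hypertable, partitioned on the "time" column
SELECT create_hypertable('conditions', 'time');

INSERT INTO conditions (time, device_id, temperature)
VALUES (NOW(), 'sensor-1', 21.5);

-- Average temperature per device in 15-minute buckets
SELECT time_bucket('15 minutes', time) AS bucket,
       device_id,
       avg(temperature) AS avg_temp
FROM conditions
GROUP BY bucket, device_id
ORDER BY bucket;
```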
# Time Series Data (continued) #
TimescaleDB compares favorably to MongoDB[^10] and InfluxDB[^11].
![](timescale-vs-influx.png){ height=60% }
# So What? #
Postgres may not be the best solution to your problem, but it's very often **good enough**.
Before introducing a new piece to your infrastructure, consider using your Postgres database to solve the problem.
# The End #
Thanks for listening
# whoami
If you've got any corrections, complaints, or comments, feel free to reach me using the information below:
Victor Adossi ([email protected], [email protected])
GPG: ED874DE957CFB552
I run a couple of very small consultancies to support businesses in Japan and the USA:
Need help figuring out *how* you're going to use Postgres in your infrastructure? I can help with that.
# Bloopers: Hot takes and tips #
A bunch of things I think are probably right:
- Use GitLab
- Don't write ECMAScript (AKA JavaScript) without TypeScript
- Try Lisp & Haskell (separately?) at least once
- Try Rust more than once
- Never price by project*
- Don't build & deploy VMs on a greenfield project in 2019**
\* Unless you've built the thing already and you are literally going to reskin it and the client has absolutely *no* new feature requests.
\** Unless your VM in production is basically Container Linux
@@ -10,11 +10,9 @@ date: June, 2019
- What is Postgres?
- Why/Why not Postgres?
- Relational Data
- Key/Value Data
- Document Storage
- Geospatial Data
- Log Storage
- Message Queue
- Time Series Data
@@ -65,23 +63,6 @@ Why *not* Postgres?
[^3]: PostgresXL does exist
# Relational Data #
99% of the data your organization needs to deal with is going to be relational -- most data isn't very useful without context.
SELECT company_name, amount, payment_status
FROM customers
JOIN invoices ON customers.id = invoices.customer_id
WHERE payment_status = 'not-paid';
Postgres has `ENUM`s, custom `TYPE`s, and advanced `CONSTRAINT`s and `TRIGGER`s: tools you can use to make sure your data is *valid* and *correct*.
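A small sketch of what that can look like for tables like the ones queried above. The type name, the column types, and the list of statuses are assumptions for illustration.

```sql
-- Hypothetical enum: only these statuses can ever be stored
CREATE TYPE invoice_payment_status AS ENUM ('not-paid', 'paid', 'refunded');

CREATE TABLE customers (
    id bigserial PRIMARY KEY,
    company_name text NOT NULL
);

CREATE TABLE invoices (
    id bigserial PRIMARY KEY,
    -- Foreign key: an invoice cannot point at a missing customer
    customer_id bigint NOT NULL REFERENCES customers (id),
    -- CHECK constraint: negative amounts are rejected at write time
    amount numeric(12, 2) NOT NULL CHECK (amount >= 0),
    payment_status invoice_payment_status NOT NULL DEFAULT 'not-paid'
);
```

With this in place, an `INSERT` with a misspelled status or a negative amount fails in the database instead of quietly corrupting your data.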
# Key/Value Data #
Postgres makes a surprisingly good simple key value store. You're not going to beat Redis, but it's *probably* going to be fast enough!
@@ -145,30 +126,6 @@ Feature set and documentation for PostGIS is *extensive*.
# Log Storage #
Declarative partitioning means Postgres can take your gobs of structured logs (they *are* structured, right?).
-- The partitioned table (parent name "logs" assumed from the partition name)
CREATE TABLE logs (
data jsonb NOT NULL,
logged_at timestamptz NOT NULL DEFAULT NOW()
) PARTITION BY RANGE (logged_at);
-- A partition for the month of June
SET TIME ZONE 'Asia/Tokyo';
CREATE TABLE logs_2019_06
PARTITION OF logs
FOR VALUES FROM ('2019-06-01') TO ('2019-07-01');
Some assembly/maintenance *is* required, but faster queries on smaller data sets (constraint exclusion) have never been cheaper.
# Message Queues #
If all your application instances are connected to the database, why not have them communicate?
.PHONY: all \
2019-04-mercari-dev-meetup 2019-04-mercari-dev-meetup-pdf 2019-04-mercari-dev-meetup-condensed-pdf \
2019-06-tokyo-tech-meetup 2019-06-tokyo-tech-meetup-pdf 2019-06-tokyo-tech-meetup-condensed-pdf 2019-06-tokyo-tech-meetup-watch
2019-06-tokyo-tech-meetup 2019-06-tokyo-tech-meetup-pdf 2019-06-tokyo-tech-meetup-condensed-pdf 2019-06-tokyo-tech-meetup-condensed-more-pdf 2019-06-tokyo-tech-meetup-watch
all: 2019-04-mercari-dev-meetup 2019-06-tokyo-tech-meetup
@@ -33,7 +33,7 @@ ENTR ?= entr
-s 2019/04/ \
-o dist/2019/04/mercari-backend-meetup-condensed.pdf
2019-06-tokyo-tech-meetup: 2019-06-tokyo-tech-meetup-pdf 2019-06-tokyo-tech-meetup-condensed-pdf
2019-06-tokyo-tech-meetup: 2019-06-tokyo-tech-meetup-pdf 2019-06-tokyo-tech-meetup-condensed-pdf 2019-06-tokyo-tech-meetup-condensed-more-pdf
find 2019/06/* | $(ENTR) -rc make 2019-06-tokyo-tech-meetup
@@ -59,3 +59,14 @@ ENTR ?= entr
--self-contained \
-s 2019/06/tokyo-tech-meetup/ \
-o dist/2019/06/tokyo-tech-meetup/just-use-postgres.condensed.pdf
@echo "DATA_DIR = $(DATA_DIR)"
@mkdir -p dist/2019/06/tokyo-tech-meetup
pandoc \
-t $(PDF_FORMAT)+footnotes \
--resource-path $(RESOURCE_PATH) \
--data-dir $(DATA_DIR) \
--self-contained \
-s 2019/06/tokyo-tech-meetup/ \
-o dist/2019/06/tokyo-tech-meetup/just-use-postgres.condensed-more.pdf