Consider forbidding use of int4 both in PKs and in referencing columns

The int4 PK problem is not trivial to solve (see https://gitlab.com/gitlab-com/gl-infra/infrastructure/issues/5828).

Primary keys

Issues related to forbidding int4 in PKs:

Foreign keys / referencing columns

As for referencing tables, when we define a column with FK, it is now common to "inherit" the data type from the PK column. We can save time and efforts if we start using int8 (a.k.a. bigint) in such new columns right now, even if corresponding PKs are still int4, this will some time and efforts in future, when the conversion will be needed.

During operations, if PK is int4, and referencing column is int8, Postgres will make data type conversion implicitly. Performance difference will not be noticeable, let's see:

test=# create table t1 as select id::int4 from generate_series(0, 1000000) _(id);
SELECT 1000001
Time: 779.769 ms
test=# alter table t1 add primary key(id);
ALTER TABLE
Time: 449.830 ms

test=# create table t1 as select id::int4 from generate_series(0, 1000000) _(id);
SELECT 1000001
Time: 760.321 ms
test=# alter table t1 add primary key(id);
ALTER TABLE
Time: 504.420 ms
test=# create table t2_int4 as select (random() * 100000)::int4 as id from generate_series(1, 1000000) _;
SELECT 1000000
Time: 973.461 ms
test=# alter table t2_int4 add constraint fk_int4 foreign key(id) references t1(id);
ALTER TABLE
Time: 989.725 ms
test=# create index i_t2_int4 on t2_int4 (id);
CREATE INDEX
Time: 1376.435 ms

test=# create table t2_int8 as select (random() * 100000)::int8 as id from generate_series(1, 1000000) _;
SELECT 1000000
Time: 965.563 ms
test=# alter table t2_int8 add constraint fk_int8 foreign key(id) references t1(id);
ALTER TABLE
Time: 961.099 ms
test=# create index i_t2_int8 on t2_int8(id);
CREATE INDEX
Time: 1061.538 ms

Notice that timings of FKs creation in both cases are almost the same (of course it's worth to repeat the experiment multiple times to get statistically meaningful results, but I'm sure that results will remain).

Now let's compare the performance of simple JOIN using pgbench, joining the referenced table t1 (which has int4 PKs) with either t2_int4 or t2_int8:

$ echo '\set i random(1, 100000)' >> /tmp/select_int4.sql
$ echo 'select * from t1 join t2_int4 using (id) where t1.id = :i;' >> /tmp/select_int4.sql

$ echo '\set i random(1, 100000)' >> /tmp/select_int8.sql
$ echo 'select * from t1 join t2_int4 using (id) where t1.id = :i;' >> /tmp/select_int4.sql

$ /opt/gitlab/embedded/bin/pgbench -h localhost test -j4 -c4 -T120 -P30 -rn -f /tmp/select_int4.sql
progress: 30.0 s, 16680.7 tps, lat 0.239 ms stddev 0.093
progress: 60.0 s, 16986.6 tps, lat 0.235 ms stddev 0.193
progress: 90.0 s, 16514.8 tps, lat 0.241 ms stddev 0.082
progress: 120.0 s, 16014.4 tps, lat 0.249 ms stddev 0.119
transaction type: /tmp/select_int4.sql
scaling factor: 1
query mode: simple
number of clients: 4
number of threads: 4
duration: 120 s
number of transactions actually processed: 1985899
latency average = 0.241 ms
latency stddev = 0.130 ms
tps = 16549.053385 (including connections establishing)
tps = 16551.615972 (excluding connections establishing)
script statistics:
 - statement latencies in milliseconds:
         0.001  \set i random(1, 100000)
         0.240  select * from t1 join t2_int4 using (id) where t1.id = :i;


$ /opt/gitlab/embedded/bin/pgbench -h localhost test -j4 -c4 -T120 -P30 -rn -f /tmp/select_int8.sql
progress: 30.0 s, 13935.3 tps, lat 0.286 ms stddev 0.080
progress: 60.0 s, 14351.6 tps, lat 0.278 ms stddev 0.075
progress: 90.0 s, 14245.2 tps, lat 0.280 ms stddev 0.069
progress: 120.0 s, 14353.6 tps, lat 0.278 ms stddev 0.073
transaction type: /tmp/select_int8.sql
scaling factor: 1
query mode: simple
number of clients: 4
number of threads: 4
duration: 120 s
number of transactions actually processed: 1706577
latency average = 0.280 ms
latency stddev = 0.074 ms
tps = 14221.373791 (including connections establishing)
tps = 14223.417119 (excluding connections establishing)
script statistics:
 - statement latencies in milliseconds:
         0.001  \set i random(1, 100000)
         0.280  select * from t1 join t2_int8 using (id) where t1.id = :i;

Conclusions from the "Simple JOIN" experiment:

Postgres does do implicit conversion when it needs to compare int4 and int8, so we don't need to care about it;
there is a ~15% performance penalty for the "int8->int4 FK" case, due to implicit int4/int8 conversion. In absolute values, conversion takes ~40 μs (0.04 ms), it is 15% just because SELECT itself is very fast here, with sub-millisecond performance.

Finally, let's check INSERT performance, keeping in mind that FK is defined (so Postgres needs to check that t1 has the value we're using when inserting to t2_int4 / t2_int8):

$ echo '\set i random(1, 100000)' >> /tmp/insert_int4.sql
$ echo 'insert into t2_int4 select :i;' >> /tmp/insert_int4.sql
$ echo '\set i random(1, 100000)' >> /tmp/insert_int8.sql
$ echo 'insert into t2_int8 select :i;' >> /tmp/insert_int8.sql

$ /opt/gitlab/embedded/bin/pgbench -h localhost test -j4 -c4 -T120 -P30 -rn -f /tmp/insert_int4.sql
progress: 30.0 s, 3116.6 tps, lat 1.281 ms stddev 0.300
progress: 60.0 s, 3228.3 tps, lat 1.238 ms stddev 0.354
progress: 90.0 s, 3245.0 tps, lat 1.231 ms stddev 0.242
progress: 120.0 s, 3312.5 tps, lat 1.206 ms stddev 0.241
transaction type: /tmp/insert_int4.sql
scaling factor: 1
query mode: simple
number of clients: 4
number of threads: 4
duration: 120 s
number of transactions actually processed: 387075
latency average = 1.239 ms
latency stddev = 0.289 ms
tps = 3225.579869 (including connections establishing)
tps = 3226.041952 (excluding connections establishing)
script statistics:
 - statement latencies in milliseconds:
         0.001  \set i random(1, 100000)
         1.238  insert into t2_int4 select :i;

$ /opt/gitlab/embedded/bin/pgbench -h localhost test -j4 -c4 -T120 -P30 -rn -f /tmp/insert_int8.sql
progress: 30.0 s, 3240.8 tps, lat 1.232 ms stddev 0.248
progress: 60.0 s, 2959.5 tps, lat 1.350 ms stddev 0.426
progress: 90.0 s, 3248.9 tps, lat 1.230 ms stddev 0.255
progress: 120.0 s, 3313.8 tps, lat 1.206 ms stddev 0.239
transaction type: /tmp/insert_int8.sql
scaling factor: 1
query mode: simple
number of clients: 4
number of threads: 4
duration: 120 s
number of transactions actually processed: 382893
latency average = 1.252 ms
latency stddev = 0.303 ms
tps = 3190.744502 (including connections establishing)
tps = 3191.211677 (excluding connections establishing)
script statistics:
 - statement latencies in milliseconds:
         0.001  \set i random(1, 100000)
         1.251  insert into t2_int8 select :i;

Conclusions from the "Simple INSERTs" experiment:

again, Postgres uses implicit conversion when checking FKs;
performance penalty for the "int8->int4 FK" case is just ~1% here. In absolute values, it was just 13 μs (0.013 ms). The query timing was ~1ms, so the performance degradation cause by implicit conversion is almost not noticeable.

General conclusion

We can use int8 in FKs with small performance penalty, reducing the amount of work on conversion in the future.

Edited Apr 30, 2019 by Nikolay Samokhvalov