Large tables with several columns
Last DB office hours we discussed the topic of having larger tables with several columns
# NOTE: The query is using the information of my local schema, results
# might be different on GitLab.com
gitlabhq_development=# SELECT table_name, count(*) as column_count
FROM information_schema.columns
WHERE table_schema = 'public'
GROUP by table_name order by column_count desc
LIMIT(10);
table_name | column_count
------------------------------------------+--------------
application_settings | 221
users | 84
projects | 82
geo_node_statuses | 54
ci_builds | 50
namespaces | 43
merge_requests | 40
conversational_development_index_metrics | 33
epics | 31
issues | 30
(10 rows)
At the moment users and projects are considered large tables due to its size (and blacklisted from specific operations).
application_settings has the largest number of columns, but since it only has one record it might no be a problem.
Disadvantages of adding more columns to larger tables:
- We can't easily add defaults for these columns (At least until we upgrade to PostgreSQL 11, as the rewritten behavior changes on this version)
- There's a performance impact when it comes to reading information from a wide table, we'll always need to read the whole record from disk and not part of it.
There are specific efforts to reduce the number of columns on projects like #19826 (closed) && !19761 (merged)
Proposal - To do
-
Identify large tables with several columns: !28161 (merged) - We could start with
projectsandusers, but perhaps also considerissues,merge_requests,ci_builds
- We could start with
-
Add a new cop to prevent adding columns to the wide tables - !28161 (merged) -
Add documentation to our Database Guides - !28681 (merged) -
Add a mention in the Engineering Week in Review
Edited by Mayra Cabrera