Speed up bulk-rendering of Banzai fields that are not cached in the database
Follow-up from https://gitlab.com/gitlab-org/gitlab-ce/issues/43140
tl;dr: rendering commits is slow and still issues too many SQL queries, even after the work that went into https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/21500
Banzai, our markdown processor, has an interface to render a collection of objects: `Banzai::ObjectRenderer`. This attempts to reduce the number of SQL queries required to render the collection by aggregating reference lookups across all the documents.
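For reference, the interface is used roughly like this (the keyword arguments are from memory and may differ between versions):

```ruby
# Render the `note` attribute of every object in one pass; the shared
# post-processing and redaction steps can then batch their reference lookups.
renderer = Banzai::ObjectRenderer.new(user: current_user, default_project: project)
renderer.render(notes, :note)
```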
Rendering a document has three steps: running the pipeline to convert markdown to HTML (render), post-processing, and redaction.
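Roughly, for a single document (entry points approximate):

```ruby
# 1. Render: run the pipeline to turn markdown into HTML. This is the
#    expensive stage, and the one whose output can be cached.
html = Banzai.render(markdown, project: project)

# 2. Post-process: per-request adjustments such as rewriting relative links.
html = Banzai.post_process(html, project: project, current_user: user)

# 3. Redact: strip references the current user cannot see
#    (handled by Banzai::Redactor inside ObjectRenderer).
```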
At present, only references in the latter two steps are aggregated. This is because, typically, we only run the first stage once - the content is cached in the database. So while the initial render of a collection of notes is slow, subsequent renders are very fast.
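The database caching mentioned above is the `CacheMarkdownField` pattern; declaring a cached field looks roughly like this (simplified):

```ruby
class Note < ActiveRecord::Base
  include CacheMarkdownField

  # The rendered output is stored in a `note_html` column next to the raw
  # `note`, plus a cached version number used for invalidation, so the
  # expensive render stage is skipped on subsequent views.
  cache_markdown_field :note, pipeline: :note
end
```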
However, we've since extended the `ObjectRenderer` to handle collections of commits. These do not have persistent storage associated with them, so the unoptimized render step must run every time the commits are viewed. This is slow and performs needless N+1 SQL queries for users, projects, groups, labels, milestones, epics... anything that can be referenced in GFM.
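To make the N+1 concrete, the unbatched shape is roughly this (simplified; only user mentions shown):

```ruby
# Simplified: each commit message resolves its own references, so N commits
# that mention users issue N separate user lookups instead of one batch.
commits.each do |commit|
  commit.message.scan(/@([\w.-]+)/).flatten.each do |username|
    User.find_by(username: username) # one query per mention
  end
end
```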
Two major options present themselves:
- Improve Banzai rendering to aggregate references in the first step (see the sketch after this list)
- Add some form of persistent storage to commits
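A sketch of the first option (entirely hypothetical; `extract_references` stands in for whatever the reference filters already know how to find):

```ruby
require 'set'

# Hypothetical: collect reference identifiers across all documents first,
# then issue one query per reference type instead of one per mention.
refs = Hash.new { |hash, type| hash[type] = Set.new }

documents.each do |doc|
  # Assumed helper returning e.g. { user: ["alice"], issue: [42] }
  extract_references(doc).each { |type, values| refs[type].merge(values) }
end

users  = User.where(username: refs[:user].to_a).index_by(&:username)
issues = Issue.where(iid: refs[:issue].to_a).index_by(&:iid)
```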
Two options for the second:
- Redis caching (a sketch follows the table below)
- A database table like:
```ruby
create_table :markdown_column_caches do |t|
  t.references :project
  t.string :key
  t.integer :version # for cache invalidation
  t.string :html # the unredacted output
end
```
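For comparison, the Redis option could look something like this (key scheme, TTL, and the version constant are all assumptions):

```ruby
# Hypothetical Redis-backed cache for rendered commit messages.
CACHE_VERSION = 1 # bump to invalidate after renderer changes

def cached_commit_html(commit)
  key = "banzai:commit:#{commit.id}:v#{CACHE_VERSION}"

  Gitlab::Redis::Cache.with do |redis|
    cached = redis.get(key)
    return cached if cached

    html = Banzai.render(commit.message, project: commit.project)
    redis.set(key, html, ex: 2.weeks.to_i) # expire to bound memory use
    html
  end
end
```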
Either would be fine from my point of view, but perhaps the database team strongly disapproves of the latter proposal.