Speed up bulk-rendering of Banzai fields that are not cached in the database
Follow-up from https://gitlab.com/gitlab-org/gitlab-ce/issues/43140

tl;dr: rendering commits is slow and still takes too many SQL queries, even after the work that went into https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/21500

Banzai, our Markdown processor, has an interface for rendering a collection of objects: `Banzai::ObjectRenderer`. It attempts to reduce the number of SQL queries required to render the collection by aggregating reference lookups across all the documents.

Rendering a document has three steps: running the pipeline to convert Markdown to HTML (render), post-processing, and redaction. At present, only references in the latter two steps are aggregated. This is because we typically run the first step only once - the resulting HTML is cached in the database. So while the initial render of a collection of notes is slow, subsequent renders are very fast.

However, we've since extended the `ObjectRenderer` to handle collections of commits. These *do not* have persistent storage associated with them, so the unoptimized render step must run every time the commits are viewed. This is slow and performs needless N+1 SQL queries for users, projects, groups, labels, milestones, epics... anything that can be referenced in GFM.

Two major options present themselves:

* Improve Banzai rendering to aggregate references in the first step
* Add some form of persistent storage to commits

Two options for the second:

* Redis caching
* A database table like:

```ruby
create_table :markdown_column_caches do |t|
  t.references :project
  t.string :key
  t.integer :version # for cache invalidation
  t.string :html # the unredacted output
end
```

Either would be fine from my point of view, but perhaps the database team strongly disapproves of the latter proposal.

/cc @DouweM @yorickpeterse @jramsay
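To illustrate the first option, here is a minimal, self-contained sketch of what aggregating reference lookups in the render step could look like. All names here (`batch_load_users`, the `@username` pattern, the in-memory `USERS` map standing in for the database) are hypothetical simplifications, not the actual Banzai filter API:

```ruby
# Hypothetical sketch: aggregate reference lookups across a collection of
# documents before rendering, instead of one lookup per document (N+1).

REFERENCE_PATTERN = /@(\w+)/ # stand-in for a GFM reference, e.g. user mentions

# Stand-in for the database; in GitLab this would be backed by SQL.
USERS = { 'alice' => 'Alice', 'bob' => 'Bob' }.freeze

def batch_load_users(usernames)
  # One batched lookup for the whole collection, roughly analogous to a
  # single `User.where(username: usernames)` query.
  USERS.slice(*usernames.uniq)
end

def render_collection(documents)
  # Step 1: scan every document up front and collect all references.
  usernames = documents.flat_map { |doc| doc.scan(REFERENCE_PATTERN).flatten }

  # Step 2: one aggregated lookup instead of a query per document.
  users = batch_load_users(usernames)

  # Step 3: render each document against the pre-loaded map.
  documents.map do |doc|
    doc.gsub(REFERENCE_PATTERN) do
      name = users[Regexp.last_match(1)]
      name ? "<a>#{name}</a>" : Regexp.last_match(0)
    end
  end
end
```

The point is only the shape: scan the whole collection first, load everything referenced in one pass, then render - the same pattern Banzai already applies to post-processing and redaction.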