Add ConvertTextToDoc filter
What does this MR do and why?
After HTML is generated by the markdown parser, it is converted into a Nokogiri document so
that we can do further transformations. When debugging the pipeline, we can see that the
conversion happens in DollarMathPostLegacyFilter: even though that filter doesn't do
anything, it returns doc, which starts the conversion. So when viewing the timing of filters,
the conversion time is attributed to that filter, which makes it look like the legacy filter is
performing some function when it's not.
This new filter just makes it clear where the conversion happens and how long it takes when we are looking for performance wins. It doesn't affect the operation or performance of the pipeline.
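The attribution problem can be reproduced with a small, self-contained sketch. This is not the Banzai or html-pipeline code: ParsedDoc, SketchFilter, ConvertTextToDocSketch, LegacyNoopSketch and run_pipeline are hypothetical stand-ins, and the sleep call only simulates the cost of building a Nokogiri document. It assumes, as described above, that the conversion is lazy and is triggered by the first filter that returns doc.

```ruby
# Stand-in for a parsed document; the real pipeline uses a Nokogiri document.
class ParsedDoc
  attr_reader :source

  def initialize(source)
    @source = source
  end

  # Simulated expensive text-to-document conversion (assumption: 20 ms).
  def self.parse(text)
    sleep 0.02
    new(text)
  end
end

# Minimal filter base class: `doc` converts lazily on first access.
class SketchFilter
  def initialize(input)
    @input = input
  end

  def doc
    @doc ||= @input.is_a?(ParsedDoc) ? @input : ParsedDoc.parse(@input)
  end

  def call
    raise NotImplementedError
  end
end

# Text-only filter: works on the string and never touches `doc`.
class TextOnlyFilter < SketchFilter
  def call
    @input
  end
end

# Does nothing except force the conversion, so the cost is timed under its own name.
class ConvertTextToDocSketch < SketchFilter
  def call
    doc
  end
end

# Legacy no-op that returns `doc`; it pays for the conversion only if no
# earlier filter has already done it.
class LegacyNoopSketch < SketchFilter
  def call
    doc
  end
end

# Runs the filters in order and records [name, seconds, converted?] for each.
def run_pipeline(text, filters)
  input = text
  filters.map do |klass|
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    output = klass.new(input).call
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    converted = input.is_a?(String) && output.is_a?(ParsedDoc)
    input = output
    [klass.name, elapsed, converted]
  end
end

without_filter = run_pipeline('<p>hi</p>', [TextOnlyFilter, LegacyNoopSketch])
with_filter = run_pipeline('<p>hi</p>', [TextOnlyFilter, ConvertTextToDocSketch, LegacyNoopSketch])

[without_filter, with_filter].each do |rows|
  rows.each { |name, secs, conv| puts format('%.6f_s %s converted=%s', secs, name, conv) }
  puts '...'
end
```

In the first run the simulated conversion cost lands on the legacy no-op. In the second run it lands on the dedicated conversion step and the legacy filter drops to near zero, while the total work stays the same.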
Previous example
D, [2024-06-20T18:07:47.132537 #77079] DEBUG -- : 0.000037_s (0.000037_s): NormalizeSourceFilter [PreProcessPipeline]
D, [2024-06-20T18:07:47.562742 #77079] DEBUG -- : 0.430072_s (0.430109_s): TruncateSourceFilter [PreProcessPipeline]
D, [2024-06-20T18:07:47.563855 #77079] DEBUG -- : 0.001028_s (0.431137_s): FrontMatterFilter [PreProcessPipeline]
D, [2024-06-20T18:07:47.636249 #77079] DEBUG -- : 0.000013_s (0.000013_s): MarkdownPreEscapeLegacyFilter [FullPipeline]
D, [2024-06-20T18:07:47.636304 #77079] DEBUG -- : 0.000004_s (0.000017_s): DollarMathPreLegacyFilter [FullPipeline]
D, [2024-06-20T18:07:47.637380 #77079] DEBUG -- : 0.000996_s (0.001013_s): BlockquoteFenceLegacyFilter [FullPipeline]
D, [2024-06-20T18:07:47.885543 #77079] DEBUG -- : 0.248029_s (0.249042_s): MarkdownFilter [FullPipeline]
D, [2024-06-20T18:07:50.673923 #77079] DEBUG -- : 1.788126_s (2.037188_s): DollarMathPostLegacyFilter [FullPipeline]
D, [2024-06-20T18:07:50.673950 #77079] DEBUG -- : 0.000005_s (2.037193_s): MarkdownPostEscapeLegacyFilter [FullPipeline]
D, [2024-06-20T18:07:50.719421 #77079] DEBUG -- : 0.045390_s (2.082583_s): CodeLanguageFilter [FullPipeline]
...
With this change
D, [2024-06-20T18:07:47.132537 #77079] DEBUG -- : 0.000037_s (0.000037_s): NormalizeSourceFilter [PreProcessPipeline]
D, [2024-06-20T18:07:47.562742 #77079] DEBUG -- : 0.430072_s (0.430109_s): TruncateSourceFilter [PreProcessPipeline]
D, [2024-06-20T18:07:47.563855 #77079] DEBUG -- : 0.001028_s (0.431137_s): FrontMatterFilter [PreProcessPipeline]
D, [2024-06-20T18:07:47.636249 #77079] DEBUG -- : 0.000013_s (0.000013_s): MarkdownPreEscapeLegacyFilter [FullPipeline]
D, [2024-06-20T18:07:47.636304 #77079] DEBUG -- : 0.000004_s (0.000017_s): DollarMathPreLegacyFilter [FullPipeline]
D, [2024-06-20T18:07:47.637380 #77079] DEBUG -- : 0.000996_s (0.001013_s): BlockquoteFenceLegacyFilter [FullPipeline]
D, [2024-06-20T18:07:47.885543 #77079] DEBUG -- : 0.248029_s (0.249042_s): MarkdownFilter [FullPipeline]
D, [2024-06-20T18:07:50.673831 #77079] DEBUG -- : 1.788126_s (2.037168_s): ConvertTextToDocFilter [FullPipeline]
D, [2024-06-20T18:07:50.673923 #77079] DEBUG -- : 0.000020_s (2.037188_s): DollarMathPostLegacyFilter [FullPipeline]
D, [2024-06-20T18:07:50.673950 #77079] DEBUG -- : 0.000005_s (2.037193_s): MarkdownPostEscapeLegacyFilter [FullPipeline]
D, [2024-06-20T18:07:50.719421 #77079] DEBUG -- : 0.045390_s (2.082583_s): CodeLanguageFilter [FullPipeline]
...
Note that you can see these timings by running the following in the Rails console:
text = '**some markdown text**'
Banzai.render(text, project: Project.first, debug_timing: true)
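When looking for performance wins, it can help to rank the filters by elapsed time instead of scanning the log by eye. The following is a minimal sketch that assumes the debug lines keep the format shown in the examples above (elapsed time, cumulative time in parentheses, filter name, pipeline name in brackets). TIMING_LINE, parse_timings and slowest_filters are hypothetical helpers, not part of Banzai.

```ruby
# Matches the tail of a timing line such as:
#   "... DEBUG -- : 1.788126_s (2.037168_s): ConvertTextToDocFilter [FullPipeline]"
TIMING_LINE = /: (?<elapsed>\d+\.\d+)_s \((?<total>\d+\.\d+)_s\): (?<filter>\w+) \[(?<pipeline>\w+)\]\z/

# Turns raw log text into an array of hashes, skipping lines that don't match.
def parse_timings(log_text)
  log_text.each_line.filter_map do |line|
    match = TIMING_LINE.match(line.strip)
    next unless match

    { filter: match[:filter], pipeline: match[:pipeline], elapsed: match[:elapsed].to_f }
  end
end

# Returns the `limit` slowest filters, slowest first.
def slowest_filters(log_text, limit = 3)
  parse_timings(log_text).max_by(limit) { |row| row[:elapsed] }
end

# Sample lines copied from the "With this change" output above.
sample = <<~LOG
  D, [2024-06-20T18:07:47.885543 #77079] DEBUG -- : 0.248029_s (0.249042_s): MarkdownFilter [FullPipeline]
  D, [2024-06-20T18:07:50.673831 #77079] DEBUG -- : 1.788126_s (2.037168_s): ConvertTextToDocFilter [FullPipeline]
  D, [2024-06-20T18:07:50.673923 #77079] DEBUG -- : 0.000020_s (2.037188_s): DollarMathPostLegacyFilter [FullPipeline]
  ...
LOG

slowest_filters(sample, 2).each do |row|
  puts format('%.6f_s %s [%s]', row[:elapsed], row[:filter], row[:pipeline])
end
```

With the sample lines, the conversion step ranks first and MarkdownFilter second, which is the attribution this change is meant to make visible.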
MR acceptance checklist
Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.