
Add ConvertTextToDoc filter

Brett Walker requested to merge bw-convert-to-doc-filter into master

What does this MR do and why?

After the markdown parser generates HTML, that HTML is converted into a Nokogiri document so that later filters can apply further transformations. When debugging the pipeline, we can see that this conversion happens in DollarMathPostLegacyFilter: although that filter doesn't do anything, it returns doc, which triggers the conversion. As a result, when viewing the timing of filters, the conversion time is attributed to that filter, which makes it look as though the legacy filter is doing real work when it isn't.
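The lazy conversion described above can be illustrated with a minimal, self-contained sketch. It is not GitLab's implementation: `ParsedDoc` is a stand-in for a Nokogiri document, and `SketchFilter`, `TextOnlyFilter`, `NoOpPostFilter`, `ConvertTextToDocSketch`, and `run_pipeline` are hypothetical names used only for illustration. It assumes a filter base class whose `doc` accessor parses the input the first time a filter asks for it.

```ruby
# Stand-in for a parsed document (the real pipeline uses a Nokogiri document).
ParsedDoc = Struct.new(:source)

# Hypothetical base class: `doc` converts String input on first access and
# records which filter class paid for the conversion.
class SketchFilter
  attr_reader :input

  def initialize(input, parse_log)
    @input = input
    @parse_log = parse_log
  end

  def doc
    return input unless input.is_a?(String)

    @parse_log << self.class # the conversion cost lands on this filter
    ParsedDoc.new(input)
  end
end

# Works on the raw text and never touches `doc`.
class TextOnlyFilter < SketchFilter
  def call
    input
  end
end

# Does nothing except return `doc`, which triggers the conversion.
class NoOpPostFilter < SketchFilter
  def call
    doc
  end
end

# Dedicated conversion step, so the cost shows up under its own name.
class ConvertTextToDocSketch < SketchFilter
  def call
    doc
  end
end

# Runs each filter on the previous filter's output and reports who parsed.
def run_pipeline(filters, text)
  parse_log = []
  result = filters.reduce(text) { |acc, filter| filter.new(acc, parse_log).call }
  [result, parse_log]
end
```

Without the dedicated step, `run_pipeline([TextOnlyFilter, NoOpPostFilter], html)` records the conversion against `NoOpPostFilter`; with `ConvertTextToDocSketch` placed before it, the conversion is recorded against the new step and the no-op filter receives an already-parsed document.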

This new filter simply lets us see where the conversion happens, and how long it takes, when looking for performance wins. It doesn't affect the operation or performance of the pipeline.

Previous example

D, [2024-06-20T18:07:47.132537 #77079] DEBUG -- : 0.000037_s (0.000037_s): NormalizeSourceFilter [PreProcessPipeline]
D, [2024-06-20T18:07:47.562742 #77079] DEBUG -- : 0.430072_s (0.430109_s): TruncateSourceFilter [PreProcessPipeline]
D, [2024-06-20T18:07:47.563855 #77079] DEBUG -- : 0.001028_s (0.431137_s): FrontMatterFilter [PreProcessPipeline]
D, [2024-06-20T18:07:47.636249 #77079] DEBUG -- : 0.000013_s (0.000013_s): MarkdownPreEscapeLegacyFilter [FullPipeline]
D, [2024-06-20T18:07:47.636304 #77079] DEBUG -- : 0.000004_s (0.000017_s): DollarMathPreLegacyFilter [FullPipeline]
D, [2024-06-20T18:07:47.637380 #77079] DEBUG -- : 0.000996_s (0.001013_s): BlockquoteFenceLegacyFilter [FullPipeline]
D, [2024-06-20T18:07:47.885543 #77079] DEBUG -- : 0.248029_s (0.249042_s): MarkdownFilter [FullPipeline]
D, [2024-06-20T18:07:50.673923 #77079] DEBUG -- : 1.788126_s (2.037188_s): DollarMathPostLegacyFilter [FullPipeline]
D, [2024-06-20T18:07:50.673950 #77079] DEBUG -- : 0.000005_s (2.037193_s): MarkdownPostEscapeLegacyFilter [FullPipeline]
D, [2024-06-20T18:07:50.719421 #77079] DEBUG -- : 0.045390_s (2.082583_s): CodeLanguageFilter [FullPipeline]
...

With this change

D, [2024-06-20T18:07:47.132537 #77079] DEBUG -- : 0.000037_s (0.000037_s): NormalizeSourceFilter [PreProcessPipeline]
D, [2024-06-20T18:07:47.562742 #77079] DEBUG -- : 0.430072_s (0.430109_s): TruncateSourceFilter [PreProcessPipeline]
D, [2024-06-20T18:07:47.563855 #77079] DEBUG -- : 0.001028_s (0.431137_s): FrontMatterFilter [PreProcessPipeline]
D, [2024-06-20T18:07:47.636249 #77079] DEBUG -- : 0.000013_s (0.000013_s): MarkdownPreEscapeLegacyFilter [FullPipeline]
D, [2024-06-20T18:07:47.636304 #77079] DEBUG -- : 0.000004_s (0.000017_s): DollarMathPreLegacyFilter [FullPipeline]
D, [2024-06-20T18:07:47.637380 #77079] DEBUG -- : 0.000996_s (0.001013_s): BlockquoteFenceLegacyFilter [FullPipeline]
D, [2024-06-20T18:07:47.885543 #77079] DEBUG -- : 0.248029_s (0.249042_s): MarkdownFilter [FullPipeline]
D, [2024-06-20T18:07:50.673831 #77079] DEBUG -- : 1.788126_s (2.037168_s): ConvertTextToDocFilter [FullPipeline]
D, [2024-06-20T18:07:50.673923 #77079] DEBUG -- : 0.000020_s (2.037188_s): DollarMathPostLegacyFilter [FullPipeline]
D, [2024-06-20T18:07:50.673950 #77079] DEBUG -- : 0.000005_s (2.037193_s): MarkdownPostEscapeLegacyFilter [FullPipeline]
D, [2024-06-20T18:07:50.719421 #77079] DEBUG -- : 0.045390_s (2.082583_s): CodeLanguageFilter [FullPipeline]
...

You can see these timings by running the following in the Rails console:

text = '**some markdown text**'
Banzai.render(text, project: Project.first, debug_timing: true)
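In the log lines above, the first figure is the time spent in that filter and the figure in parentheses is the running total for the current pipeline (it restarts when FullPipeline begins). The sketch below shows one way such lines could be produced. It is an illustration only, not the code behind `debug_timing`: `format_timing` and `timed_run` are hypothetical helpers, and each step is assumed to be a simple callable.

```ruby
# Formats one line in the same shape as the debug output:
#   <filter time>_s (<cumulative time>_s): <FilterName> [<PipelineName>]
def format_timing(elapsed, total, filter_name, pipeline_name)
  format("%.6f_s (%.6f_s): %s [%s]", elapsed, total, filter_name, pipeline_name)
end

# Runs [name, callable] steps in order, timing each with a monotonic clock
# and keeping a running total for the pipeline.
def timed_run(steps, input, pipeline_name)
  total = 0.0
  lines = []

  result = steps.reduce(input) do |acc, (name, step)|
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    output = step.call(acc)
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

    total += elapsed
    lines << format_timing(elapsed, total, name, pipeline_name)
    output
  end

  [result, lines]
end

# Usage with two toy steps (hypothetical names):
steps = [
  ["UpcaseStep", ->(s) { s.upcase }],
  ["ReverseStep", ->(s) { s.reverse }]
]
_result, lines = timed_run(steps, "ab", "SketchPipeline")
lines.each { |line| puts line }
```

Because each step is timed around its own call, any lazy work a step happens to trigger is charged to that step, which is why a separate conversion step makes the log easier to interpret.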

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Brett Walker
