Resolve "Further optimise IpynbSymbolMap by using C extensions"

What does this MR do and why?

Creating semantic diffs for Jupyter notebooks (.ipynb) requires a source map built from the notebook JSON. We originally implemented this in pure Ruby because there was no other option, which made execution slow and memory-heavy for large notebooks. We reached out to the Oj maintainer (https://github.com/ohler55/oj/issues/780), who implemented the changes we needed, so we can now rely on the C extension. This change has no visual effect; it only improves performance.
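To illustrate what a source map from JSON means here, the following is a minimal pure-Ruby sketch that maps each JSON path to the line on which its value starts. It is not the IpynbSymbolMap implementation and does not use Oj: the class name `LineMapSketch`, the dotted path format, and the sample notebook are all assumptions made for illustration. It also recomputes the line number from the start of the input on every lookup, so it is deliberately simple rather than fast.

```ruby
require 'strscan'

# Hypothetical helper (not part of ipynbdiff): walks a JSON document and
# records, for every path such as ".cells.0.source", the 1-based line on
# which the value at that path starts.
class LineMapSketch
  STRING = /"((?:[^"\\]|\\.)*)"/

  def self.parse(json)
    new(json).parse
  end

  def initialize(json)
    @json = json
    @scanner = StringScanner.new(json)
    @map = {}
  end

  def parse
    parse_value('')
    @map
  end

  private

  def skip_ws
    @scanner.skip(/\s+/)
  end

  # Counts newlines before the current byte position (quadratic overall,
  # acceptable for a sketch only).
  def current_line
    @json.byteslice(0, @scanner.pos).count("\n") + 1
  end

  def expect(pattern)
    @scanner.scan(pattern) || raise(ArgumentError, "unexpected input at byte #{@scanner.pos}")
  end

  def parse_value(path)
    skip_ws
    @map[path] = current_line
    if @scanner.scan(/\{/)
      parse_object(path)
    elsif @scanner.scan(/\[/)
      parse_array(path)
    else
      # Strings, numbers, true, false and null are skipped without decoding.
      @scanner.scan(STRING) || expect(/[^\s,\]\}]+/)
    end
  end

  def parse_object(path)
    skip_ws
    return if @scanner.scan(/\}/)

    loop do
      skip_ws
      expect(STRING)
      key = @scanner[1]
      skip_ws
      expect(/:/)
      parse_value("#{path}.#{key}")
      skip_ws
      break if @scanner.scan(/\}/)

      expect(/,/)
    end
  end

  def parse_array(path)
    skip_ws
    return if @scanner.scan(/\]/)

    index = 0
    loop do
      parse_value("#{path}.#{index}")
      index += 1
      skip_ws
      break if @scanner.scan(/\]/)

      expect(/,/)
    end
  end
end

# Made-up sample input, far smaller than a real notebook.
notebook = <<~JSON
  {
    "cells": [
      {
        "source": "a"
      }
    ]
  }
JSON

line_map = LineMapSketch.parse(notebook)
line_map.each { |path, line| puts "#{path.inspect} => line #{line}" }
```

A hand-written scanner like this allocates many short-lived Ruby objects per token, which is the kind of overhead that moving the parsing into a C extension is meant to avoid.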

Screenshots or screen recordings

vendor/gems/ipynbdiff/spec/benchmark.rb can be used to compute memory usage and running time:

Benchmark Before:

Small Notebook: 27.919 bytes
Large Notebook: 2.769.427 bytes
small_notebook  0.016363   0.001067   0.017430 (  0.019843)
large_notebook  0.838928   0.053604   0.892532 (  0.901792)
Calculating -------------------------------------
      small_notebook     1.284M memsize (     0.000  retained)
                        28.318k objects (     0.000  retained)
                        50.000  strings (     0.000  retained)
      large_notebook   125.647M memsize (     0.000  retained)
                         2.787M objects (     0.000  retained)
                        50.000  strings (     0.000  retained)

Benchmark After:

Small Notebook: 27.919 bytes
Large Notebook: 2.769.427 bytes
small_notebook  0.000949   0.000079   0.001028 (  0.001023)
large_notebook  0.059418   0.004342   0.063760 (  0.064079)
Calculating -------------------------------------
      small_notebook   192.404k memsize (    30.432k retained)
                       369.000  objects (    30.000  retained)
                        50.000  strings (    11.000  retained)
      large_notebook    17.601M memsize (     2.996M retained)
                        17.784k objects (     2.901k retained)
                        50.000  strings (    50.000  retained)
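The improvement can be read off the two runs with a little arithmetic. The sketch below computes before/after ratios for elapsed time and allocated memory, under two assumptions: that the timing lines follow Ruby's standard `Benchmark` layout (user, system, total, and real time in parentheses), and that the `k` and `M` suffixes in the memory report are decimal (1.000 and 1.000.000). The constant and method names are illustrative only.

```ruby
# Figures copied from the benchmark output above.
# real_s:  the parenthesised elapsed time, in seconds
# memsize: total allocated bytes (decimal k/M prefixes assumed)
BEFORE = {
  small_notebook: { real_s: 0.019843, memsize: 1_284_000 },
  large_notebook: { real_s: 0.901792, memsize: 125_647_000 }
}.freeze

AFTER = {
  small_notebook: { real_s: 0.001023, memsize: 192_404 },
  large_notebook: { real_s: 0.064079, memsize: 17_601_000 }
}.freeze

# How many times smaller the "after" figure is, rounded to one decimal.
def improvement(before, after)
  (before / after.to_f).round(1)
end

RATIOS = BEFORE.to_h do |name, before|
  after = AFTER.fetch(name)
  [name, {
    time: improvement(before[:real_s], after[:real_s]),
    memory: improvement(before[:memsize], after[:memsize])
  }]
end

RATIOS.each do |name, ratio|
  puts "#{name}: #{ratio[:time]}x faster, #{ratio[:memory]}x less memory allocated"
end
```

On these figures the small notebook is roughly 19 times faster and the large one roughly 14 times faster, with about 7 times less memory allocated in both cases. The ratios cover allocated memory only: the "after" run also reports non-zero retained memory (30.432k and 2.996M), which the "before" run did not.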

How to set up and validate locally

  1. Open a commit that changes an .ipynb file (e.g. gitlab-test@5d6ed150)
  2. Confirm there are no regressions: the notebook diff should be generated as before

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #366189 (closed)

Edited by Eduardo Bonet