Explore Current state of Jupyter Notebook Diffs
[[_TOC_]]
## Tasks
| Description | Status |
| ------ | ------ |
| Create a sample repo to showcase the issues with Notebooks MRs | :white_check_mark: |
| Explore [nbdime](https://nbdime.readthedocs.io/en/latest/) as backend for notebook diff | :white_check_mark: |
| [Add another MR to the sample repo to see a completely new notebook being added.](https://gitlab.com/gitlab-org/incubation-engineering/mlops/meta/-/issues/13) | :white_check_mark: |
| [Explore how GitHub handles the Sample Repo](https://gitlab.com/gitlab-org/incubation-engineering/mlops/meta/-/issues/14) | :white_check_mark: |
| [Explore how ReviewNB handles the Sample Repo](https://gitlab.com/gitlab-org/incubation-engineering/mlops/meta/-/issues/15) | :white_check_mark: |
## Jupyter Notebooks Examples
### Gitlab
Due to the file structure, code reviewing Jupyter Notebooks can be challenging. https://gitlab.com/gitlab-org/incubation-engineering/mlops/ipynb-mr-sample gives some examples of that
#### Adding a new notebook
On https://gitlab.com/gitlab-org/incubation-engineering/mlops/ipynb-mr-sample/-/merge_requests/3, a notebook is added as a copy of the first one. Instead of showing what changed, or just a render of the new file, it shows a blob:

#### Rerunning a Notebook
On https://gitlab.com/gitlab-org/incubation-engineering/mlops/ipynb-mr-sample/-/merge_requests/5, nothing changes in the code. The only difference is the order in which lines were run. This caused the _entire_ file to change, and makes it impossible to code review

#### Changing a line
On https://gitlab.com/gitlab-org/incubation-engineering/mlops/ipynb-mr-sample/-/merge_requests/4, only a single line changed, which causes an image to change, but since the change is within a json structure, spotting what actually changed becomes really hard

### GitHub
With small UI changes, the MR's have the same issues on GitHub:
- https://github.com/eduardobonet2/ipynb-mr-sample/pull/1/files
- https://github.com/eduardobonet2/ipynb-mr-sample/pull/2/files
- https://github.com/eduardobonet2/ipynb-mr-sample/pull/3/files
### ReviewNB
[ReviewNB](https://www.reviewnb.com/) Is a service that allows the Code Review experience on top of GitHub repositories. Currently it only works with GitHub repositories. Unfortunately it is not possible to share the links to reviews itself, since it is necessary to login, but below we can see some screenshots for the scenarios depicted above.
#### Adding a new notebook
https://github.com/eduardobonet2/ipynb-mr-sample/pull/1
- :heavy_plus_sign: Displays a rendered version of the Notebook, even the color changes on the dataframe
- :heavy_plus_sign: Comments can be added per cell, and it is possible to specify a line in the cell.
- :heavy_minus_sign: It is not possible to add a comment to the output
<img src="/uploads/2d09cf262340c8cdcf26e66ed1768692/image.png" width="300" height="400">
#### Rerunning a notebook
https://github.com/eduardobonet2/ipynb-mr-sample/pull/2
- :heavy_plus_sign: Identifies when there's no change on a cell
- :heavy_plus_sign: Since I didn't set a seed, there was indeed a change on the notebook when generating the dataframe, and we can see that from ReviewNB
- :heavy_plus_sign:
- :heavy_minus_sign: The output images are the same, but since it reran the id of image changed, so it can compare them
- :heavy_minus_sign: No option to hide what didn't change

#### Changing a line
https://github.com/eduardobonet2/ipynb-mr-sample/pull/3
- :heavy_plus_sign: Highlights correctly what changed within the cell
- :heavy_plus_sign: There was a different execution order, and it is not shown. This can also be seen as something bad
- :heavy_minus_sign: Although it flags the output as changed, it doesn't really show

## Testing NBDime
On https://gitlab.com/eduardobonet/ipynb-mr-sample/-/tree/nbdime-test, we show a test of using nbdime to diff the contents between branch2 and another-branch. [nbdime_output.txt](https://gitlab.com/eduardobonet/ipynb-mr-sample/-/blob/nbdime-test/nbdime_output.txt) is the output of nbdiff (color characters should be removed later), but it could be parsed into a code mr review:
```
nbdiff notebook.ipynb old_notebook.ipynb
--- notebook.ipynb 2021-08-13 16:12:00.478714
+++ old_notebook.ipynb 2021-08-13 16:26:22.487417
## replaced /cells/2/execution_count:
- 5
+ 2
## modified /cells/2/outputs/0/data/text/plain:
- [<matplotlib.lines.Line2D at 0x130e389d0>]
+ [<matplotlib.lines.Line2D at 0x1307c1970>]
## replaced /cells/2/outputs/0/execution_count:
- 5
+ 2
## inserted before /cells/2/outputs/1:
+ output:
+ output_type: display_data
+ data:
+ image/png: iVBORw0K...<snip base64, md5=ecea473e60784aa2...>
+ text/plain: <Figure size 432x288 with 1 Axes>
+ metadata (unknown keys):
+ needs_background: light
## deleted /cells/2/outputs/1:
- output:
- output_type: display_data
- data:
- image/png: iVBORw0K...<snip base64, md5=13bfe1623a694f6d...>
- text/plain: <Figure size 432x288 with 1 Axes>
- metadata (unknown keys):
- needs_background: light
## modified /cells/2/source:
@@ -1,4 +1,4 @@
x = np.linspace(0, 4*np.pi,50)
-y = 2 * np.sin(x)
+y = np.sin(x)
plt.plot(x, y)
```
### Web Version
[The webapp version of the diff](https://gitlab.com/eduardobonet/ipynb-mr-sample/-/blob/nbdime-test/nbdime_web_output.png) (nbdiff-web) seems a lot closer to our goal

This seems to be a good candidate to power a first iteration.
issue