Include tensorflow.js in the extension

Background

We would like to use machine learning algorithms to detect ads. To run our trained models in a browser, we need a third-party library for accelerated linear algebra operations. In adblockpluscore#25 we chose tensorflow.js.

The idea is to load a machine learning model in the background page during extension startup and then query that model from a content script (snippet). The model is ~250 KB of binary data that stays in memory for the lifetime of the extension, on top of the TensorFlow.js overhead (~1 MB). This way, all machine learning operations (data pre-processing, inference, and result parsing) are performed in the extension's background page, and the snippet only collects data.
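The split between the snippet and the background page described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the message type "ml.predict" and the names createPredictHandler, queryModel, preprocess, infer, and parse are hypothetical and not taken from the extension's code. In a WebExtension the handler might be registered through something like browser.runtime.onMessage, with the snippet side wrapping browser.runtime.sendMessage, but the actual messaging layer may differ.

```javascript
// Hypothetical message type; the real extension may use a different protocol.
const PREDICT_MESSAGE = "ml.predict";

// Background-page side. `preprocess`, `infer`, and `parse` are hypothetical
// stand-ins for the three stages that run in the background page.
function createPredictHandler({preprocess, infer, parse})
{
  return async function handleMessage(message)
  {
    // Ignore messages that are not prediction requests.
    if (!message || message.type != PREDICT_MESSAGE)
      return undefined;

    const input = preprocess(message.data);
    const output = await infer(input);
    return parse(output);
  };
}

// Snippet side: it only collects data and forwards it. `sendMessage` is
// injected so that the sketch does not depend on a specific messaging API.
function queryModel(sendMessage, data)
{
  return sendMessage({type: PREDICT_MESSAGE, data});
}
```

Keeping the three stages behind one handler means the snippet never needs access to TensorFlow.js or the model itself.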

So far the snippet we are working on is only targeting Facebook, but other use cases are possible once we implement the infrastructure.

What to change

  • Embed tensorflow.js into the extension bundle and make it available in the background page
  • Lock the version of tensorflow.js in use by including package-lock.json in the repository
  • Embed the trained model needed for adblockpluscore#25 into the extension bundle, and add code to load that model and run inference on it
  • Implement the data pre-processing required to convert data from the format extracted by the snippet into the format required by the model
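As a rough sketch of the last two items, the code below pads or truncates a list of numbers to a fixed input length and runs a single prediction. It assumes, without confirmation from this issue, that the model takes one fixed-length numeric vector and returns a single output tensor; the real input format is defined in adblockpluscore#25. The names toFixedLength and runInference are hypothetical. `tf` stands for the TensorFlow.js namespace and `model` for an already loaded model; both are passed in so that the sketch does not prescribe how they are loaded.

```javascript
// Pads (with zeros) or truncates `values` to exactly `length` entries.
function toFixedLength(values, length)
{
  const result = new Float32Array(length);  // zero-filled by default
  result.set(values.slice(0, length));
  return result;
}

// Runs one prediction and returns the output as a plain array.
// Assumes model.predict() returns a single tensor for this model.
async function runInference(tf, model, features, inputLength)
{
  const input = tf.tensor2d(toFixedLength(features, inputLength),
                            [1, inputLength]);
  const output = model.predict(input);
  try
  {
    return Array.from(await output.data());
  }
  finally
  {
    // Tensors are not garbage collected automatically, and the model
    // stays loaded for the lifetime of the extension.
    input.dispose();
    output.dispose();
  }
}
```

Disposing tensors explicitly matters here because the background page is long-lived, so leaked tensors would accumulate.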

Dependency update

  • To properly include tensorflow.js we need to update buildtools to hg:016f064a11c3/git:1d1f458. This update includes these changes:
Commit | git hash | hg hash
Noissue - Add node_modules directories to the Webpack resolve paths | 1d1f458 | 016f064a11c3

We also need to update adblockpluscore to hg:48643872b0b7/git:5fa8dc1:

Commit | git hash | hg hash
Noissue - Add lint rule arrow-parens to enforce our style | 05ac9696 | 5232675a94b1
Noissue - Update public suffix list | 01a3f5af | 1a8ef89c04ad
Fixed #25, #98 - Add TensorFlow.js support in a snippet | ca9586a2 | 0867572b5f53
Noissue - Merged next into master | 5fa8dc18 | 48643872b0b7

Hints for testers

This issue should be tested in conjunction with adblockpluscore#25.

We are now including a lot of third-party code in the background page. It is self-contained, but we need to make sure the background page still functions as expected in all other cases (perhaps by checking that nothing unexpected appears in the background page's console).

TensorFlow.js environments

As stated in this article, TensorFlow.js can run in several different configurations. We want to test that the extension runs correctly in all of these environments:

  • No WebGL support in a browser
  • WebGL support available, but no dedicated GPU - various devices (per browser)
  • WebGL support available, with a dedicated GPU - various devices (per browser)
  • Mobile devices, as they are likely to have WebGL with only 16-bit floating point support
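To find out which of the environments above a given test device falls into, a tester could run something like the following in the background page's console. It is a rough heuristic and not how TensorFlow.js itself selects a backend: it only uses standard WebGL calls (getContext, getExtension, getParameter). The function name describeWebGL is hypothetical, the renderer string may be masked or unavailable in some browsers, and the presence of OES_texture_float does not guarantee full 32-bit precision.

```javascript
// Reports basic WebGL capabilities for a given canvas element.
function describeWebGL(canvas)
{
  const gl = canvas.getContext("webgl") ||
             canvas.getContext("experimental-webgl");
  if (!gl)
    return {webgl: false, floatTextures: false, renderer: null};

  // Optional extension that exposes the GPU/driver name, when permitted.
  const debugInfo = gl.getExtension("WEBGL_debug_renderer_info");
  return {
    webgl: true,
    // Absence of this extension hints at 16-bit-only float support.
    floatTextures: gl.getExtension("OES_texture_float") != null,
    renderer: debugInfo ?
      gl.getParameter(debugInfo.UNMASKED_RENDERER_WEBGL) : null
  };
}
```

In a browser this could be called as describeWebGL(document.createElement("canvas")), and the result attached to the test report for each device.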

Snippet testing hints

We are including a pre-trained model with this issue, and we need to make sure the model performs as expected. The best way to do that is to use the snippet from adblockpluscore#25, go to https://eyeo.gitlab.io/machine-learning/fb-test/index_ads.html and https://eyeo.gitlab.io/machine-learning/fb-test/index.html, and investigate the number of misclassifications. Another way to test is to go directly to Facebook and see which posts are hidden in the real world.
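When counting misclassifications, a small tally helper may make reports easier to compare. The sketch below assumes the tester has recorded, for each post, whether it really is an ad (isAd) and whether the snippet hid it (hidden). Both field names and the function name tallyPredictions are hypothetical and not part of the snippet.

```javascript
// Counts correct and incorrect decisions from a list of
// {isAd: boolean, hidden: boolean} records.
function tallyPredictions(results)
{
  const tally = {
    truePositives: 0,   // ads that were hidden
    falsePositives: 0,  // regular posts that were hidden
    trueNegatives: 0,   // regular posts that were left alone
    falseNegatives: 0   // ads that were left visible
  };
  for (const {isAd, hidden} of results)
  {
    if (hidden)
      tally[isAd ? "truePositives" : "falsePositives"]++;
    else
      tally[isAd ? "falseNegatives" : "trueNegatives"]++;
  }
  return tally;
}
```

The number of misclassifications is then falsePositives + falseNegatives.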

Model predictions can differ on mobile devices, as they are likely to have a WebGL implementation with only 16-bit floating point support. Since snippets can run in Firefox on mobile, we need to verify separately that predictions are as expected there.
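One possible way to check this is to record the model's scores for the same posts on a desktop browser and on a mobile device and then compare them. The sketch below assumes that the model produces one numeric score per post and that a post is hidden when the score reaches a threshold. The default threshold of 0.5 and the function name compareScores are illustrative assumptions, not values taken from the model.

```javascript
// Compares two score lists for the same posts and reports the largest
// numeric difference and the posts whose decision changes.
function compareScores(reference, observed, threshold = 0.5)
{
  if (reference.length != observed.length)
    throw new Error("Score lists must have the same length");

  let maxDifference = 0;
  const flipped = [];
  reference.forEach((score, index) =>
  {
    const other = observed[index];
    maxDifference = Math.max(maxDifference, Math.abs(score - other));
    // A flip means the post would be hidden on one device only.
    if ((score >= threshold) != (other >= threshold))
      flipped.push(index);
  });
  return {maxDifference, flipped};
}
```

Small numeric differences are expected with reduced precision; the entries in flipped are the cases worth investigating.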

Edited by Ollie