Include tensorflow.js in the extension
Background
We would like to use machine learning algorithms to detect ads. To run our trained models in a browser, we need a third-party library for accelerated linear algebra operations. In adblockpluscore#25 we chose tensorflow.js.
The idea is to load a machine learning model in the background page during extension start-up and then query that model from a content script (snippet). The model is a ~250 KB binary that will stay in memory for the lifetime of the extension, on top of the TensorFlow.js overhead (~1 MB). This way, all machine learning operations (data pre-processing, inference and result parsing) are performed in the extension's background page, and the snippet only collects data.
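The split described above puts the model and all processing in the background page and leaves only data collection to the snippet. It can be sketched as a single message handler. This is a minimal sketch under several assumptions. The message shape (`ClassifyRequest` with type `"ml.classify"`), the `preprocess` placeholder, the 0.5 threshold and the `Predict` abstraction are all hypothetical. The model is passed in as a plain function so the sketch runs without TensorFlow.js. In the extension, the handler would be hooked into the background page's messaging layer, and `predict` would wrap the model loaded at start-up.

```typescript
// Hypothetical message sent by the snippet (content script) to the background page.
interface ClassifyRequest {
  type: "ml.classify"; // hypothetical message type
  features: number[]; // raw data collected by the snippet
}

// Hypothetical reply sent back to the snippet.
interface ClassifyResponse {
  isAd: boolean;
  score: number;
}

// The model is abstracted as a plain function so this sketch does not depend on
// TensorFlow.js; in the extension it would wrap the model loaded at start-up.
type Predict = (input: Float32Array) => number;

// Placeholder pre-processing: the real conversion depends on the trained model.
function preprocess(features: number[]): Float32Array {
  return Float32Array.from(features);
}

// Runs the whole pipeline in the background page: pre-process, infer, parse.
// Returns null for messages this handler does not own or cannot process.
function handleMessage(
  message: unknown,
  predict: Predict,
  threshold = 0.5, // hypothetical decision threshold
): ClassifyResponse | null {
  if (typeof message !== "object" || message === null) return null;
  const msg = message as Partial<ClassifyRequest>;
  if (msg.type !== "ml.classify" || !Array.isArray(msg.features)) return null;

  const score = predict(preprocess(msg.features));
  return { isAd: score >= threshold, score };
}
```

Because `predict` is passed in as a parameter, the pipeline is easy to exercise with a stub model. For example, `handleMessage({ type: "ml.classify", features: [1, 0, 1, 1] }, (x) => x.reduce((s, v) => s + v, 0) / x.length)` returns `{ isAd: true, score: 0.75 }`.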
So far, the snippet we are working on only targets Facebook, but other use cases become possible once the infrastructure is in place.
What to change
- Embed `tensorflow.js` into the extension bundle and make it available in the background page.
- Lock the version of `tensorflow.js` in use by including `package-lock.json` in the repository.
- Embed the trained model needed for adblockpluscore#25 into the extension bundle, and add code to load that model and run inference on it.
- Implement the data pre-processing required to convert data from the format extracted by the snippet into the format required by the model.
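The pre-processing step could look roughly like the following. The real input format is defined by the snippet in adblockpluscore#25 and is not described here. `ElementInfo`, the tag vocabulary and the `[tagId, depth]` row layout are therefore all hypothetical. The point is the common pattern of mapping a variable-length list onto the fixed-size numeric input a model expects. Unknown values map to a reserved id, and short inputs are zero-padded.

```typescript
// Hypothetical shape of one element reported by the snippet.
interface ElementInfo {
  tag: string; // e.g. "div", "span"
  depth: number; // nesting depth below the post's root element
}

// Hypothetical vocabulary: id 0 is padding, id 1 is "unknown tag".
const PAD_ID = 0;
const UNKNOWN_ID = 1;
const VOCAB: ReadonlyMap<string, number> = new Map([
  ["div", 2],
  ["span", 3],
  ["a", 4],
  ["img", 5],
  ["video", 6],
]);

// Converts a variable-length element list into a fixed-size input of
// `maxLength` rows of [tagId, depth]. Longer lists are truncated; shorter
// ones are padded with all-zero rows.
function toModelInput(elements: ElementInfo[], maxLength: number): Float32Array {
  const out = new Float32Array(maxLength * 2).fill(PAD_ID);
  const n = Math.min(elements.length, maxLength);
  for (let i = 0; i < n; i++) {
    const { tag, depth } = elements[i];
    out[2 * i] = VOCAB.get(tag.toLowerCase()) ?? UNKNOWN_ID;
    out[2 * i + 1] = depth;
  }
  return out;
}
```

With TensorFlow.js, a flat array like this could then be turned into a tensor with an explicit shape, e.g. `tf.tensor(input, [1, maxLength, 2])`. The actual shape depends on the trained model.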
Dependency update
- To include `tensorflow.js` properly, we need to update `buildtools` to `hg:016f064a11c3`/`git:1d1f458`. This update includes the following changes:
Commit | git hash | hg hash |
---|---|---|
Noissue - Add node_modules directories to the Webpack resolve paths | 1d1f458 | 016f064a11c3 |
We also need to update `adblockpluscore` to `hg:48643872b0b7`/`git:5fa8dc1`:
Commit | git hash | hg hash |
---|---|---|
Noissue - Add lint rule arrow-parens to enforce our style | 05ac9696 | 5232675a94b1 |
Noissue - Update public suffix list | 01a3f5af | 1a8ef89c04ad |
Fixed #25, #98 - Add TensorFlow.js support in a snippet | ca9586a2 | 0867572b5f53 |
Noissue - Merged next into master | 5fa8dc18 | 48643872b0b7 |
Hints for testers
This issue should be tested in conjunction with adblockpluscore#25.
We are now including a lot of third-party code in the background page. It is self-contained, but we need to make sure the background page still works as expected in all other cases (for example, by checking that nothing unexpected appears in the background page's console).
TensorFlow.js environments
As stated in this article, TensorFlow.js can run in several different configurations. We want to test that we run correctly in all of these environments:
- No WebGL support in the browser
- WebGL support available, but no dedicated GPU (various devices, per browser)
- WebGL support available, with a dedicated GPU (various devices, per browser)
- Mobile devices, as they are likely to have WebGL with only 16-bit floating-point support
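When filing results, it helps to record which of these buckets a test device falls into. The rough sketch below assumes that the WebGL fragment-shader `highp` precision is a reasonable proxy. It uses the standard `getShaderPrecisionFormat` call, which reports 23 bits for IEEE 754 single precision. The library itself decides whether TensorFlow.js actually falls back to 16-bit textures, so this check is only an approximation. It also cannot tell a dedicated GPU from an integrated one. The `TestEnvironment` labels are hypothetical.

```typescript
// Only the part of WebGLRenderingContext this sketch needs; a real context from
// canvas.getContext("webgl") satisfies this interface structurally.
interface ShaderPrecisionSource {
  readonly FRAGMENT_SHADER: number;
  readonly HIGH_FLOAT: number;
  getShaderPrecisionFormat(
    shaderType: number,
    precisionType: number,
  ): { precision: number } | null;
}

// Hypothetical labels for the test buckets listed above.
type TestEnvironment = "no-webgl" | "webgl-float32" | "webgl-reduced-precision";

function classifyEnvironment(gl: ShaderPrecisionSource | null): TestEnvironment {
  if (gl === null) return "no-webgl";
  const format = gl.getShaderPrecisionFormat(gl.FRAGMENT_SHADER, gl.HIGH_FLOAT);
  // 23 mantissa bits matches IEEE float32; a precision of 0 means highp floats
  // are not supported in fragment shaders at all.
  return format !== null && format.precision >= 23
    ? "webgl-float32"
    : "webgl-reduced-precision";
}
```

In a browser, it would be called as `classifyEnvironment(document.createElement("canvas").getContext("webgl"))`. The narrow interface only keeps the function testable without a real WebGL context.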
Snippet testing hints
We are including a pre-trained model with this issue, and we need to make sure it performs as expected. The best way to do that is to use the snippet from adblockpluscore#25, open https://eyeo.gitlab.io/machine-learning/fb-test/index_ads.html and https://eyeo.gitlab.io/machine-learning/fb-test/index.html, and check the number of misclassifications. Another option is to go directly to Facebook and see which posts get hidden in the real world.
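To make results from the test pages comparable between testers, misclassifications can be split into the two error types. This sketch assumes that testers record, for each post, whether it actually is an ad and whether the snippet hid it. The `Observation` and `Summary` shapes are hypothetical.

```typescript
// One observation recorded by a tester: whether the post really is an ad
// (ground truth) and whether the snippet hid it (prediction).
interface Observation {
  isAd: boolean;
  hidden: boolean;
}

interface Summary {
  falsePositives: number; // regular posts that were hidden
  falseNegatives: number; // ads that were not hidden
  errorRate: number; // misclassified posts / all posts
}

function summarize(observations: Observation[]): Summary {
  let falsePositives = 0;
  let falseNegatives = 0;
  for (const { isAd, hidden } of observations) {
    if (hidden && !isAd) falsePositives++;
    if (isAd && !hidden) falseNegatives++;
  }
  const total = observations.length;
  const errorRate = total === 0 ? 0 : (falsePositives + falseNegatives) / total;
  return { falsePositives, falseNegatives, errorRate };
}
```

Keeping the two counts separate matters because they have different costs: a false positive hides a regular post, while a false negative only lets an ad through.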
Model predictions may differ on mobile devices, which are likely to have WebGL implementations with only 16-bit floating-point support. Since snippets can run in Firefox on mobile, we need to verify separately that predictions are as expected there.
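To get a feel for how much 16-bit precision can matter, one can simulate it on a desktop machine. The sketch below rounds a value to the nearest half-precision float (10 stored mantissa bits, so a relative spacing of up to about 0.1%). It then flags scores whose classification would change. This is a simplification. On such devices every intermediate value in the model is rounded, not just the final score, so the check only highlights scores sitting close to the decision threshold. The helper names and the threshold are hypothetical.

```typescript
// Rounds a number to the nearest IEEE 754 half-precision value (10 explicit
// mantissa bits). Simplifications: ties round away from zero (IEEE uses
// round-half-to-even), and overflow above 65504 is not handled.
function roundToFloat16(x: number): number {
  if (x === 0 || !Number.isFinite(x)) return x;
  const magnitude = Math.abs(x);
  // Exponent of the binade containing x, clamped to the smallest normal
  // exponent (-14) so that subnormals share the same fixed spacing of 2^-24.
  const exponent = Math.max(Math.floor(Math.log2(magnitude)), -14);
  const spacing = 2 ** (exponent - 10);
  return Math.sign(x) * Math.round(magnitude / spacing) * spacing;
}

// True if rounding the score to float16 changes which side of the threshold
// it falls on (hypothetical helper for spotting borderline predictions).
function flipsUnderFloat16(score: number, threshold: number): boolean {
  return (score >= threshold) !== (roundToFloat16(score) >= threshold);
}
```

For example, `flipsUnderFloat16(0.4999, 0.5)` is `true`, because 0.4999 rounds up to exactly 0.5 in half precision.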