Add support for machine learning based on TensorFlow.js

Background

For anti-circumvention, we are looking into a machine-learning-based approach to address some circumvention cases. We have a PoC in which we are able to use an ML model inside the extension, called from a snippet, to distinguish ads from non-ads. To actually deploy something like this, we need a supported way to run ML predictions within the extension. With that in place, we can write snippets that use various machine learning models to make predictions about different elements on the page.

Underlying ML framework

Currently it looks like TensorFlow.js is the best choice of ML framework: it supports the most features and would be the easiest to work with. Other options are onnxjs, which uses the same architecture (WebGL with fallback to WebAssembly) but offers fewer features, and webdnn.js, which also uses the same architecture and additionally supports WebGPU (currently only available in Safari) but likewise provides fewer features. With this in mind, we want to implement the new class on top of TensorFlow.js, while keeping the framework replaceable if needed.

What to change

Implement a new class called ML that can run predictions on machine learning models bundled with the extension. The class should expose functionality to:

load and dispose of an ML model and manage its lifetime
accept inputs to the ML model and respond with inference results
allow pre-processing of inputs before they are fed into the model
interact with the ML framework of choice
allow selection of the underlying backend to use

Each instance of the class should be responsible for running inference on only one ML model, and this class should manage the lifetime of the respective model.

The API should be as follows:

class ML
{
  tf: object                                                       // TensorFlow.js library
  modelURL: string                                                 // URL from where to load the model
  initModel(): Promise                                             // Initialize the model
  predict(inputs: Array.<InputObject>, backend: string): Promise   // Predict result
}

Where InputObject has the following structure:

{
  data: Array                                      // A vector, matrix or a tensor to feed into the model
  preprocess: Array.<PreprocessingObject>          // Pre-processing steps to apply before passing data to the model
}

Where PreprocessingObject has the following structure:

{
  funcName: string                                 // Name of the pre-processing function
  args: Object                                     // Arguments to the pre-processing function
}

As can be seen from the API, the class should support a number of pre-processing options that are applied before inputs are passed to the model. It should be trivial to add a new pre-processing option in the future.
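
One way to keep new pre-processing options cheap to add is a registry keyed by funcName. The following is only a sketch of that idea: the names registerPreprocessor and applyPreprocessing, and the two example options "scale" and "padEnd", are assumptions made for illustration and are not part of the proposed API.

```javascript
// Hypothetical registry: maps a funcName to a function (data, args) => data.
const preprocessors = new Map();

function registerPreprocessor(funcName, fn) {
  preprocessors.set(funcName, fn);
}

// Applies each PreprocessingObject in order and returns the transformed data.
// The original data array is not modified.
function applyPreprocessing(data, steps = []) {
  return steps.reduce((current, {funcName, args}) => {
    const fn = preprocessors.get(funcName);
    if (!fn)
      throw new Error(`Unknown pre-processing function: ${funcName}`);
    return fn(current, args);
  }, data);
}

// Two illustrative options (assumed names, not part of the spec).
registerPreprocessor("scale", (data, {factor}) => data.map(x => x * factor));
registerPreprocessor("padEnd", (data, {length, value = 0}) =>
  data.concat(Array(Math.max(0, length - data.length)).fill(value)));

// An InputObject as described above.
const input = {
  data: [1, 2, 3],
  preprocess: [
    {funcName: "scale", args: {factor: 2}},
    {funcName: "padEnd", args: {length: 5}}
  ]
};

console.log(applyPreprocessing(input.data, input.preprocess));
// [2, 4, 6, 0, 0]
```

With this shape, adding an option is a single registerPreprocessor call, and an unknown funcName fails loudly instead of being silently skipped.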

Also note that this class expects its users to provide a tf object from TensorFlow.js, as it does not do any library management itself.
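
To make the intended shape more concrete, here is a minimal sketch of such a class, not a definitive implementation. It relies on the TensorFlow.js calls tf.loadGraphModel, tf.getBackend, tf.setBackend and tf.tensor. The constructor signature, the _model field, the dispose() method and the choice to return a plain array are assumptions. The sketch also assumes a model with a single output tensor and leaves out pre-processing for brevity.

```javascript
class ML {
  // tf is supplied by the caller; this class does no library management.
  constructor(tf, modelURL) {
    this.tf = tf;
    this.modelURL = modelURL;
    this._model = null; // assumed private field holding the loaded model
  }

  // Loads the graph model once and reuses it on later calls.
  async initModel() {
    if (!this._model)
      this._model = await this.tf.loadGraphModel(this.modelURL);
    return this._model;
  }

  // Runs inference on an array of InputObject and resolves with plain numbers.
  async predict(inputs, backend) {
    if (backend && this.tf.getBackend() !== backend) {
      // setBackend resolves to false when the backend cannot be initialized.
      const switched = await this.tf.setBackend(backend);
      if (!switched)
        throw new Error(`Backend not available: ${backend}`);
    }

    const model = await this.initModel();
    const tensors = inputs.map(({data}) => this.tf.tensor(data));
    let output = null;
    try {
      // Assumption: the model returns a single tensor.
      output = model.predict(tensors.length === 1 ? tensors[0] : tensors);
      return Array.from(await output.data());
    } finally {
      // Release tensor memory whether or not inference succeeded.
      for (const tensor of tensors)
        tensor.dispose();
      if (output)
        output.dispose();
    }
  }

  // Ends the model's lifetime (assumed method name).
  dispose() {
    if (this._model) {
      this._model.dispose();
      this._model = null;
    }
  }
}
```

A caller would construct one instance per model, for example new ML(tf, modelURL), then await predict(inputs, "webgl") and call dispose() once the model is no longer needed.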

Model format

The class should work with models in the tfjs_graph_model format, as output by tfjs-converter. The workflow should be: gather data -> train the model -> save the model in one of the formats supported by tfjs-converter -> convert the model using tfjs-converter, passing the parameter --output_format=tfjs_graph_model. For example:

tensorflowjs_converter --input_format=tf_saved_model --output_format=tfjs_graph_model --signature_name=serving_default ./trained_model/saved_model ./adblockpluscore/lib/data/trained_model

The command above should produce at least two files: one .json file for the model description and at least one .bin file for the model weights. All generated files should go into the lib/data/modelName directory within the extension.
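
The .json file references its .bin weight shards through a weightsManifest entry, so the full set of files that must be bundled can be derived from it. The sketch below illustrates this. The helper name listModelFiles, the abbreviated model description and the paths are made up for illustration, and it assumes shard paths are relative to the location of the .json file.

```javascript
// Returns the model description URL followed by every weight shard it
// references. Shard paths are assumed to be relative to the .json file.
function listModelFiles(modelJSON, modelURL) {
  const baseURL = modelURL.slice(0, modelURL.lastIndexOf("/") + 1);
  const shards = (modelJSON.weightsManifest || [])
    .flatMap(group => group.paths)
    .map(path => baseURL + path);
  return [modelURL, ...shards];
}

// Abbreviated, hypothetical model description with two weight shards.
const exampleModelJSON = {
  format: "graph-model",
  weightsManifest: [
    {paths: ["group1-shard1of2.bin", "group1-shard2of2.bin"], weights: []}
  ]
};

console.log(listModelFiles(exampleModelJSON, "data/modelName/model.json"));
```

A check like this could run at build time to confirm that every shard named in the model description was actually copied into the extension.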

Hints for testers

Please see #98 (closed).

Edited Dec 09, 2019 by Manish Jethani
