Add support for machine learning based on TensorFlow.js
Background
For anti-circumvention, we are looking into a machine-learning-based approach to address some circumvention cases. We have a PoC in which we are able to use an ML model inside the extension to distinguish ads from non-ads from a snippet. To actually deploy something like this, we need a supported way to run ML predictions within the extension. With that in place, we can write snippets that use various machine learning models to make predictions about different elements on the page.
Underlying ML framework
Currently it looks like TensorFlow.js is the best choice of ML framework: it supports the most features and would be the easiest to work with. The alternatives are onnxjs, which uses the same architecture (WebGL with fallback to WebAssembly) but offers fewer features, and webdnn.js, which uses the same architecture and additionally supports WebGPU (currently only available in Safari), but also provides fewer features. With this in mind, we want to implement the new class on top of TensorFlow.js, designed so that the framework can be replaced if needed.
What to change
Implement a new class called ML, which will be able to run predictions on machine learning models bundled within the extension. The class should expose functionality to:
- load/dispose an ML model and manage its lifetime
- accept inputs to the ML model and respond with inference results
- expose ways to pre-process inputs before feeding them into models
- interact with the ML framework of choice
- allow selection of the underlying backend to use
Each instance of the class should be responsible for running inference on only one ML model, and this class should manage the lifetime of the respective model.
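One way to support backend selection is sketched below: the helper tries a list of backends in order and keeps the first one that initializes. The function name selectBackend and the default order are assumptions made for illustration. tf.setBackend() and tf.ready() are existing TensorFlow.js calls, and the wasm backend is only usable if its package has been registered with the supplied tf object.

```javascript
// Hypothetical helper (not part of the proposed API): tries each backend in
// order and returns the name of the first one that initializes successfully.
async function selectBackend(tf, preferred = ["webgl", "wasm", "cpu"]) {
  for (const name of preferred) {
    try {
      // tf.setBackend() resolves to true when the backend was initialized.
      if (await tf.setBackend(name)) {
        await tf.ready();
        return name;
      }
    } catch (error) {
      // Backend is not registered or failed to initialize; try the next one.
    }
  }
  throw new Error("No usable TensorFlow.js backend found");
}
```

Because the tf object is passed in, the same helper would keep working if the caller bundles a different set of backends.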
The API should be as follows:
class ML
{
  tf: object // TensorFlow.js library
  modelURL: string // URL from where to load the model
  initModel(): Promise // Initialize the model
  predict(inputs: Array.<InputObject>, backend: string): Promise // Predict result
}
Where InputObject has the following structure:
{
  data: Array // A vector, matrix or tensor to feed into the model
  preprocess: Array.<PreprocessingObject> // Pre-processing steps to apply before passing the data to the model
}
Where PreprocessingObject has the following structure:
{
  funcName: string // Name of the pre-processing function
  args: Object // Arguments to the pre-processing function
}
As can be seen from the API, the class should support a number of pre-processing options that are applied before inputs are passed to the model. It should be trivial to add a new pre-processing option in the future.
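A minimal sketch of how such pre-processing could be kept extensible is a registry keyed by funcName. The names preprocessors and applyPreprocessing, and the normalize, pad and cast options, are hypothetical; the issue does not specify which options should exist. For simplicity the sketch works on flat numeric arrays only.

```javascript
// Hypothetical registry mapping funcName to an implementation. Each function
// receives the current data and the args object, and returns the new data.
const preprocessors = new Map([
  // Scale every value of a flat numeric array from [min, max] to [0, 1].
  ["normalize", (data, {min, max}) => data.map(x => (x - min) / (max - min))],
  // Pad (with `value`) or truncate a flat array to exactly `length` items.
  ["pad", (data, {length, value = 0}) =>
    Array.from({length}, (_, i) => (i < data.length ? data[i] : value))]
]);

// Applies the steps listed in an InputObject's `preprocess` array, in order.
function applyPreprocessing({data, preprocess = []}) {
  return preprocess.reduce((current, {funcName, args = {}}) => {
    const func = preprocessors.get(funcName);
    if (!func)
      throw new Error(`Unknown pre-processing function: ${funcName}`);
    return func(current, args);
  }, data);
}

// Adding a new option is a single registration.
preprocessors.set("cast", (data, {type}) =>
  (type === "int32" ? data.map(x => Math.trunc(x)) : data));
```

With this layout, an InputObject such as {data: [0, 5, 10], preprocess: [{funcName: "normalize", args: {min: 0, max: 10}}]} is reduced step by step before it would be turned into a tensor.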
Also note that this class expects its users to provide a tf object from TensorFlow.js, as it does not do any library management itself.
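To illustrate how the pieces could fit together, here is a rough sketch of the class with an injected tf object. It is not a definitive implementation: the private fields, the dispose() method and the single normalize pre-processing option are assumptions, and it assumes a model with one output tensor that can be run with the synchronous GraphModel predict() call. tf.loadGraphModel(), tf.setBackend() and tf.tensor() are existing TensorFlow.js functions.

```javascript
// Sketch only: a minimal ML class following the API described above.
class ML {
  constructor(tf, modelURL) {
    this.tf = tf;             // TensorFlow.js library, supplied by the caller
    this.modelURL = modelURL; // URL of the model's .json description
    this._model = null;       // Loaded lazily by initModel()
    // Hypothetical pre-processing registry with one illustrative option.
    this._preprocessors = new Map([
      ["normalize", (data, {min, max}) => data.map(x => (x - min) / (max - min))]
    ]);
  }

  // Loads the graph model once; later calls reuse the loaded instance.
  async initModel() {
    if (!this._model)
      this._model = await this.tf.loadGraphModel(this.modelURL);
  }

  // Runs inference on the given InputObjects using the requested backend.
  async predict(inputs, backend) {
    if (backend && !(await this.tf.setBackend(backend)))
      throw new Error(`Backend not available: ${backend}`);
    await this.initModel();

    const tensors = [];
    let output = null;
    try {
      for (const {data, preprocess = []} of inputs) {
        // Apply each named pre-processing step in order.
        const processed = preprocess.reduce((current, {funcName, args = {}}) => {
          const func = this._preprocessors.get(funcName);
          if (!func)
            throw new Error(`Unknown pre-processing function: ${funcName}`);
          return func(current, args);
        }, data);
        tensors.push(this.tf.tensor(processed));
      }
      // Assumption: the model yields a single output tensor.
      output = this._model.predict(tensors.length === 1 ? tensors[0] : tensors);
      return await output.data();
    } finally {
      // Release input and output tensors whether or not inference succeeded.
      tensors.forEach(tensor => tensor.dispose());
      if (output)
        output.dispose();
    }
  }

  // Hypothetical method covering the "dispose" requirement.
  dispose() {
    if (this._model) {
      this._model.dispose();
      this._model = null;
    }
  }
}
```

A snippet would then create one instance per model, for example new ML(tf, modelURL), and call predict() with an array of InputObjects; keeping tf as a constructor argument is what would allow the framework to be swapped later.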
Model format
The class should work with models in tfjs_graph_model format, as output by tfjs-converter. The workflow should be: gather data -> train the model -> save the model in one of the formats supported by tfjs-converter -> convert the model using tfjs-converter, passing the parameter --output_format=tfjs_graph_model.
For example:
tensorflowjs_converter --input_format=tf_saved_model --output_format=tfjs_graph_model --signature_name=serving_default ./trained_model/saved_model ./adblockpluscore/lib/data/trained_model
The command above should produce at least two files: one .json file for the model description and at least one .bin file for the model weights. All generated files should go into the lib/data/modelName directory within the extension.
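A small check along these lines could be used when bundling a model, to confirm that a model directory contains the expected files. The function name checkModelFiles and the example file names are assumptions made for illustration; the only rule taken from the description above is "one .json file and at least one .bin file".

```javascript
// Hypothetical helper: given the file names found in a model directory,
// reports whether the converter output looks complete.
function checkModelFiles(fileNames) {
  const descriptions = fileNames.filter(name => name.endsWith(".json"));
  const weights = fileNames.filter(name => name.endsWith(".bin"));
  return {
    descriptions,
    weights,
    // Exactly one model description and at least one weights file.
    complete: descriptions.length === 1 && weights.length >= 1
  };
}
```

For a listing such as ["model.json", "group1-shard1of1.bin"] (assumed names), the complete flag would be true; a directory with only the .json file would be flagged as incomplete.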
Hints for testers
Please see #98 (closed).