
Model quantization and other optimizations

I'm trying to implement an end-to-end training/testing setup for my native language. First, I'll start from a pretrained model and fine-tune it with my custom dataset. I have most of the required pieces ready, but I couldn't find where exactly model quantization is implemented. In the model download links, I can see that there are quantized .tflite files.

Could you please point me to the relevant script for this? Also, have you experimented with pruning these models to further decrease model size?
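For reference, here's roughly what I was expecting the quantization step to look like. This is just my guess, using TensorFlow's post-training dynamic-range quantization through the TFLite converter; the paths and settings are placeholders, not anything I found in the repo:

```python
import tensorflow as tf

# Placeholder path to a fine-tuned SavedModel export.
saved_model_dir = "exported_model/"

# Post-training dynamic-range quantization via the TFLite converter.
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

# Write out the quantized .tflite file.
with open("model_quantized.tflite", "wb") as f:
    f.write(tflite_model)
```

If the released .tflite files are produced differently (e.g. full integer quantization with a representative dataset), I'd appreciate a pointer to the exact script and options used.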