When fine-tuning a pretrained network, use two optimizers (or two parameter groups): keep a smaller learning rate for the pretrained part and a larger one for the new head. See:
https://arxiv.org/abs/2202.07012
Schedule: first train only the head; once the loss saturates, unfreeze and train the whole network.
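A minimal PyTorch sketch of the idea above, assuming a generic backbone/head split (the tiny `nn.Linear` modules are placeholders, not a real pretrained model). It uses one optimizer with two parameter groups, which is equivalent to two separate optimizers:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Placeholder model: "pretrained" backbone plus a freshly initialized head.
backbone = nn.Linear(8, 4)
head = nn.Linear(4, 2)
model = nn.Sequential(backbone, head)

# Two parameter groups: small lr for pretrained weights, larger lr for the head.
optimizer = torch.optim.SGD([
    {"params": backbone.parameters(), "lr": 1e-4},
    {"params": head.parameters(), "lr": 1e-2},
])

# Phase 1: freeze the backbone and train only the head.
for p in backbone.parameters():
    p.requires_grad = False

w_backbone = backbone.weight.detach().clone()  # snapshots for comparison
w_head = head.weight.detach().clone()

x = torch.randn(16, 8)
y = torch.randn(16, 2)
loss = nn.functional.mse_loss(model(x), y)
loss.backward()
optimizer.step()          # frozen params have no grad, so SGD skips them
optimizer.zero_grad()

# Phase 2 (once the loss saturates): unfreeze everything and keep training.
for p in backbone.parameters():
    p.requires_grad = True
```

During phase 1 only the head's weights move; after unfreezing, the backbone is updated too, but with the smaller learning rate from its parameter group.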