Add two optimizers (or two parameter groups) when a pretrained network is used.

For the pretrained part, use a smaller learning rate. See:

https://arxiv.org/abs/2202.07012

First train only the head; when the loss saturates, unfreeze and train the whole network.
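The two-phase schedule above can be sketched in plain Python (hypothetical toy parameters, analytic gradients on a quadratic loss instead of a real network, so the example is self-contained). In a real framework you would pass two parameter groups with different learning rates to the optimizer; here the per-group learning rates and the head-first/then-all phases are written out by hand:

```python
def grad(w, target):
    # gradient of the toy loss (w - target)^2
    return 2.0 * (w - target)

# toy "pretrained" and "head" parameters and their optima (assumed values)
w_pre, w_head = 0.5, 0.0
target_pre, target_head = 0.6, 1.0

LR_PRE, LR_HEAD = 1e-3, 1e-1   # smaller learning rate for the pretrained part

# phase 1: the pretrained part stays frozen, only the head is trained
for _ in range(200):
    w_head -= LR_HEAD * grad(w_head, target_head)

head_after_phase1 = w_head  # head has converged; loss has saturated

# phase 2: unfreeze everything, each group keeps its own learning rate,
# so the pretrained weights only drift slowly away from their init
for _ in range(200):
    w_pre  -= LR_PRE  * grad(w_pre, target_pre)
    w_head -= LR_HEAD * grad(w_head, target_head)
```

In PyTorch the same idea is expressed by giving `torch.optim.SGD` (or Adam) a list of parameter dicts, e.g. one group for the backbone with a small `lr` and one for the head with a larger `lr`.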

Edited by fra-wa