Problem replicating YoloV3 result (COCO dataset) with training from scratch

I am trying to train YoloV3 on the COCO dataset from scratch. Here is what I did:

  1. Take cfg/yolo.py files and make necessary changes for YoloV3
  • Change loss function from RegionLoss to MultiScaleRegionLoss
  • Change GetBoundingBoxes to GetMultiScaleBoundingBoxes
  • Build the COCO dataset based on the provided labels.py
  2. Apply a test function against the COCO validation data during training
  • Basically, I extended train.py by applying a test function modified from the provided test.py
  • Then, I passed a second dataloader (for the COCO validation dataset) to the TrainEngine and did something like:
    @ln.engine.Engine.batch_end(5000)
    def test_against_valid_data(self):
        test_fn(self)
  • However, since I am not really accustomed to the hyperparameters for Yolo training (I am more comfortable with training object recognition networks), I just used the ones set in train.py

However, after a long training run, the test against the COCO validation dataset got stuck at NaN. Since I suspected a bug in my code, I trained again from the start, but with the existing weights (yolov3-coco.pt) loaded into the model. This gives a non-NaN test result (which should mean there is no bug?).
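To narrow down where the from-scratch run goes wrong, it may help to find the first batch at which the loss stops being finite. Below is a minimal sketch, assuming the per-batch loss values are collected as plain Python floats; `first_non_finite` and the sample numbers are hypothetical and not part of lightnet or brambox.

```python
import math


def first_non_finite(losses):
    """Return the index of the first NaN or infinite loss, or None if all are finite.

    Hypothetical helper for illustration; `losses` is assumed to be an
    iterable of per-batch loss values already converted to Python floats.
    """
    for index, value in enumerate(losses):
        if not math.isfinite(value):
            return index
    return None


# Made-up loss history, only to show how the helper behaves.
history = [5.2, 4.8, float("nan"), float("nan")]
print(first_non_finite(history))
```

If the loss itself stays finite for the whole run, the NaN is more likely produced in the evaluation step than in the optimisation.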

So, what should I do to reproduce training YoloV3 on COCO from scratch?

I have also included a screenshot of the training graph here (image: result).

  • The blue one is the result of training from scratch (I replaced NaN in test_mAP with 0)
  • The red one is the result of training from yolov3-coco.pt

Once again, thank you for the awesome library (including brambox)!