Inconsistent test.py results; P, R, and mAP metrics all score a perfect 1

Created by: cvanlit-supplai

❔What is happening with my test.py results?

After each epoch, the model is tested using the code in test.py. After the last epoch, some plots are saved, including the P-R curve, the F1 curve, and the confusion matrix. These plots look realistic to me. On the other hand, when I run

python test.py --img 1280 --conf-thres 0.7 --iou-thres 0.7 --task test --batch-size 2 --device 0 \
--weights runs/MyProject/exp/weights/best.pt

(or using last.pt and --task val for that matter) afterwards, the results look strange. Precision and recall are 1 for all classes, no matter what conf-thres and iou-thres I use (even 0.99 and 0.01), and the confusion matrix has 1.00 all along the diagonal. Something is obviously going wrong. Any ideas?
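To illustrate why scores that stay at 1 across thresholds look wrong, here is a minimal sketch of how precision and recall are commonly computed for detections. It is not the code in test.py: the names `box_iou`, `precision_recall`, `min_conf`, and `match_iou`, as well as the toy boxes, are assumptions made up for this example. With realistic predictions, raising the confidence cutoff should change at least one of the two metrics.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(preds, labels, min_conf, match_iou):
    """Hypothetical helper, not part of any library.

    preds:  list of (box, confidence, class_id)
    labels: list of (box, class_id)
    """
    # Drop low-confidence predictions, then match the most confident first.
    kept = sorted((p for p in preds if p[1] >= min_conf), key=lambda p: -p[1])
    matched = set()  # indices of labels already claimed by a prediction
    tp = 0
    for box, _conf, cls in kept:
        best_iou, best_j = 0.0, None
        for j, (label_box, label_cls) in enumerate(labels):
            if j in matched or label_cls != cls:
                continue
            iou = box_iou(box, label_box)
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_j is not None and best_iou >= match_iou:
            matched.add(best_j)
            tp += 1
    fp = len(kept) - tp    # kept predictions with no matching label
    fn = len(labels) - tp  # labels that no prediction matched
    precision = tp / (tp + fp) if kept else 0.0
    recall = tp / (tp + fn) if labels else 0.0
    return precision, recall


# Toy data (made up): two labelled objects, three predictions.
labels = [((0, 0, 10, 10), 0), ((20, 20, 30, 30), 1)]
preds = [
    ((0, 0, 10, 10), 0.95, 0),    # exact hit on the first label
    ((21, 21, 30, 30), 0.60, 1),  # close hit on the second label
    ((50, 50, 60, 60), 0.80, 0),  # false positive, overlaps nothing
]

for min_conf in (0.5, 0.9, 0.99):
    print(min_conf, precision_recall(preds, labels, min_conf, match_iou=0.5))
```

In this toy case the two metrics move as the cutoff changes, which is the behaviour I would expect from a real model; metrics that stay at exactly 1 would mean every kept prediction matches a label and every label is matched, at every threshold.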

I have included images of the after-last-epoch test and the separate run of test.py below.

Additional context

After-training confusion matrix, as I would expect it to look:

When running test.py separately:

Command line output after running test.py: