Why does training over-fit so easily?
When I trained a model on the PDBbind 2016 data set, I found that it over-fit very easily: the best MSE on the training set was about 0.87, which is quite high. I tried adjusting some of the model's hyperparameters, such as setting batch_size=16 and dense_size=[1000, 500], but almost every time the model was over-fitted by around epoch 20.
The total amount of training data is 12088, yet the model over-fits after only 20 epochs. Since my batch_size=16, can I understand this as the model having learned from only 16*20=320 samples before over-fitting? Can such a model still give good prediction results?
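As a sanity check on the batch arithmetic (a minimal sketch in plain Python, using the numbers above): batch_size only sets how many samples go into one gradient update, while each epoch still iterates over the whole training set, so batch_size * epochs does not count the samples the model has seen.

```python
import math

# Numbers taken from the question above
n_samples = 12088   # total training examples
batch_size = 16
n_epochs = 20

# Each epoch covers the whole training set in mini-batches,
# so the number of gradient updates per epoch is:
batches_per_epoch = math.ceil(n_samples / batch_size)
print(batches_per_epoch)      # 756 updates per epoch

# Over 20 epochs the model sees every sample 20 times:
print(n_epochs * n_samples)   # 241760 sample presentations

# batch_size * epochs counts only one batch per epoch:
print(batch_size * n_epochs)  # 320
```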
This is the result of one of my training sets:
Could you please give me some explanation? Thank you very much.