Geo file backfill retries the same failing files over and over
Summary
As seen on the Geo testbed: the number of files synced is currently stalled. This is because the queue is clogged with files that it is just retrying over and over.
Steps to reproduce
$ grep FileDownloadService geo.log | grep 673009
{"severity":"INFO","time":"2017-10-09T13:40:04.765Z","class":"Geo::FileDownloadService","object_type":"file","object_db_id":673009,"message":"File download","success":false,"bytes_downloaded":-1,"download_time_s":0.245}
{"severity":"INFO","time":"2017-10-09T13:43:08.230Z","class":"Geo::FileDownloadService","object_type":"file","object_db_id":673009,"message":"File download","success":false,"bytes_downloaded":-1,"download_time_s":0.223}
{"severity":"INFO","time":"2017-10-09T13:46:25.632Z","class":"Geo::FileDownloadService","object_type":"file","object_db_id":673009,"message":"File download","success":false,"bytes_downloaded":-1,"download_time_s":0.218}
{"severity":"INFO","time":"2017-10-09T13:49:38.675Z","class":"Geo::FileDownloadService","object_type":"file","object_db_id":673009,"message":"File download","success":false,"bytes_downloaded":-1,"download_time_s":0.217}
{"severity":"INFO","time":"2017-10-09T13:52:48.653Z","class":"Geo::FileDownloadService","object_type":"file","object_db_id":673009,"message":"File download","success":false,"bytes_downloaded":-1,"download_time_s":0.239}
{"severity":"INFO","time":"2017-10-09T13:56:03.982Z","class":"Geo::FileDownloadService","object_type":"file","object_db_id":673009,"message":"File download","success":false,"bytes_downloaded":-1,"download_time_s":0.76}
{"severity":"INFO","time":"2017-10-09T13:59:18.327Z","class":"Geo::FileDownloadService","object_type":"file","object_db_id":673009,"message":"File download","success":false,"bytes_downloaded":-1,"download_time_s":0.235}
{"severity":"INFO","time":"2017-10-09T14:02:18.669Z","class":"Geo::FileDownloadService","object_type":"file","object_db_id":673009,"message":"File download","success":false,"bytes_downloaded":-1,"download_time_s":0.108}
{"severity":"INFO","time":"2017-10-09T14:05:22.814Z","class":"Geo::FileDownloadService","object_type":"file","object_db_id":673009,"message":"File download","success":false,"bytes_downloaded":-1,"download_time_s":0.205}
Geo::FileRegistry.where(file_id:673009).first
=> nil
What is the current bug behavior?
The same files are being retried over and over, and consistently failing.
If the number of consistently failing files is larger than the database batch size, then the backfill process stalls completely.
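To illustrate the stall, here is a minimal in-memory sketch. It is not the actual Geo backfill code: `next_batch`, `run_backfill` and `BATCH_SIZE` are hypothetical names, and it assumes each pass picks the lowest unsynced IDs in a fixed order and that a failed download leaves no record behind. In this model the stall sets in as soon as the failing files fill a whole batch.

```ruby
# Hypothetical in-memory model of the backfill loop; not the actual Geo code.
BATCH_SIZE = 3

# Pick the next batch: the lowest unsynced IDs, as an ordered LIMIT query would.
def next_batch(all_ids, synced_ids, batch_size = BATCH_SIZE)
  (all_ids - synced_ids).sort.first(batch_size)
end

# Run several backfill passes. IDs in `broken_ids` always fail to download,
# and a failure leaves no record behind, so they stay eligible forever.
def run_backfill(all_ids, broken_ids, passes)
  synced = []
  passes.times do
    next_batch(all_ids, synced).each do |id|
      synced << id unless broken_ids.include?(id)
    end
  end
  synced
end

all_ids = (1..10).to_a

# Fewer broken files than the batch size: one slot per pass still makes progress.
p run_backfill(all_ids, [1, 2], 10)
# Broken files fill the whole batch: no other file is ever attempted.
p run_backfill(all_ids, [1, 2, 3], 10)
```

With two broken files the remaining eight eventually sync, one per pass. With three broken files every pass selects the same three IDs and nothing syncs.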
What is the expected correct behavior?
When a download fails, it should not be retried until some time later.
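One way to express "some time later" is a capped exponential backoff. The issue does not specify this, so the sketch below is only an assumption: the helper names `retry_delay_s` and `retry_due?`, and the delay values, are hypothetical.

```ruby
# Hypothetical helpers; the names and the delay values are assumptions.
BASE_DELAY_S = 60        # first retry after one minute
MAX_DELAY_S  = 60 * 60   # never wait more than one hour

# Delay before the next attempt, doubling with each recorded failure.
def retry_delay_s(failure_count)
  return 0 if failure_count <= 0

  [BASE_DELAY_S * (2**(failure_count - 1)), MAX_DELAY_S].min
end

# A file becomes eligible again only once its delay has elapsed.
def retry_due?(last_failed_at, failure_count, now = Time.now)
  now >= last_failed_at + retry_delay_s(failure_count)
end
```

A file that keeps failing then drops out of most batches instead of being picked up on every pass, roughly every three minutes as in the log above.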
Possible fixes
I think the issue is that no Geo::FileRegistry is being created in the failure case. I believe creating a registry record on failure is how we stop backfill from doing the same with projects. I'll see if I can confirm that.
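The idea can be sketched with a small in-memory stand-in for the registry. This is not Geo::FileRegistry itself: `InMemoryRegistry`, `RegistryEntry` and the `success` field are hypothetical, and the real model's columns are not shown in this issue. The point is only that a failed attempt also leaves an entry, so backfill no longer sees the file as untouched.

```ruby
# Hypothetical stand-in for Geo::FileRegistry; the real model is an
# ActiveRecord class whose columns are not shown in this issue.
RegistryEntry = Struct.new(:file_id, :success)

class InMemoryRegistry
  def initialize
    @entries = {}
  end

  # Record the outcome of a download attempt, whether it succeeded or failed.
  def record(file_id, success)
    @entries[file_id] = RegistryEntry.new(file_id, success)
  end

  # Returns the entry for a file, or nil if no attempt was recorded.
  def find(file_id)
    @entries[file_id]
  end

  # IDs with no entry at all: the only ones backfill should pick up.
  def unregistered(all_ids)
    all_ids.reject { |id| @entries.key?(id) }
  end
end

registry = InMemoryRegistry.new
registry.record(673009, false) # a failed download still leaves an entry
p registry.find(673009)        # no longer nil
p registry.unregistered([673008, 673009, 673010])
```

Failed entries would still need a separate retry path, otherwise they would never be attempted again.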
Edited by Nick Thomas