Multi-threading will only parallelise the execution; it will not help us in any real sense here. I have one major suggestion on the approach you are following. Invoking the Firefox browser through Watir, and loading a heavy library into a lightweight script, is unnecessary. Browser methods can be slow and hamper the script. We don't need to interact with the browser to download audio files; we can directly parse the HTML content using GET requests to these URLs. We just have to keep the page number as the iterator and loop through all of the pages, e.g.
http://bbcsfx.acropolis.org.uk/?page=5
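A minimal sketch of that page loop, assuming (hypothetically) that each listing page links to its clips with plain `<a href="....wav">` anchors; the regex would need adjusting to the site's actual markup:

```ruby
require 'net/http'
require 'uri'

# Extract audio links from one page of HTML.
# The href pattern here is an assumption about the markup.
def sound_links(html)
  html.scan(/href="([^"]+\.wav)"/).flatten.uniq
end

# Fetch a listing page by number and return its audio links.
# (Defined but not called here; invoke it per page when crawling.)
def links_on_page(page)
  uri = URI("http://bbcsfx.acropolis.org.uk/?page=#{page}")
  sound_links(Net::HTTP.get(uri))
end
```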
Also I noticed that all the files are fairly big, mostly >20 MB, so streaming the response body to disk should be faster and lighter on memory:

```ruby
require 'net/http'
require 'uri'

def download_file(file_path, url)
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port) do |http|
    request = Net::HTTP::Get.new(uri)
    http.request(request) do |response|
      # 'wb' keeps the binary audio intact; streaming chunk by chunk
      # avoids loading a >20 MB file into memory at once
      File.open(file_path, 'wb') do |io|
        response.read_body { |chunk| io.write(chunk) }
      end
    end
  end
end

download_file('/tmp/test.zip', "urlPath")
```
https://gitlab.com/snippets/1944056
Some other suggestions -
- Don't declare the number of pages as a constant, rather get it dynamically from the main page.
- `File.exist?` has been used twice, where we only need to check it once.
- This approach also looks good, though I need to test it: download in batches and then use multi-threading to write to files in parallel. https://stackoverflow.com/questions/1120350/how-to-download-via-http-only-piece-of-big-file-with-ruby
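For the batched-download idea, here is a rough sketch of how it could look with HTTP `Range` requests: split the file into byte ranges, then each range could be fetched by a separate thread. This assumes the server honours `Range` headers; it is untested against the BBC site.

```ruby
require 'net/http'
require 'uri'

# Split a file of `total` bytes into Range header values of at most
# `chunk` bytes each, so chunks can be fetched by separate threads.
def byte_ranges(total, chunk)
  (0...total).step(chunk).map { |s| "bytes=#{s}-#{[s + chunk, total].min - 1}" }
end

# Fetch one chunk with an HTTP Range request and return its bytes.
# (Defined but not called here.)
def fetch_range(url, range)
  uri = URI(url)
  req = Net::HTTP::Get.new(uri)
  req['Range'] = range
  Net::HTTP.start(uri.host, uri.port) { |http| http.request(req).body }
end
```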
Edited by Ayush Kalani