Multi-threading will only parallelise the execution; it will not help us in any real sense here. I have one major suggestion on the approach you are following. Invoking the Firefox browser through Watir, and loading a heavy library into a lightweight script, is unnecessary. Browser methods can be slow and hamper the script. We don't need to interact with the browser to download audio files; we can directly parse the HTML content using GET requests to these URLs. We just have to keep the page number as the iterator and loop through all of the pages, e.g.
http://bbcsfx.acropolis.org.uk/?page=5
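A minimal sketch of that page loop, assuming (hypothetically) that each listing page links to its clips with plain `<a href="....wav">` anchors; the regex would need adjusting to the site's actual markup:

```ruby
require 'net/http'
require 'uri'

# Extract audio links from one page of HTML.
# The href pattern here is an assumption about the markup.
def sound_links(html)
  html.scan(/href="([^"]+\.wav)"/).flatten.uniq
end

# Fetch a listing page by number and return its audio links.
# (Defined but not called here; invoke it per page when crawling.)
def links_on_page(page)
  uri = URI("http://bbcsfx.acropolis.org.uk/?page=#{page}")
  sound_links(Net::HTTP.get(uri))
end
```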
Also I noticed that all the files are fairly big, mostly >20 MB, so streaming the response body to disk should be faster and lighter on memory:

```ruby
require 'net/http'
require 'uri'

def download_file(file_path, url)
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port) do |http|
    request = Net::HTTP::Get.new(uri)
    http.request(request) do |response|
      # 'wb' keeps the binary audio intact; streaming chunk by chunk
      # avoids loading a >20 MB file into memory at once
      File.open(file_path, 'wb') do |io|
        response.read_body { |chunk| io.write(chunk) }
      end
    end
  end
end

download_file('/tmp/test.zip', "urlPath")
```
https://gitlab.com/snippets/1944056
Some other suggestions -
- Don't declare the number of pages as a constant, rather get it dynamically from the main page.
- `File.exist?` has been used twice, where we only need to check it once.
- This approach also looks good, though I need to test it: download in batches and then use multi-threading to write to files in parallel. https://stackoverflow.com/questions/1120350/how-to-download-via-http-only-piece-of-big-file-with-ruby
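For the batched-download idea, here is a rough sketch of how it could look with HTTP `Range` requests: split the file into byte ranges, then each range could be fetched by a separate thread. This assumes the server honours `Range` headers; it is untested against the BBC site.

```ruby
require 'net/http'
require 'uri'

# Split a file of `total` bytes into Range header values of at most
# `chunk` bytes each, so chunks can be fetched by separate threads.
def byte_ranges(total, chunk)
  (0...total).step(chunk).map { |s| "bytes=#{s}-#{[s + chunk, total].min - 1}" }
end

# Fetch one chunk with an HTTP Range request and return its bytes.
# (Defined but not called here.)
def fetch_range(url, range)
  uri = URI(url)
  req = Net::HTTP::Get.new(uri)
  req['Range'] = range
  Net::HTTP.start(uri.host, uri.port) { |http| http.request(req).body }
end
```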
Edited by Ayush Kalani