Allow parsing lists that are hosted behind Cloudflare
Environment
Ubuntu 18.04; Python 2.7.17
Background
To tackle ui#701, we decided to mirror one of the external filter lists using a combination subscription (see internal#FT1588). However, generating this combination list throws an error.
How to reproduce
- Download the following combination subscription list: https://hg.adblockplus.org/easylistcombinations/raw-file/45ba386628a1/i_dont_care_about_cookies.txt
- Run
flrender i_dont_care_about_cookies.txt rendered_list.txt
Observed behavior
We get an HTTP Error 403: Forbidden.
Expected behavior
No errors should be thrown, and the file rendered_list.txt should be created properly.
Further information
It looks like Cloudflare blocks any request sent with the header 'User-Agent': 'Python-urllib...' (the default in our case), which is why we receive a 403 error when calling urllib2.urlopen(url) (see call). If we change that header, e.g. urlopen(Request(url, headers={'User-Agent': 'Python'})), no errors are thrown.
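The workaround described above can be sketched as a small helper that overrides urllib's default User-Agent on every request. This is a minimal sketch, not the actual flrender code path: the helper names build_request and fetch_list and the constant CUSTOM_USER_AGENT are hypothetical, and the value 'Python' is taken only from the observation above that Cloudflare accepted it. The imports fall back to urllib2 so the sketch runs on Python 2.7 as well as Python 3.

```python
try:
    # Python 3
    from urllib.request import Request, urlopen
except ImportError:
    # Python 2.7, as used in the environment above
    from urllib2 import Request, urlopen

# Hypothetical value: the issue reports that 'Python' is accepted,
# whereas urllib's default 'Python-urllib/x.y' receives a 403.
CUSTOM_USER_AGENT = 'Python'


def build_request(url, user_agent=CUSTOM_USER_AGENT):
    """Return a Request whose User-Agent replaces urllib's default."""
    return Request(url, headers={'User-Agent': user_agent})


def fetch_list(url, timeout=30):
    """Download a filter list, sending the custom User-Agent header.

    Hypothetical helper for illustration; it performs a real network call.
    """
    response = urlopen(build_request(url), timeout=timeout)
    try:
        # read() returns bytes; filter lists are assumed to be UTF-8 text
        return response.read().decode('utf-8')
    finally:
        response.close()
```

Overriding the header per request keeps the change local to list downloads. An alternative with the same effect is to configure an opener once (opener = build_opener(); opener.addheaders = [('User-Agent', 'Python')]) and use opener.open(url) everywhere lists are fetched. Either way, Cloudflare's filtering rules are outside our control, so any specific User-Agent string may stop working later.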
Since GitLab uses Cloudflare, we will get the same error if a list is hosted there and we try to include it in a combination list. So far we haven't hit that case, because GitHub is the only host the list authors have chosen. Similar "GitLab (Cloudflare) + urllib" issues: here and here.