Working on the Red Team is all about testing defenses by simulating malicious activity. A good deal of actual malicious activity is performed by automated tools, or "bots". These bots might try things like guessing passwords for legitimate accounts, creating new accounts to abuse resources, or just crawling a website and scraping resources in search of secrets.
While working on a recent project, we came up against a fairly common situation: the site we were auditing was protected by Cloudflare's [Under Attack](https://support.cloudflare.com/hc/en-us/articles/200170076-Understanding-Cloudflare-Under-Attack-mode-advanced-DDOS-protection-) mode. This mode is meant to protect against distributed denial-of-service (DDoS) attacks. While we had no intention of conducting this type of attack, the mechanism itself still prevented our tooling from interacting with the website.
This write-up explains the behaviour we observed and the solution we used to quickly deploy automated tooling that bypasses this particular protection measure.
No zero-days or new vulnerabilities are disclosed here, and there is nothing earth-shattering or particularly new. We're sharing this simply because others may find it useful or interesting.
**Technical TL;DR**<br>
- The first GET to an Under Attack-enabled site gets a 503 response containing a complex, obfuscated JavaScript math challenge.
- The browser POSTs the solution back to the site after a random delay of 1-5 seconds.
- The site responds with `Set-Cookie: cf_clearance=xxxx`.
- Subsequent requests that include the `cf_clearance` cookie are not challenged.
- The User-Agent **must** be consistent between cookie acquisition and ongoing requests!
- To bypass, a headless browser can be used to solve the initial challenge and grab the cookie, which can then be used by standard libraries to perform further HTTP requests.
## Why not use an existing library?
A [quick search](https://duckduckgo.com/?q=cloudflare+scraping+library) will reveal that many libraries already exist to bypass the Cloudflare browser check.
We tried a few of these tools, reviewing their source closely. None of them worked: the HTTP interactions we observed didn't match what the tools expected, and fixing them looked like more work than tweaking a few regular expressions.
Maybe our particular target had some unique qualities. Maybe Cloudflare had made some breaking changes recently, or maybe we were just doing something incredibly silly. Whatever the reason, this seemed like a good opportunity to take a step back and get a better understanding of exactly what we were up against.
## Dynamic Analysis
The first thing to understand was what happens when everything works.
You may have noticed that when you visit some sites, there is a slight delay while an image like the one below is displayed.

If you were proxying your browser traffic through something like [ZAP](https://owasp.org/www-project-zap/), you would see that at this point you receive an HTTP reply with headers like this:
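We haven't reproduced the full response here, but the interesting parts looked roughly like this (values trimmed; exact headers vary by deployment):

```
HTTP/1.1 503 Service Temporarily Unavailable
Server: cloudflare
Cache-Control: no-cache
Content-Type: text/html; charset=UTF-8
```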
What your browser does here is solve a math challenge buried inside that obfuscated JavaScript function, sleep for a few seconds, and then `POST` the solution back via the HTML form field named `jschl_answer`.
If the answer is correct, the web server responds with an `HTTP 200` that includes a header like the following:
```
Set-Cookie: cf_clearance=xxxx
```
And the page you originally intended to visit will be rendered. All subsequent requests you make to that domain will include the `cf_clearance` cookie and you will not be bothered with the process again.
## So how do we bypass it?
The existing libraries we've seen use things like complex regular expressions to extract the JavaScript and pass it off to third-party libraries to solve. This is great when it works, but the second Cloudflare changes a single character in their code, everything breaks.
We had an idea to use a headless browser not for the entire scraping experience, but merely to grab the initial token and hand it back to our tooling. The benefit here is that it would adapt automatically to changes at Cloudflare and would minimize the overhead of a headless browser by using it for a maximum of three HTTP requests.
The internal tool we needed this for is written in Go, so we went looking for a compatible library to drive a headless browser, like Chromium. [chromedp](https://github.com/chromedp/chromedp) looked robust, actively maintained, and was actually recommended by other libraries which had been retired.
We wrote a quick function using chromedp that solved the initial challenge and returned the required cookie. From there, we simply included the cookie in all our `net/http` requests.
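As an illustration only, a chromedp-based function along these lines would navigate to the site, wait out the challenge delay, and pull the cookie. This is a sketch under our assumptions, not the linked package's exact implementation, and it requires a local Chromium/Chrome install to run:

```go
// getClearance loads the target URL in headless Chrome, waits for the
// challenge to resolve, and returns the cf_clearance cookie plus the
// browser's User-Agent (which must be reused for later requests).
func getClearance(url string) (cookie, ua string, err error) {
	ctx, cancel := chromedp.NewContext(context.Background())
	defer cancel()
	ctx, cancel = context.WithTimeout(ctx, 30*time.Second)
	defer cancel()

	err = chromedp.Run(ctx,
		chromedp.Navigate(url),
		// Generous pause to cover the 1-5 second challenge delay.
		chromedp.Sleep(10*time.Second),
		chromedp.Evaluate("navigator.userAgent", &ua),
		chromedp.ActionFunc(func(ctx context.Context) error {
			cookies, err := network.GetCookies().Do(ctx)
			if err != nil {
				return err
			}
			for _, c := range cookies {
				if c.Name == "cf_clearance" {
					cookie = c.Value
				}
			}
			return nil
		}),
	)
	return cookie, ua, err
}
```

The fixed sleep is the crudest possible wait; polling for the cookie's appearance would be faster and more reliable.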
That function is available to use in a package located [here](https://gitlab.com/gitlab-com/gl-security/security-operations/gl-redteam/cfclearance). If you find it useful or have ideas, please let us know via an issue there. Thanks!