Cloudflare-IUAM-Solver
A simple pure-Java library and CLI tool for getting past Cloudflare's anti-bot mechanism (a.k.a. "I'm Under Attack Mode", or IUAM), implemented with HtmlUnit.
Prerequisites
- JDK 11
CLI Tool
Install
$ curl -LO https://github.com/ninja-beans/cloudflare-iuam-solver/releases/download/0.1.0/cfis
$ chmod +x cfis
Usage
Print a cookie string.
$ ./cfis -c https://www.example.com
cf_clearance=XXXXXXXXXXXXXXXXXXXX-XXXXXXXXXX-X-XXX;__cfduid=XXXXXXXXXXXXXXXXXXXX;
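If the cookie string needs to be consumed from Java rather than passed straight to curl, it can be split into name/value pairs. The following is a minimal sketch, assuming the output always has the `name=value;` shape shown above; `CookieStringParser` is a hypothetical helper and not part of this library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of cloudflare-iuam-solver.
public class CookieStringParser {
    /** Splits a string like "a=1;b=2;" into an ordered name -> value map. */
    public static Map<String, String> parse(String cookieString) {
        Map<String, String> cookies = new LinkedHashMap<>();
        for (String pair : cookieString.split(";")) {
            String trimmed = pair.trim();
            int eq = trimmed.indexOf('=');
            if (eq <= 0) {
                continue; // skip empty segments and entries without a name
            }
            cookies.put(trimmed.substring(0, eq), trimmed.substring(eq + 1));
        }
        return cookies;
    }

    public static void main(String[] args) {
        // Placeholder values; pass the real output of `./cfis -c <url>` instead.
        Map<String, String> cookies = parse("cf_clearance=abc-123;__cfduid=def456;");
        cookies.forEach((name, value) -> System.out.println(name + " -> " + value));
    }
}
```

A `LinkedHashMap` keeps the cookies in the order the tool printed them, which makes it easy to rebuild the original string if needed.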
Download HTML content with curl.
$ ./cfis -c https://www.example.com/ > cookie.txt
$ ./cfis -u > ua.txt
$ curl -s --cookie "$(cat cookie.txt)" -A "$(cat ua.txt)" https://www.example.com/
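The same request can be made from Java using the two saved files. This is a rough sketch with the Java 11 `HttpClient`, assuming `cookie.txt` and `ua.txt` each hold a single line as produced above; `SavedCookieFetch` and `buildRequest` are illustrative names, not part of this project.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative class, not part of cloudflare-iuam-solver.
public class SavedCookieFetch {
    /** Builds a GET request carrying the saved cookie string and User-Agent. */
    public static HttpRequest buildRequest(String url, String cookie, String userAgent) {
        return HttpRequest.newBuilder(URI.create(url))
            .header("Cookie", cookie.trim())         // trim the trailing newline from the file
            .header("User-Agent", userAgent.trim())  // send the same User-Agent that was saved
            .GET()
            .build();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // cookie.txt and ua.txt are the files written by the cfis commands above
        String cookie = Files.readString(Path.of("cookie.txt"));
        String userAgent = Files.readString(Path.of("ua.txt"));
        HttpRequest request = buildRequest("https://www.example.com/", cookie, userAgent);
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```

Trimming matters here because `HttpRequest.Builder.header` rejects values that contain a newline.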
Extract all images with curl and xmllint.
$ eval $(./cfis --curl https://www.example.com/) | xmllint --xpath "//img" --html - 2> /dev/null
Java Library
Install
<dependency>
<groupId>com.ninja-beans.crawler</groupId>
<artifactId>cloudflare-iuam-solver-parent</artifactId>
<version>0.1.0</version>
<type>pom</type>
</dependency>
Usage
Scraping with Java 11 HttpClient and Jsoup.
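import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpClient.Redirect;
import java.net.http.HttpClient.Version;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse.BodyHandlers;
import java.nio.charset.StandardCharsets;
import org.jsoup.Jsoup;
// Also import IuamSolver from this library (its package is not shown in this README).

public class App {
    public static void main(final String[] args) throws IOException, InterruptedException {
        var url = args[0];
        // Solve the IUAM challenge; the result carries the cookies and the User-Agent used.
        var result = IuamSolver.solve(url);

        // 1. Create an HttpClient that reuses the solver's cookies
        var client = HttpClient
            .newBuilder()
            .version(Version.HTTP_1_1)
            .followRedirects(Redirect.NORMAL)
            .cookieHandler(result.getCookieManager())
            .build();

        // 2. Send the request with the same User-Agent the solver used
        var request = HttpRequest.newBuilder()
            .header("Accept", "*/*")
            .header("User-Agent", result.getResponse().getUserAgent())
            .GET()
            .uri(URI.create(url))
            .build();
        var response = client.send(request, BodyHandlers.ofString(StandardCharsets.UTF_8));

        // 3. Parse the response
        var doc = Jsoup.parse(response.body(), url);
        System.out.println(doc.title());
        // getElementById returns null when the page has no element with id "title"
        var elm = doc.getElementById("title");
        if (elm != null) {
            System.out.println(elm.html());
        }
    }
}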