Our crawler


We crawl the web using the user agent Barkrowler/0.9.

We respect the robots.txt instructions.
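
As an illustration, here is a minimal Python sketch of how a robots.txt rule aimed at this user agent might be checked before deployment. It uses the standard library parser, not the crawler's own code. The `/private/` path, the `example.com` host, and the `barkrowler_allowed` helper are hypothetical placeholders.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: the /private/ path is only a placeholder.
ROBOTS_TXT = """\
User-agent: Barkrowler
Disallow: /private/

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def barkrowler_allowed(url: str) -> bool:
    """Return True if the rules above let the Barkrowler user agent fetch `url`."""
    # The standard-library parser matches on the product token before the "/".
    return parser.can_fetch("Barkrowler/0.9", url)


if __name__ == "__main__":
    for url in ("https://example.com/public/index.html",
                "https://example.com/private/page.html"):
        print(url, barkrowler_allowed(url))
```

Testing rules locally this way can catch a mistyped user agent name or path before the file is published.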

We crawl the web in order to measure it by computing some helpful metrics (popularity, trust, categorization).

Barkrowler has no IP range.


The crawler is hitting my web server too frequently. How can I make it slow down?

We have a hybrid per-IP/per-host/per-domain crawl delay policy, so the crawl frequency you experience depends on the number of pages, hosts, and domains hosted on the same web server.
The easiest way to make our crawler slow down is to enable a rate-limiting policy that responds with HTTP 429 (Too Many Requests) status codes.
On Apache: mod_ratelimit, http://httpd.apache.org/docs/2.4/fr/mod/mod_ratelimit.html
On nginx: https://www.nginx.com/blog/rate-limiting-nginx/
On HAProxy: https://www.haproxy.com/fr/blog/four-examples-of-haproxy-rate-limiting/
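
If the site is served by a Python application rather than one of the servers above, the same idea can be sketched as a small WSGI middleware. This is only an illustration under stated assumptions: the class name `RateLimitMiddleware`, the limit of 10 requests per second per client IP, and the `Retry-After` header are choices made for the example, not documented requirements of the crawler.

```python
import math
import time
from collections import defaultdict, deque


class RateLimitMiddleware:
    """WSGI middleware allowing `max_requests` per `window` seconds per client IP.

    Hypothetical sketch: the limits are arbitrary and the state is kept
    in memory for a single process only.
    """

    def __init__(self, app, max_requests=10, window=1.0, clock=time.monotonic):
        self.app = app
        self.max_requests = max_requests
        self.window = window
        self.clock = clock
        self.hits = defaultdict(deque)  # client IP -> timestamps of recent requests

    def __call__(self, environ, start_response):
        client = environ.get("REMOTE_ADDR", "unknown")
        now = self.clock()
        recent = self.hits[client]
        # Drop timestamps that have fallen out of the sliding window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.max_requests:
            # Over the limit: answer 429 and hint when to retry.
            wait = max(1, math.ceil(self.window - (now - recent[0])))
            start_response("429 Too Many Requests", [
                ("Content-Type", "text/plain"),
                ("Retry-After", str(wait)),
            ])
            return [b"Too Many Requests\n"]
        recent.append(now)
        return self.app(environ, start_response)
```

Wrapping an existing application is then a one-line change, for example `application = RateLimitMiddleware(application, max_requests=5, window=1.0)`. A production setup would normally keep the counters in a shared store and expire idle clients; the sketch omits both for brevity.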