A curated list of 17,600+ IP addresses identified as bots, scrapers, and malicious actors. Updated every week.
Start protecting your content!
This blacklist is generated by a proprietary Bot Detection System that monitors and analyzes web traffic from a panel of major European publishers, processing approximately 500 million page views per month on multiple domains.
The Bot Detector source code, detection rules, and behavioral patterns are intentionally kept private.
If we published the detection algorithms and scoring rules, malicious bots could easily analyze them and implement countermeasures to evade detection. By keeping the detection logic confidential, we maintain the effectiveness of the system against sophisticated bot operators who continuously adapt their techniques.
What we share publicly:
- The resulting blacklist of detected malicious IPs
- Multiple formats for easy integration with your infrastructure
- Statistics and metadata about detected threats
What remains private:
- Detection algorithms and scoring logic
- Behavioral analysis rules
- Pattern matching configurations
- Traffic analysis methodologies
The Bot Detector system analyzes traffic patterns and behaviors to identify:
These bots explicitly identify themselves via User-Agent:
| Category | Example User-Agents |
|---|---|
| Known Bot (AI Crawler) | GPTBot, ClaudeBot, ChatGPT-User, PerplexityBot, Bytespider |
| Known Bot (SEO) | SemrushBot, AhrefsBot, MJ12bot, DotBot, DataForSeoBot |
| Known Bot (Scraper) | CCBot, Scrapy, Diffbot, news-please |
| Known Bot (Search Engine) | Yandex, Sogou, Baidu, PetalBot |
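Declared bots like these can be recognized with a case-insensitive substring match on the User-Agent header. The sketch below is only an illustration: the real detection rules are private, and `KNOWN_BOTS` is a hypothetical mapping built from the table above. Because a User-Agent is trivially forged, this kind of match only catches bots that choose to identify themselves.

```python
from typing import Optional

# Hypothetical mapping built from the declared-bot table; not the detector's rule set.
KNOWN_BOTS = {
    "Known Bot (AI Crawler)": ["GPTBot", "ClaudeBot", "ChatGPT-User", "PerplexityBot", "Bytespider"],
    "Known Bot (SEO)": ["SemrushBot", "AhrefsBot", "MJ12bot", "DotBot", "DataForSeoBot"],
    "Known Bot (Scraper)": ["CCBot", "Scrapy", "Diffbot", "news-please"],
    "Known Bot (Search Engine)": ["Yandex", "Sogou", "Baidu", "PetalBot"],
}


def classify_user_agent(user_agent: str) -> Optional[str]:
    """Return the first category whose token appears in the User-Agent, else None."""
    ua = user_agent.lower()
    for category, tokens in KNOWN_BOTS.items():
        if any(token.lower() in ua for token in tokens):
            return category
    return None


if __name__ == "__main__":
    ua = "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.1; +https://openai.com/gptbot)"
    print(classify_user_agent(ua))  # Known Bot (AI Crawler)
```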
These are detected through traffic analysis without explicit identification:
| Category | Description |
|---|---|
| Vulnerability Scanner | IPs probing for .env, wp-admin, phpinfo, credentials |
| Content Theft | Illegal scraping with burst patterns on pagination/archives |
| Archive Scraper | Systematic scraping of old articles via deep pagination |
| Image Scraper | High percentage of image requests without referrers |
| Aggressive Scanner | Very high request volume (>80 requests per minute) |
| DDoS Source | Extremely high request volume (>200 requests per minute) |
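The two volume-based categories differ only in their requests-per-minute threshold. As a rough illustration (not the detector's private scoring logic), this sketch counts requests per IP in fixed one-minute buckets from `(ip, timestamp)` pairs, such as those parsed from an access log, and labels each IP by its busiest minute using the >80 and >200 cut-offs from the table. The input format and the fixed-bucket approach are assumptions; a sliding window would catch bursts that straddle a minute boundary.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable, Optional, Tuple

AGGRESSIVE_RPM = 80  # "Aggressive Scanner" threshold from the table
DDOS_RPM = 200       # "DDoS Source" threshold from the table


def peak_rpm(requests: Iterable[Tuple[str, datetime]]) -> Dict[str, int]:
    """Return each IP's highest request count within any single calendar minute."""
    per_minute: Counter = Counter()
    for ip, ts in requests:
        minute = ts.replace(second=0, microsecond=0)  # truncate to the minute bucket
        per_minute[(ip, minute)] += 1
    peaks: Dict[str, int] = {}
    for (ip, _minute), count in per_minute.items():
        peaks[ip] = max(peaks.get(ip, 0), count)
    return peaks


def volume_category(rpm: int) -> Optional[str]:
    """Map a peak RPM to a volume category, checking the stricter threshold first."""
    if rpm > DDOS_RPM:
        return "DDoS Source"
    if rpm > AGGRESSIVE_RPM:
        return "Aggressive Scanner"
    return None


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 12, 0, 0)
    log = [("203.0.113.7", start.replace(second=i % 60)) for i in range(120)]
    log += [("198.51.100.2", start) for _ in range(10)]
    for ip, rpm in peak_rpm(log).items():
        print(ip, rpm, volume_category(rpm))
```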
These are flagged based on the network the traffic originates from:
| Category | Description |
|---|---|
| Proxy/VPN Abuse | Traffic from detected proxy/VPN services |
| Hosting/Cloud Bot | Automated traffic from cloud/hosting infrastructure |
`blacklist.json` contains the full metadata for each IP, including score, country, organization, category, and reason.
```bash
curl -O https://raw.githubusercontent.com/lula73/bot-detector/master/blacklist.json
```
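```bash
# Download into the nginx configuration directory
sudo curl -fsS -o /etc/nginx/deny.conf https://raw.githubusercontent.com/lula73/bot-detector/master/nginx/deny.conf
```
```nginx
# Include in your nginx.conf or site config
include /etc/nginx/deny.conf;
```
```bash
# Test the configuration, then reload nginx
sudo nginx -t && sudo nginx -s reload
```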
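If you would rather build your own nginx rules from `blacklist.json` (for example, to block only high-scoring IPs), a sketch like the following emits one `deny` directive per entry. The `min_score` filter is a hypothetical choice, and the published `nginx/deny.conf` may be generated differently.

```python
import ipaddress
from typing import Iterable, Mapping


def build_nginx_deny(entries: Iterable[Mapping], min_score: int = 0) -> str:
    """Render one nginx `deny` directive per entry scoring at least `min_score`."""
    lines = ["# Generated from blacklist.json"]
    for entry in entries:
        if entry.get("score", 0) < min_score:
            continue
        ip = ipaddress.ip_address(entry["ip"])  # raises ValueError on malformed input
        lines.append(f"deny {ip};")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Inline sample; in practice load data["blacklist"] with json.load() from blacklist.json.
    sample = [
        {"ip": "203.0.113.7", "score": 95},
        {"ip": "198.51.100.2", "score": 35},
    ]
    print(build_nginx_deny(sample, min_score=70), end="")
```

Write the result to `/etc/nginx/deny.conf` and run `nginx -t` before reloading.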
```bash
# Download
curl -O https://raw.githubusercontent.com/lula73/bot-detector/master/apache/.htaccess
# Include in your httpd.conf or use directly in your web root
# Apache 2.4+ required
```
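The Apache 2.4+ requirement most likely comes from the authorization syntax: in 2.4, IP blocking is written with `Require` directives, and a negated `Require not ip` only works inside a `<RequireAll>` container. Whether the published `.htaccess` uses exactly this layout is an assumption; this sketch just shows the general shape of such a block generated from a list of IPs.

```python
import ipaddress
from typing import Iterable


def build_apache_block(ips: Iterable[str]) -> str:
    """Render an Apache 2.4 <RequireAll> block that admits everyone except the listed IPs."""
    lines = ["<RequireAll>", "    Require all granted"]
    for ip in ips:
        ipaddress.ip_address(ip)  # validate only; raises ValueError on malformed input
        lines.append(f"    Require not ip {ip}")
    lines.append("</RequireAll>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_apache_block(["203.0.113.7", "198.51.100.2"]), end="")
```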
```bash
# Download and execute (requires root)
curl -O https://raw.githubusercontent.com/lula73/bot-detector/master/iptables/rules.sh
chmod +x rules.sh
sudo ./rules.sh
```
- Download `cloudflare/ip_list.txt`
- Go to Cloudflare Dashboard > Security > WAF > Tools
- Use "IP Access Rules" to bulk import the list
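Before a bulk import, it can help to confirm that the downloaded list contains only valid, unique entries. This sketch assumes one IP or CIDR range per line (an assumption about the file's format), skips blank and `#` comment lines, and reports anything it cannot parse.

```python
import ipaddress
from typing import List, Tuple


def clean_ip_list(text: str) -> Tuple[List[str], List[str]]:
    """Split raw list text into (valid unique entries, rejected lines)."""
    valid: List[str] = []
    seen = set()
    rejected: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        try:
            net = ipaddress.ip_network(line, strict=False)
        except ValueError:
            rejected.append(line)
            continue
        # Single hosts are written without a /32 or /128 suffix.
        normalized = str(net.network_address) if net.num_addresses == 1 else str(net)
        if normalized not in seen:
            seen.add(normalized)
            valid.append(normalized)
    return valid, rejected


if __name__ == "__main__":
    good, bad = clean_ip_list("203.0.113.7\n203.0.113.7\n198.51.100.0/24\nnot-an-ip\n")
    print(good, bad)
```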
```php
// In your theme's functions.php or a custom plugin
require_once('/path/to/blocked_ips.php');
```
Features of the WordPress integration:
- Automatic IP detection (supports Cloudflare, proxies)
- Returns 403 Forbidden with custom headers
- Runs before any output (priority 1)
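Automatic IP detection matters because behind Cloudflare or a reverse proxy, the TCP peer address belongs to the proxy, not the visitor. The PHP integration's exact logic is not reproduced here; this Python sketch shows a common approach, assuming the standard `CF-Connecting-IP` and `X-Forwarded-For` headers and a hypothetical `TRUSTED_PROXIES` list. Forwarded headers are honored only when the direct peer is a trusted proxy, since any client can send them.

```python
import ipaddress
from typing import List, Mapping, Optional, Union

IPAddress = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]

# Hypothetical: networks of proxies you trust (your load balancer, Cloudflare's
# published ranges, ...). Forwarded headers from anyone else are ignored.
TRUSTED_PROXIES: List[ipaddress._BaseNetwork] = [ipaddress.ip_network("10.0.0.0/8")]


def _parse(addr: str) -> Optional[IPAddress]:
    try:
        return ipaddress.ip_address(addr.strip())
    except ValueError:
        return None


def _is_trusted(ip: IPAddress) -> bool:
    return any(ip in net for net in TRUSTED_PROXIES)


def client_ip(remote_addr: str, headers: Mapping[str, str]) -> str:
    """Best-effort visitor IP; header names are looked up case-sensitively as shown."""
    peer = _parse(remote_addr)
    if peer is None or not _is_trusted(peer):
        return remote_addr  # direct connection: forwarded headers could be forged
    cf_ip = _parse(headers.get("CF-Connecting-IP", ""))
    if cf_ip is not None:
        return str(cf_ip)
    # Each proxy appends the address it saw, so scan right to left and return
    # the first hop that is not one of our own trusted proxies.
    hops = [h for h in headers.get("X-Forwarded-For", "").split(",") if h.strip()]
    for hop in reversed(hops):
        ip = _parse(hop)
        if ip is None:
            break  # malformed entry: stop trusting the header
        if not _is_trusted(ip):
            return str(ip)
    return remote_addr
```

Scanning `X-Forwarded-For` from the right, rather than taking its first entry, prevents a client from prepending a fake address that the proxy would then pass along.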
```haproxy
# In haproxy.cfg
frontend web_frontend
    acl blacklist src -f /etc/haproxy/blacklist.list
    http-request deny if blacklist
```
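The `src -f` ACL loads a file of addresses and matches each request's source IP against it. If you need the same check inside an application that does not sit behind HAProxy, a minimal equivalent using Python's standard `ipaddress` module might look like this. The `IPBlocklist` name is hypothetical, and the one-entry-per-line format (single IPs or CIDR ranges, `#` comments) is an assumption.

```python
import ipaddress
from typing import Iterable


class IPBlocklist:
    """Membership test for single IPs and CIDR ranges, similar in spirit to `acl ... src -f`."""

    def __init__(self, lines: Iterable[str]):
        self.hosts = set()    # exact addresses: O(1) lookup
        self.networks = []    # ranges: checked one by one
        for raw in lines:
            line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
            if not line:
                continue
            net = ipaddress.ip_network(line, strict=False)
            if net.num_addresses == 1:
                self.hosts.add(net.network_address)
            else:
                self.networks.append(net)

    def __contains__(self, addr: str) -> bool:
        ip = ipaddress.ip_address(addr)
        return ip in self.hosts or any(ip in net for net in self.networks)


if __name__ == "__main__":
    blocklist = IPBlocklist(["203.0.113.7", "198.51.100.0/24"])
    print("198.51.100.42" in blocklist)
```

Keeping single hosts in a set keeps lookups fast for a list of tens of thousands of mostly individual IPs.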
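```bash
# Root crontab (sudo crontab -e). Cron's default PATH often omits /usr/sbin (nginx, iptables).
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
# Nginx example: reload only if the download succeeded and the new config is valid
0 */3 * * * curl -fsS https://raw.githubusercontent.com/lula73/bot-detector/master/nginx/deny.conf -o /etc/nginx/deny.conf && nginx -t && nginx -s reload
# iptables example (the root crontab already runs as root, so no sudo)
0 */3 * * * curl -fsS https://raw.githubusercontent.com/lula73/bot-detector/master/iptables/rules.sh | bash
```
Create `/etc/systemd/system/update-blacklist.service`: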
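```ini
[Unit]
Description=Update Bot Detector Blacklist

[Service]
Type=oneshot
# -f makes curl fail on HTTP errors instead of saving the error page as config
ExecStart=/usr/bin/curl -fsS -o /etc/nginx/deny.conf https://raw.githubusercontent.com/lula73/bot-detector/master/nginx/deny.conf
# Post commands run in order; if the config test fails, the reload is skipped
ExecStartPost=/usr/sbin/nginx -t
ExecStartPost=/usr/sbin/nginx -s reload
```
Create `/etc/systemd/system/update-blacklist.timer`: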
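```ini
[Unit]
Description=Update blacklist every 3 hours

[Timer]
OnCalendar=*-*-* 0/3:00:00
Persistent=true

[Install]
WantedBy=timers.target
```
Reload systemd so it picks up the new units, then enable the timer:
```bash
sudo systemctl daemon-reload
sudo systemctl enable --now update-blacklist.timer
```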
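The cron and systemd examples overwrite the live file in place. A somewhat safer pattern, sketched here as an assumption rather than part of the project, is to validate the downloaded text first and then swap it in with `os.replace`, which is atomic on POSIX when the temporary file is on the same filesystem. The expected `deny <ip>;` line format is an assumption about `nginx/deny.conf`.

```python
import ipaddress
import os
import re
import tempfile

DENY_LINE = re.compile(r"^deny\s+(\S+);$")


def is_valid_deny_conf(text: str) -> bool:
    """True if every non-blank, non-comment line is `deny <ip or cidr>;`."""
    found = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = DENY_LINE.match(line)
        if not match:
            return False  # e.g. an HTML error page
        try:
            ipaddress.ip_network(match.group(1), strict=False)
        except ValueError:
            return False
        found = True
    return found  # an empty list is treated as suspicious


def atomic_write(path: str, text: str) -> None:
    """Write to a temp file in the same directory, then atomically replace `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".deny-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write(text)
            fh.flush()
            os.fsync(fh.fileno())  # make sure the bytes hit disk before the swap
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
```

In practice you would fetch the text (for example with `urllib.request`), call `is_valid_deny_conf`, then `atomic_write("/etc/nginx/deny.conf", text)`, and still run `nginx -t` before reloading.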
Check the `stats/` directory for:
- `categories.json` - Breakdown by bot category
- `countries.json` - Geographic distribution
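Similar breakdowns can be recomputed from `blacklist.json` when you want them filtered, for example to permanent blocks only. This sketch counts the `category` or `country` field with `collections.Counter`; the shape of the published `categories.json` and `countries.json` files may differ, and the `only_permanent` flag is a hypothetical option.

```python
from collections import Counter
from typing import Iterable, List, Mapping, Tuple


def breakdown(entries: Iterable[Mapping], field: str, only_permanent: bool = False) -> List[Tuple[str, int]]:
    """Count entries per value of `field`, most common first."""
    counts = Counter(
        entry.get(field) or "unknown"  # missing or empty values are grouped together
        for entry in entries
        if not only_permanent or entry.get("is_permanent")
    )
    return counts.most_common()


if __name__ == "__main__":
    sample = [
        {"category": "Content Theft", "country": "US", "is_permanent": True},
        {"category": "DDoS Source", "country": "DE", "is_permanent": False},
    ]
    print(breakdown(sample, "country", only_permanent=True))
```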
You can also fetch the JSON directly in your application:
```python
import requests

URL = 'https://raw.githubusercontent.com/lula73/bot-detector/master/blacklist.json'
response = requests.get(URL, timeout=30)
response.raise_for_status()  # stop on HTTP errors instead of parsing an error page
data = response.json()

for entry in data['blacklist']:
    print(f"{entry['ip']} - {entry['category']} - {entry['reason']}")
```
Each entry in `blacklist.json` contains:
| Field | Type | Description |
|---|---|---|
| `ip` | string | IP address |
| `score` | integer | Bot score (0-100) |
| `country` | string | Country code (ISO 3166-1 alpha-2) |
| `organization` | string | ISP/hosting provider |
| `category` | string | Bot category |
| `reason` | string | Human-readable block reason |
| `first_seen` | string | First detection date (YYYY-MM-DD) |
| `last_seen` | string | Last detection date (YYYY-MM-DD) |
| `scan_count` | integer | Number of scans/detections |
| `is_permanent` | boolean | Permanently blocked |
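If you consume the JSON from typed code, mapping each entry onto a small record type catches schema drift early. A minimal sketch, assuming the field names and types in the table above; the `BlacklistEntry` name and the `high_confidence` filter (permanent blocks plus anything at or above a hypothetical `min_score`) are illustrative only.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Mapping


@dataclass(frozen=True)
class BlacklistEntry:
    ip: str
    score: int
    country: str
    organization: str
    category: str
    reason: str
    first_seen: date
    last_seen: date
    scan_count: int
    is_permanent: bool

    @classmethod
    def from_json(cls, raw: Mapping) -> "BlacklistEntry":
        """Build an entry from one JSON object; a missing key raises KeyError."""
        return cls(
            ip=raw["ip"],
            score=int(raw["score"]),
            country=raw["country"],
            organization=raw["organization"],
            category=raw["category"],
            reason=raw["reason"],
            first_seen=date.fromisoformat(raw["first_seen"]),  # YYYY-MM-DD
            last_seen=date.fromisoformat(raw["last_seen"]),
            scan_count=int(raw["scan_count"]),
            is_permanent=bool(raw["is_permanent"]),
        )


def high_confidence(entries: Iterable[Mapping], min_score: int = 80) -> List[BlacklistEntry]:
    """Parse raw entries and keep permanent blocks plus anything scoring >= min_score."""
    parsed = (BlacklistEntry.from_json(e) for e in entries)
    return [e for e in parsed if e.is_permanent or e.score >= min_score]
```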
Found a false positive? Please open an issue with:
- The IP address
- Your use case
- Any relevant logs
MIT License - See LICENSE file.
This blacklist is provided as-is. False positives may occur. Always test in a staging environment before deploying to production.
Not affiliated with any of the bot operators mentioned.
Update Frequency: Every 3 hours
Source: Bot Detector v2.4
Last Generated: Check metadata.generated_at in blacklist.json
