├── LICENSE ├── README.md ├── aliyun-yd.rules ├── bad-crawler.rules ├── basic-crawler.rules ├── good-bot.rules └── security-scan-bot.rules /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2018 Sukka 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 |

2 | 3 |

4 |

Cloudflare Block Bad Bot Ruleset

5 | 6 |

7 | Author 8 | License 9 |

10 | 11 | > Block bad, possibly even malicious web crawlers (automated bots) using Cloudflare Firewall Rules
12 | > 使用 Cloudflare Firewall Rules 拦截恶意网络爬虫(自动机器人)和其它恶意流量 13 | 14 | ## Introduction 简介 15 | 16 | `Cloudflare Block Bad Bot Ruleset` projects stop and block Bad Bot, Spam Referrer, Adware, Malware and any other kinds of bad internet traffic ever reaching your web sites. Inspired by [nginx-badbot-blocker](https://github.com/mariusv/nginx-badbot-blocker) & worked with Cloudflare Firewall Rules. 17 | 18 | `Cloudflare Block Bad Bot Ruleset` 可以阻止恶意爬虫、垃圾引荐来源、广告、恶意软件以及任何其他类型的恶意互联网流量到达您的网站。灵感来自 [nginx-badbot-blocker](https://github.com/mariusv/nginx-badbot-blocker) 并与 Cloudflare Firewall Rules 搭配使用。 19 | 20 | ## Precautions 注意事项 21 | 22 | `Cloudflare Block Bad Bot Ruleset` mainly based on User-Agent, which is known to all that could be changed easily. So the project can not replace the Web Application Firewall. 23 | 24 | `Cloudflare Block Bad Bot Ruleset` 主要基于 User-Agent,但是众所周知 User-Agent 可以伪装,所以本项目并不能取代正规的 Web Application Firewall。 25 | 26 | ## Ruleset 规则 27 | 28 | Rule Name | File Name | Action | What For 29 | ---- | ---- | ---- | ---- 30 | Good Bot | [good-bot.rules](./good-bot.rules) | Allow | Match known good bot.
匹配已知的正常爬虫 31 | Aliyun Yundun | [aliyun-yd.rules](./aliyun-yd.rules) | Block | Match Aliyun Yundun based on known IP cidr.
基于已知 IP 段匹配阿里云盾 32 | Basic Crawler | [basic-crawler.rules](./basic-crawler.rules) | Block/Challenge | Block some known bad bot.
匹配一些基本的 HTTP Request 库 33 | Bad Crawler | [bad-crawler.rules](./bad-crawler.rules) | Block/Challenge | Match mostly known bad bot, basic ruleset not included.
匹配绝大部分已知的恶意爬虫、SEO 爬虫和营销爬虫 34 | Security Scanner | [security-scan-bot.rules](./security-scan-bot.rules) | Block/Challenge | Match mostly known security scanner.
匹配大部分已知的漏洞扫描爬虫 35 | 36 | ## Usage 用法 37 | 38 | ![](https://i.loli.net/2018/10/30/5bd801833e8d3.png) 39 | 40 | ## More Information 更多详情 41 | 42 | - [Announcing Firewall Rules | Cloudflare Blog](https://blog.cloudflare.com/announcing-firewall-rules/) 43 | - [Cloudflare Firewall Rules | Cloudflare Documentations](https://developers.cloudflare.com/firewall/) 44 | - [nginx-badbot-blocker | GitHub](https://github.com/mariusv/nginx-badbot-blocker) 45 | - [nginx-ultimate-bad-bot-blocker | GitHub](https://github.com/mitchellkrogza/nginx-ultimate-bad-bot-blocker) 46 | 47 | ## Todo List 48 | 49 | - [ ] Bad referrer list 50 | - [ ] Known bad IP List 51 | 52 | ## Maintainer 维护者 53 | 54 | **Cloudflare Block Bad Bot Ruleset** © [Sukka](https://github.com/SukkaW), Released under the [MIT](./LICENSE) License. 55 | 56 | > [Personal Website](https://skk.moe) · [Blog](https://blog.skk.moe) · GitHub [@SukkaW](https://github.com/SukkaW) · Telegram Channel [@SukkaChannel](https://t.me/SukkaChannel) · Twitter [@isukkaw](https://twitter.com/isukkaw) · Keybase [@sukka](https://keybase.io/sukka) 57 | -------------------------------------------------------------------------------- /aliyun-yd.rules: -------------------------------------------------------------------------------- 1 | (ip.src in {140.205.201.0/24 140.205.225.0/24 106.11.222.0/23 106.11.224.0/24 106.11.228.0/22}) 2 | -------------------------------------------------------------------------------- /bad-crawler.rules: -------------------------------------------------------------------------------- 1 | (http.user_agent contains "80legs") or 2 | (http.user_agent contains "Abonti") or 3 | (http.user_agent contains "admantx") or 4 | (http.user_agent contains "aipbot") or 5 | (http.user_agent contains "AllSubmitter") or 6 | (http.user_agent contains "Backlink") or (http.user_agent contains "backlink") or 7 | (http.user_agent contains "Badass") or 8 | (http.user_agent contains "Bigfoot") or 9 | (http.user_agent contains "blexbot") or 10 | (http.user_agent contains "Buddy") or 11 | (http.user_agent contains "CherryPicker") or 12 | (http.user_agent contains "cloudsystemnetwork") or 13 | (http.user_agent contains "cognitiveseo") or 14 | (http.user_agent contains "Collector") or 15 | (http.user_agent contains "cosmos") or 16 | (http.user_agent contains "CrazyWebCrawler") or 17 | (http.user_agent contains "Crescent") or 18 | (http.user_agent contains "Devil") or 19 | ((lower(http.user_agent) contains "domain") and ((http.user_agent contains "spider") or (http.user_agent contains "stat") or (http.user_agent contains "Appender") or (http.user_agent contains "Crawler"))) or 20 | (http.user_agent contains "DittoSpyder") or 21 | (http.user_agent contains "Konqueror") or 22 | (http.user_agent contains "Easou") or (http.user_agent contains "Yisou") or (http.user_agent contains "Etao") or 23 | (http.user_agent contains "mail" and ((http.user_agent contains "olf") or (http.user_agent contains "spider"))) or 24 | (http.user_agent contains "exabot.com") or 25 | (http.user_agent contains "getintent") or 26 | (http.user_agent contains "Grabber") or 27 | (http.user_agent contains "GrabNet") or 28 | (http.user_agent contains "HEADMasterSEO") or 29 | (http.user_agent contains "heritrix") or 30 | (http.user_agent contains "htmlparser") or 31 | (http.user_agent contains "hubspot") or 32 | (http.user_agent contains "Jyxobot") or 33 | (lower(http.user_agent) contains "kraken") or 34 | (http.user_agent contains "larbin") or 35 | (http.user_agent contains "ltx71") or 36 | (http.user_agent contains "leiki") or 37 | (http.user_agent contains "LinkScan") or 38 | (http.user_agent contains "Magnet") or 39 | (http.user_agent contains "Mag-Net") or 40 | (http.user_agent contains "Mechanize") or 41 | (http.user_agent contains "MegaIndex") or 42 | (http.user_agent contains "Metasearch") or 43 | (http.user_agent contains "MJ12bot") or 44 | (http.user_agent contains "moz.com") or 45 | (http.user_agent contains "Navroad") or 46 | (http.user_agent contains "Netcraft") or 47 | (http.user_agent contains "niki-bot") or 48 | (http.user_agent contains "NimbleCrawler") or 49 | (http.user_agent contains "Nimbostratus") or 50 | (http.user_agent contains "Ninja") or 51 | (http.user_agent contains "Openfind") or 52 | (http.user_agent contains "Page" and http.user_agent contains "Analyzer") or 53 | (http.user_agent contains "Pixray") or 54 | (http.user_agent contains "probethenet") or 55 | (http.user_agent contains "proximic") or 56 | (http.user_agent contains "psbot") or 57 | (http.user_agent contains "RankActive") or (http.user_agent contains "RankingBot") or (http.user_agent contains "RankurBot") or 58 | (http.user_agent contains "Reaper") or 59 | (http.user_agent contains "SalesIntelligent") or 60 | (http.user_agent contains "Semrush") or 61 | (http.user_agent contains "SEOkicks") or 62 | (http.user_agent contains "spbot") or 63 | (http.user_agent contains "SEOstats") or 64 | (http.user_agent contains "Snapbot") or 65 | (http.user_agent contains "Stripper") or 66 | (http.user_agent contains "Siteimprove") or 67 | (http.user_agent contains "sitesell") or 68 | (http.user_agent contains "Siphon") or 69 | (http.user_agent contains "Sucker") or 70 | (http.user_agent contains "TenFourFox") or 71 | (http.user_agent contains "TurnitinBot") or 72 | (http.user_agent contains "trendiction") or 73 | (http.user_agent contains "twingly") or 74 | (http.user_agent contains "VidibleScraper") or 75 | (http.user_agent contains "WebLeacher") or 76 | (http.user_agent contains "WebmasterWorldForum") or 77 | (http.user_agent contains "webmeup") or 78 | (http.user_agent contains "Webster") or 79 | (http.user_agent contains "Widow") or 80 | (http.user_agent contains "Xaldon") or 81 | (http.user_agent contains "Xenu") or 82 | (http.user_agent contains "xtractor") or 83 | (http.user_agent contains "Zermelo") 84 | -------------------------------------------------------------------------------- /basic-crawler.rules: -------------------------------------------------------------------------------- 1 | (lower(http.user_agent) contains "fuck") or 2 | (http.user_agent contains "lient" and http.user_agent contains "ttp") or 3 | (lower(http.user_agent) contains "java") or 4 | (http.user_agent contains "Joomla") or 5 | (http.user_agent contains "libweb") or (http.user_agent contains "libwww") or 6 | (http.user_agent contains "PHPCrawl") or 7 | (http.user_agent contains "PyCurl") or 8 | (lower(http.user_agent) contains "python") or 9 | (http.user_agent contains "wrk") or 10 | (http.user_agent contains "hey/") -------------------------------------------------------------------------------- /good-bot.rules: -------------------------------------------------------------------------------- 1 | (cf.client.bot) or 2 | (http.user_agent contains "duckduckgo") or 3 | (http.user_agent contains "facebookexternalhit") or 4 | (http.user_agent contains "Feedfetcher-Google") or 5 | (http.user_agent contains "LinkedInBot") or 6 | (http.user_agent contains "Mediapartners-Google") or 7 | (http.user_agent contains "msnbot") or 8 | (http.user_agent contains "Slackbot") or 9 | (http.user_agent contains "TwitterBot") or 10 | (http.user_agent contains "ia_archive") or 11 | (http.user_agent contains "yahoo") 12 | -------------------------------------------------------------------------------- /security-scan-bot.rules: -------------------------------------------------------------------------------- 1 | (http.user_agent contains "Acunetix") or 2 | (lower(http.user_agent) contains "apache") or 3 | (http.user_agent contains "BackDoorBot") or 4 | (http.user_agent contains "cobion") or 5 | (http.user_agent contains "masscan") or (http.user_agent contains "FHscan") or (http.user_agent contains "scanbot") or (http.user_agent contains "Gscan") or (http.user_agent contains "Researchscan") or (http.user_agent contains "WPScan") or (http.user_agent contains "ScanAlert") or (http.user_agent contains "Wprecon") or 6 | (lower(http.user_agent) contains "virusdie") or 7 | (http.user_agent contains "VoidEYE") or 8 | (http.user_agent contains "WebShag") or 9 | (http.user_agent contains "Zeus") or 10 | (http.user_agent contains "zgrab") or 11 | (lower(http.user_agent) contains "zmap") or (lower(http.user_agent) contains "nmap") or (lower(http.user_agent) contains "fimap") or 12 | (http.user_agent contains "ZmEu") or 13 | (http.user_agent contains "ZumBot") or 14 | (http.user_agent contains "Zyborg") 15 | --------------------------------------------------------------------------------