A few months ago, I noticed my Coywolf News articles were being translated into French and republished by a French news site. I saw it because their versions were competing with my original versions in Google News. I observed that each time an article would be published on my site, minutes later, it would be fully translated and republished with the same images.
I investigated what the French site was doing and discovered that they were hitting my RSS feed every minute looking for newly published articles. When a new article would appear, they would then visit and copy everything from the post, automatically translate the article, and publish it. It was a clever scheme, and one that isn’t new to spammers, but it was troublesome since Google News displayed the posts in their results.
I was able to put a stop to it using Cloudflare’s Firewall. That process inevitably sent me down the rabbit hole of what else I could find and stop with their service, and what I found surprised me. In every case, including the recent site I set up on my home network to test out Cloudflare’s Automatic Platform Optimization, I discovered nefarious traffic that was up to no good. This article highlights how to find bad traffic and permanently stop it using Cloudflare’s Firewall.
Cloudflare versus Sucuri or Wordfence
I’ve never used Sucuri or Wordfence. While I have plenty of industry colleagues who use and swear by them, I’ve never felt the need to use either of them for my sites.
My reluctance to use Sucuri and Wordfence comes from the same hesitation I have about using Jetpack. Simply put, I don’t want to add yet another plugin, let alone a plugin with such a large footprint on the site.
I prefer to offload optimized delivery and security onto a CDN or edge server, which for me is Cloudflare. The only exception to that preference is running a caching plugin like WP Rocket, but as Cloudflare continues to evolve, I’m now reconsidering the need for that.
Lastly, if Cloudflare didn’t exist, then I would seriously consider using Sucuri or Wordfence. They are both useful and offer some features that Cloudflare doesn’t. But as a whole, Cloudflare checks off more boxes that are important to me.
Methods for finding bad actors
Site analytics like GA and Fathom only tell you about traffic from pages with tracking code on them. Most of the nefarious traffic that hits a site uses bots and visits pages that don’t have any tracking code. These are some of the best ways to discover otherwise hidden traffic.
Most of us rarely, if ever, look at our log files unless there’s an issue with crawling or page errors we’re trying to debug. But they exist and sit there on the server, just waiting for us to give them our attention.
One of the most challenging parts of working with log files can be finding them because hosting providers store them in different places. For example, Pair Networks makes them accessible via SFTP (SSH) in a folder named www_logs, while WP Engine makes them available via its User Portal.
If you don’t know where your log files are stored, search your hosting provider’s knowledge base or contact their support. Once you find them, you won’t need to download them all. Log files are typically saved once each day, and the timestamp is used as part of the file name. Choose the files that match the timeframe you’re investigating, and then download them to your computer.
The next step is to view and sort the data. You will need a log analyzer, and I highly recommend using Screaming Frog’s Log File Analyser. It’s not free, but it’s the best and least frustrating one out there. I think it’s well worth the money if you want to get answers quickly and save yourself some time.
Using Screaming Frog Log File Analyser
By default, the Log File Analyser (LFA) only displays visit details from well-known bots. Since you’re not troubleshooting crawl issues – you’re looking for bad actors – you’ll need to turn off User Agent filtering. This can only be done when you create a new project.
On the new project window, click on the User Agents tab and then uncheck Filter User Agents. That will force the app to process and display every site visit.
After the LFA has processed the log files, go to the IPs tab. The table should be sorted by the number of events. If it’s not, click on the Num Events column header until the rows are sorted by highest to lowest.
Next, click on individual IPs in the Remote Host column. If you see the IP 127.0.0.1, you can ignore it because that’s the localhost IP that the server uses for processes like cron jobs. After clicking on an IP, click on the Events tab located in the window’s bottom-left corner. That will display all of the pages the IP attempted to visit.
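If you want a quick look at the busiest IPs before opening the LFA, a few lines of Python can approximate the sorted IPs view. This is only a rough sketch: it assumes your access logs use the common or combined log format, where the remote host is the first field on each line, and the function name and sample lines are made up for illustration.

```python
from collections import Counter


def count_events_by_ip(log_lines, ignore=("127.0.0.1",)):
    """Count requests per remote host, highest first.

    Assumes the remote host is the first whitespace-separated field,
    as in the common/combined log formats.
    """
    counts = Counter()
    for line in log_lines:
        parts = line.split(maxsplit=1)
        if not parts:
            continue  # skip blank lines
        ip = parts[0]
        if ip in ignore:
            continue  # localhost traffic such as cron jobs
        counts[ip] += 1
    return counts.most_common()


# Hypothetical sample lines using documentation-only IP addresses.
sample_lines = [
    '203.0.113.7 - - [10/Oct/2020:13:55:36 +0000] "GET /feed/ HTTP/1.1" 200 5120',
    '127.0.0.1 - - [10/Oct/2020:13:55:40 +0000] "POST /wp-cron.php HTTP/1.1" 200 0',
    '203.0.113.7 - - [10/Oct/2020:13:56:36 +0000] "GET /feed/ HTTP/1.1" 200 5120',
    '198.51.100.23 - - [10/Oct/2020:13:57:02 +0000] "GET / HTTP/1.1" 200 10240',
]

for ip, events in count_events_by_ip(sample_lines):
    print(ip, events)
```

To run it against a real log, pass an open file instead of the sample list, for example `count_events_by_ip(open("access.log"))`, where the file name is whatever your host uses.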
What you’re looking for are pages that nobody other than yourself should be attempting to access. If you notice an IP address – one that is not an IP from your home or work – trying to access pages like xmlrpc.php, and pages for plugins that don’t exist on your site, then that IP is likely up to no good. Make a note of each offending IP because you will need them later when you set up Cloudflare’s Firewall rules.
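The same check can be roughed out in Python if you’d like a list of offending IPs you can copy and paste. Treat this as a sketch under assumptions: it expects the combined log format, the plugin path below is a hypothetical stand-in for a plugin you don’t have installed, and the trusted IP stands in for your own home or work address.

```python
import re
from collections import defaultdict

# Captures the remote host and the requested path from a common/combined
# log format line (an assumption about how your host writes its logs).
REQUEST_RE = re.compile(r'^(\S+) .*?"[A-Z]+ (\S+)')


def find_suspicious_ips(log_lines, suspicious_fragments, trusted_ips=()):
    """Map each untrusted IP to the suspicious paths it requested."""
    hits = defaultdict(set)
    for line in log_lines:
        match = REQUEST_RE.match(line)
        if not match:
            continue  # line is not in the expected format
        ip, path = match.groups()
        if ip in trusted_ips:
            continue  # your own home or work IP
        if any(fragment in path for fragment in suspicious_fragments):
            hits[ip].add(path)
    return dict(hits)


# Hypothetical sample data; the plugin name is invented for illustration.
sample_lines = [
    '203.0.113.7 - - [10/Oct/2020:13:55:36 +0000] "POST /xmlrpc.php HTTP/1.1" 200 674 "-" "Mozilla/5.0"',
    '203.0.113.7 - - [10/Oct/2020:13:55:38 +0000] "GET /wp-content/plugins/not-installed/readme.txt HTTP/1.1" 404 0 "-" "Mozilla/5.0"',
    '198.51.100.23 - - [10/Oct/2020:13:57:02 +0000] "GET / HTTP/1.1" 200 10240 "-" "Mozilla/5.0"',
    '192.0.2.10 - - [10/Oct/2020:13:58:11 +0000] "POST /xmlrpc.php HTTP/1.1" 200 674 "-" "Mozilla/5.0"',
]

suspects = find_suspicious_ips(
    sample_lines,
    suspicious_fragments=["xmlrpc.php", "/plugins/not-installed/"],
    trusted_ips={"192.0.2.10"},
)
for ip, paths in suspects.items():
    print(ip, sorted(paths))
```

A match here is a reason to look closer in the Events tab, not proof of bad intent, so I would still review each IP before writing a rule for it.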
Using Cloudflare’s Analytics
If you have a paid plan (Pro or higher) with Cloudflare, you can use their new Analytics tool to spot bad actors. Cloudflare’s Analytics isn’t as granular as access logs, but it does surface traffic for pages that don’t have any tracking code. That makes it possible to spot nefarious traffic that may otherwise be missed with a log analyzer or regular site analytics.
I was able to use Cloudflare’s Analytics to quickly pinpoint the pages the French spammer was visiting. Using that information, I was then able to observe and ultimately block them using Cloudflare’s Firewall.
Methods for stopping bad actors
When I first started down the journey of learning how to detect and stop unwanted traffic, I took a blunt approach. And by blunt, I mean I blocked entire countries from accessing my site so I could quickly gather data and have more precise details. It proved to be very useful in stopping the French spammer, but it also, as one would expect, blocked legitimate traffic.
Cloudflare’s Firewall allows you to take a blunt or surgical approach to block bad actors. Here’s how to do it.
Blocking or Challenging traffic in Cloudflare’s Firewall
Cloudflare provides two primary tools for adding firewall rules. They have a tool called IP Access Rules and another one called Firewall Rules.
- IP Access Rules: This allows you to add up to 50,000 rules for the entire account but is restricted to adding an individual IP, IP range, Autonomous System Number (ASN), or country.
- Firewall Rules: This allows you to add formulaic (AND/OR) rules based on a multitude of variables, including IPs, cookies, Uniform Resource Identifiers (URIs), request methods, and several more. Unlike IP Access Rules, the number of Firewall Rules that can be created is limited and depends on the Cloudflare plan for each domain.
The Firewall Rules interface is more user friendly than the IP Access Rules. I recommend creating your first rules there to quickly get acquainted with how they work. Afterward, if you plan to make several rules based primarily on IPs, ASNs, or countries, I recommend switching to the simpler IP Access Rules tool. For brevity in this article, I’m going to focus solely on the IP Access Rules tool.
Choosing what to block
What you block depends on the type and severity of unwanted visits. When I was investigating how the French spammer was accessing my site, Cloudflare’s Analytics made it evident from a high level. It showed that the site was being hit by traffic from France almost every minute, and the page it was primarily accessing was the RSS feed. That’s what prompted me to block traffic from the entire country of France.
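If you don’t have Cloudflare’s Analytics, a similar pattern can be looked for in access logs. The sketch below is not how I found the French spammer; it is an illustration that assumes the combined log format and a WordPress-style feed at /feed/, and the threshold of 30 hits per hour is an arbitrary number you would tune for your own site.

```python
import re
from collections import Counter

# Remote host, timestamp, and requested path from a common/combined
# log format line (an assumption about your server's log layout).
LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "[A-Z]+ (\S+)')


def frequent_feed_pollers(log_lines, feed_path="/feed/", min_hits_per_hour=30):
    """Return {(ip, hour): hits} for IPs polling the feed unusually often."""
    per_hour = Counter()
    for line in log_lines:
        match = LINE_RE.match(line)
        if not match:
            continue
        ip, timestamp, path = match.groups()
        if not path.startswith(feed_path):
            continue
        hour = timestamp[:14]  # e.g. "10/Oct/2020:13"
        per_hour[(ip, hour)] += 1
    return {key: hits for key, hits in per_hour.items() if hits >= min_hits_per_hour}


# Hypothetical data: one IP requests the feed every minute for an hour,
# while another requests it once.
sample_lines = [
    f'198.51.100.23 - - [10/Oct/2020:13:{minute:02d}:00 +0000] "GET /feed/ HTTP/1.1" 200 5120'
    for minute in range(60)
]
sample_lines.append(
    '203.0.113.7 - - [10/Oct/2020:13:05:00 +0000] "GET /feed/ HTTP/1.1" 200 5120'
)

print(frequent_feed_pollers(sample_lines))
```

Legitimate feed readers also poll on a schedule, so a hit from this check is a starting point for investigation rather than a verdict.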
After I blocked France, Cloudflare’s Firewall Overview immediately started to display fine-grained details of the IPs, user agents, and pages being visited from France. Using that data, I experimented with blocking individual IPs and user agents. Ultimately, the rules I stuck with were a general block of an ASN and two IP addresses. That stopped all of the bots the spammer was using to monitor, steal, and republish my content automatically.
If you’re not familiar with Autonomous System Numbers, the American Registry for Internet Numbers (ARIN) describes them as:
…a group of one or more IP prefixes (lists of IP addresses accessible on a network) run by one or more network operators that maintain a single, clearly-defined routing policy. Network operators need Autonomous System Numbers (ASNs) to control routing within their networks and to exchange routing information with other Internet Service Providers (ISPs).
ARIN, Autonomous System Numbers
Cloudflare doesn’t show the ASN in its Analytics, but it does show it in the Firewall Overview after traffic has been blocked or challenged. If you don’t know the ASN for an IP you want to block or challenge, you can append the IP address to https://ipinfo.io/ (for example, https://ipinfo.io/22.214.171.124), and IPinfo will provide the ASN along with other details related to the address. I use this when I want to know the ASN for an IP I discover in Screaming Frog’s LFA.
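If you have a long list of IPs, the lookup can be scripted. The following is a sketch that rests on a few assumptions of mine rather than anything covered above: that IPinfo returns JSON when /json is appended to the URL, that the response’s org field begins with the ASN (for example, "AS15169 Google LLC"), and that your volume is low enough not to need an API token. The function names are my own.

```python
import json
import re
from urllib.request import urlopen


def ipinfo_url(ip):
    """Build the lookup URL (assumes IPinfo's /json suffix)."""
    return f"https://ipinfo.io/{ip}/json"


def extract_asn(org_field):
    """Pull the leading ASN, such as 'AS15169', out of an org string."""
    match = re.match(r"(AS\d+)\b", org_field or "")
    return match.group(1) if match else None


def lookup_asn(ip, timeout=10):
    """Fetch details for an IP and return its ASN, or None if absent.

    Makes a network request; the response shape is an assumption.
    """
    with urlopen(ipinfo_url(ip), timeout=timeout) as response:
        data = json.load(response)
    return extract_asn(data.get("org"))


# Offline demonstration of the parsing step with an example org string.
print(ipinfo_url("8.8.8.8"))
print(extract_asn("AS15169 Google LLC"))
```

Calling `lookup_asn("8.8.8.8")` would perform the actual request. The value it returns is already in the AS-prefixed form that the IP Access Rules tool expects.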
Creating a rule for an ASN does carry the risk of blocking desirable traffic. An Autonomous System of IPs can represent an extensive network that hosts many sites, and it may provide internet access to consumers. For that reason, I don’t recommend blocking ASNs unless you’re confident it won’t block the visitors you want. And if you do block or challenge an ASN, make sure you closely monitor the Firewall Overview page to catch and address any false positives.
What you block will depend on what you discover and how comfortable you are with the potential for briefly blocking legitimate traffic. Blocking a single IP is the least risky rule to make, but it may not stop a bad actor if they automatically use different IPs when one fails. If you choose the least risky path, it will mean that you’ll need to monitor traffic for several days and add new rules as offending IPs are discovered.
I don’t run any sites where I could experience significant financial or brand harm by accidentally causing false positives. Therefore, my approach has been to start with the ASN, monitor, and then narrow the rule down to individual IPs if I can determine they aren’t using several IPs on the same network.
If you’re running a site where harm could occur if there were many false positives, I recommend taking the slower approach that involves blocking or challenging individual IPs and closely monitoring Firewall activity.
Creating IP Access Rules
In the Cloudflare dashboard, click on Firewall and then Tools on the sub-nav.
As previously mentioned, the IP Access Rules tool is limited to adding a single instance of an IP, IP range, country, or ASN. If you need to add a different value or create rules based on a formula, you will need to use the Firewall Rules tool.
There are four input options per rule.
- Value: The IP, IP range, country, or ASN
- Action: Blocking, challenging, or allowing access
- Scope: Applying the rule to the current site or all sites associated with the Cloudflare account
- Notes: An optional field to describe the rule
This is what it looks like to block a single IP for all sites on a Cloudflare account. I included French Spammer in the notes to make the rule easy to find and identify.
An IP range can be added if you want to block or challenge all IPs within a specific block. The IP Access Rules tool also makes it possible to allow access to the site explicitly. For example, if existing rules or configurations prevent a crawler like Ahrefsbot from accessing your site, you can create rules that allow Ahrefs IP ranges. I use Ahrefs and have created rules that allow its bot to crawl all of my sites.
If you want to block an ASN, it needs to be entered with the letters AS followed by the number.
As previously discussed, the potential for false positives may be high when blocking an ASN. For that reason, you may want to set the action to Challenge.
If the rule is set to Challenge, it will stop bots but present visitors using a browser with a CAPTCHA challenge.
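Everything above uses the dashboard, but if you end up with many IPs to add, the four inputs map naturally onto a small data structure. The sketch below only builds and validates that structure. The field names (mode, configuration, target, value, notes) and the mode and target values reflect my understanding of Cloudflare’s API for IP Access Rules and should be checked against their current documentation before you send anything; the helper names are hypothetical.

```python
import ipaddress
import re

# Dashboard actions mapped to what I believe are the API's mode names.
ACTION_TO_MODE = {"block": "block", "challenge": "challenge", "allow": "whitelist"}


def detect_target(value):
    """Classify a rule value as an ASN, country, IP range, or single IP."""
    if re.fullmatch(r"AS\d+", value):
        return "asn"  # letters AS followed by the number
    if re.fullmatch(r"[A-Z]{2}", value):
        return "country"  # two-letter country code such as FR
    if "/" in value:
        ipaddress.ip_network(value, strict=False)  # raises ValueError if invalid
        return "ip_range"
    address = ipaddress.ip_address(value)  # raises ValueError if invalid
    return "ip" if address.version == 4 else "ip6"


def build_access_rule(value, action, notes=""):
    """Build a rule payload from the Value, Action, and Notes inputs.

    Scope (one site or the whole account) is not part of this payload;
    I assume it is decided by where the rule is submitted.
    """
    if action not in ACTION_TO_MODE:
        raise ValueError(f"unknown action: {action!r}")
    return {
        "mode": ACTION_TO_MODE[action],
        "configuration": {"target": detect_target(value), "value": value},
        "notes": notes,
    }


# Documentation-only IP and ASN values, used purely for illustration.
print(build_access_rule("203.0.113.7", "block", "French Spammer"))
print(build_access_rule("AS64496", "challenge", "French Spammer"))
```

Building the payload separately from sending it makes it easy to review a batch of rules before any of them go live, which fits the cautious approach to false positives described above.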
Cloudflare does an excellent job at stopping malicious traffic without the need for firewall rules, but it doesn’t stop everything. I recommend regularly reviewing your access logs and Cloudflare Analytics to spot any bad actors that might be trying to harm your site or steal your content.
Cloudflare’s Firewall tool is a powerful weapon for protecting your site. It’s also free to use, so there’s no reason not to take advantage of it. And if you do use it, I recommend easing into it and taking your time to become more familiar with it and avoid false positives.