We block and restrict bots that tend to create performance issues for clients’ websites, although you may ask our support team to grant access to restricted bots as necessary.
What are bots, and why should I care?
Over half of all web traffic is caused by web robots, commonly known as bots. Also known as “spiders” or “crawlers,” these automated scripts crawl virtually every page on every site on the Internet to gather as much data as they can.
Good bots benefit your site and do not noticeably affect its performance. Typical examples include commercial crawlers, search engine crawlers, monitoring bots, and feed fetchers, but any of these can qualify as bad bots if they hog your system resources and degrade site performance.
Search engine crawlers collect information for search engines to help them rank their results.
Commercial crawlers perform authorized data extractions to generate analytics and SEO data for companies tracking trends in eCommerce.
Feed fetchers carry your content to mobile and web applications. Some examples include Facebook Mobile App, Twitter Bot, and Android Framework Bot.
Monitoring bots check your site for availability and functionality.
Bad bots slow down or even crash your site. Some are well-intentioned but grossly inefficient. Many are malicious and even attempt to impersonate legitimate human traffic. They may scrape your site for email addresses (spambots), pull content to use elsewhere without your permission, or perform other actions harmful to your site and its visitors.
How we limit bad bots
One traditional way of limiting bots is editing your site’s robots.txt file, which theoretically sets rules for all bots to follow. However, a defining trait of bad bots is that they ignore these rules, which makes robots.txt alone unreliable.
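For reference, a robots.txt file expresses per-crawler rules like the ones below. The bot names are illustrative, not real crawler names; well-behaved crawlers honor these directives, while bad bots typically do not.

```
# robots.txt — placed at the root of your site
# Ask one crawler to wait 10 seconds between requests
# (Crawl-delay is honored by some, but not all, crawlers)
User-agent: ExampleBot
Crawl-delay: 10

# Tell another crawler to stay out of the site entirely
User-agent: HypotheticalBadBot
Disallow: /

# All other crawlers: keep out of a private directory
User-agent: *
Disallow: /private/
```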
For our clients, our default solution is to brand each bot with one of three labels: whitelist, graylist, or blacklist. We do not block or limit known good bots; only bots known to be abusive, malicious, or of no meaningful value are added to our graylist or blacklist.
Whitelist bots function without limit. They benefit your site and do not noticeably hamper performance.
Graylist bots perform useful functions, but can crawl too aggressively, tie up your system resources, and slow down your site. Often, they ignore robots.txt rules. We rate-limit these bots, which slows their activity but allows them to function.
Blacklist bots offer little-to-no redeeming value. They tend to disrupt your site, act as a vector for attack, or both.
We can tailor these lists as needed. If we are blocking a bot that you need for legitimate purposes, or have identified a whitelisted bot causing excessive traffic or other issues, please contact our 24/7 support team for assistance.
Identifying graylisted and blacklisted bots in your logs
In your Apache transfer logs, requests from graylisted bots return HTTP code 429 (Too Many Requests), and requests from blacklisted bots return HTTP code 400 (Bad Request).
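If you want to see which user agents are being rate-limited or blocked, you can tally those status codes from your logs. A minimal sketch, assuming an Apache combined log format; the log lines and bot names below are made up for illustration:

```python
import re
from collections import Counter

# Matches the Apache combined log format and captures the HTTP status
# code and the User-Agent string.
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "[^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def tally_limited_bots(lines):
    """Count requests per user agent that returned 429 (graylisted)
    or 400 (blacklisted)."""
    counts = {"429": Counter(), "400": Counter()}
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m.group("status") in counts:
            counts[m.group("status")][m.group("agent")] += 1
    return counts

# Illustrative sample log lines (not real traffic):
sample = [
    '203.0.113.7 - - [01/Jan/2024:00:00:01 +0000] "GET / HTTP/1.1" 429 0 "-" "ExampleCrawler/1.0"',
    '203.0.113.8 - - [01/Jan/2024:00:00:02 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '203.0.113.9 - - [01/Jan/2024:00:00:03 +0000] "GET / HTTP/1.1" 400 0 "-" "HypotheticalBadBot/2.0"',
]
tallied = tally_limited_bots(sample)
print("Graylisted (429):", dict(tallied["429"]))
print("Blacklisted (400):", dict(tallied["400"]))
```

Normal visitor traffic (the 200 response above) is left out of the tallies, so the output shows only the bots our system is limiting or blocking.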