ET Explainer: Cloudflare's new tool aims to block AI bots from scraping website content
Cloudflare has introduced a new tool that blocks AI bots from scraping website content. It aims to protect content publishers from the unauthorised use of their work to train AI models.

Here’s a closer look at the tool and why it matters.
What does the tool do?
Cloudflare has launched an ‘easy button’ that blocks all AI bots. It has also fine-tuned its machine learning models to identify and block even those bots that try to impersonate real people.
AI bots are automated programmes that crawl the internet and “scrape”, or collect, vast amounts of data used to train large language models.
“Customers don’t want AI bots visiting their websites, and especially those that do so dishonestly,” Cloudflare wrote in a blog post. “We fear that some AI companies intent on circumventing rules to access content will persistently adapt to evade bot detection.”
The new feature will be available to all customers, including those on the free tier, and can be enabled in their Cloudflare dashboards.
Why is it significant?
Globally, news and content publishers have been embroiled in a tussle with AI companies to prevent the unauthorised use of their content to train AI models without proper compensation.
While some tech companies, such as Google, Apple and OpenAI, identify their bots and respect established transparency protocols like the Robots Exclusion Protocol (robots.txt), which lets websites tell bots which pages to stay away from, others may try to avoid being clearly identified.
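In practice, the Robots Exclusion Protocol works like this: a site publishes a plain-text robots.txt file naming crawlers and the paths they should not visit, and a well-behaved crawler checks that file before fetching a page. The minimal Python sketch below uses the standard library's urllib.robotparser and assumes a hypothetical robots.txt that opts out of OpenAI's GPTBot; the example.com URL and the second user agent, SomeBrowser, are placeholders. Compliance is voluntary, which is why a bot that ignores the file or hides its identity is not stopped by it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site: it opts out of OpenAI's GPTBot
# crawler while leaving the site open to every other user agent.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A compliant crawler asks the parser before fetching each URL.
# "SomeBrowser" is a placeholder name for any agent not listed in the file.
url = "https://example.com/news/article.html"
for agent in ("GPTBot", "SomeBrowser"):
    print(agent, "allowed:", parser.can_fetch(agent, url))
```

Nothing here enforces the rule: the check happens on the crawler's side, so it only works for bots that choose to run it.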
Recently, Perplexity AI came under the scanner for “plagiarising” news content, and reports said that it tried to disguise its AI bot as a legitimate visitor while surreptitiously scraping data.
What are the top AI bots scraping website data?
In a survey of its network traffic, Cloudflare found that Bytespider, operated by TikTok parent ByteDance, a Chinese company, was the AI bot with the widest presence, found in 40.4% of accessed websites. ByteDance is building Doubao, a rival to ChatGPT.
It was followed by Amazonbot, which is reportedly used to index content for Alexa’s question answering; ClaudeBot, used for Anthropic’s Claude chatbot; and GPTBot, operated by OpenAI.
How do sites respond to scraping bots?
Cloudflare found that the more popular a website is, the more likely it is to be targeted by AI bots and hence the more likely it is to block bot requests.
Among the top 10 internet properties that use Cloudflare, 80% were accessed by AI bots and 40% blocked them. Among the top one million sites, however, nearly 39% were accessed by AI bots while only about 3% blocked them.
Cloudflare reported that 85% of its users preferred to block even those bots that followed the established protocols.
“Sadly, we’ve observed bot operators attempt to appear as though they are a real browser by using a spoofed user agent,” it said in the blog post.
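Why does a spoofed user agent matter? The simplest defence is to block any request whose User-Agent header names a known AI crawler. The Python sketch below assumes such a naive rule, built from the bot names mentioned earlier, and uses illustrative header strings rather than ones copied from any operator's documentation. It is not how Cloudflare's tool works; it only shows why a scraper that copies a browser's User-Agent slips past name-based filtering, the gap Cloudflare says its machine learning models are meant to close.

```python
# Hypothetical blocklist built from the AI crawler names in this article.
KNOWN_AI_BOTS = ("Bytespider", "Amazonbot", "ClaudeBot", "GPTBot")


def is_blocked(user_agent: str) -> bool:
    """Naive rule: block a request if its User-Agent header names a known AI bot."""
    ua = user_agent.lower()
    return any(bot.lower() in ua for bot in KNOWN_AI_BOTS)


# Illustrative header from a crawler that honestly announces itself.
honest = "Mozilla/5.0 (compatible; GPTBot/1.0)"
# Illustrative header from a scraper copying a desktop Chrome browser.
spoofed = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36"
)

print(is_blocked(honest))   # True: the bot names itself, so the rule catches it
print(is_blocked(spoofed))  # False: the disguised scraper looks like a browser
```

Because the header is supplied by the client, any rule that trusts it can be defeated by simply changing one string.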