Cloudflare Launches BotFightMode to Prevent AI Crawlers from Scraping Website Content

TapTechNews, July 5: Network service provider Cloudflare recently launched a firewall tool named "BotFightMode". Webmasters can enable it in the console to keep their website content from being scraped by crawlers that collect data for training AI.

TapTechNews note: A crawler is an automated program that searches for and retrieves information on the Internet. Many vendors currently use crawlers to scrape major websites for data to train AI models. These crawlers tend to generate large amounts of abnormal traffic on the sites they scrape, which leaves webmasters paying high bandwidth costs, and they can easily cause large amounts of original or private content to leak.

Cloudflare's tools reportedly identify crawlers mainly through signature comparison, heuristic algorithms, machine learning, and behavioral analysis. Webmasters can also let "good AI robots" through to crawl information as needed (such "good AI robots" mainly consult the website's robots.txt before fetching information, usually do not generate abnormal traffic for the website, and do not feed all of the page data directly into full model training).
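
To make two of these ideas concrete, here is a minimal Python sketch. It is not Cloudflare's implementation: the User-Agent token list, the sample robots.txt, and the helper names `matches_ai_signature` and `allowed_by_robots` are assumptions chosen for illustration. The first function shows signature comparison in its simplest form (matching known crawler tokens in the User-Agent header); the second shows how a well-behaved crawler could consult robots.txt with the standard library's `urllib.robotparser` before fetching a page.

```python
from urllib.robotparser import RobotFileParser

# Illustrative User-Agent tokens only; a real signature set would be
# far larger and is not published in the article.
AI_CRAWLER_TOKENS = ("GPTBot", "CCBot", "Bytespider")

# Hypothetical robots.txt that blocks one AI crawler and allows everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def matches_ai_signature(user_agent: str) -> bool:
    """Return True if the User-Agent contains a known AI-crawler token."""
    ua = user_agent.lower()
    return any(token.lower() in ua for token in AI_CRAWLER_TOKENS)


def allowed_by_robots(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if robots.txt permits `agent` to fetch `url`."""
    parser = RobotFileParser()
    # Parse the text directly so the example needs no network access.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    ua = "Mozilla/5.0 (compatible; GPTBot/1.0)"
    print(matches_ai_signature(ua))
    print(allowed_by_robots(ROBOTS_TXT, "GPTBot", "https://example.com/article"))
    print(allowed_by_robots(ROBOTS_TXT, "SomeSearchBot", "https://example.com/article"))
```

A User-Agent header is trivial to forge, which is presumably why the tools described above combine signatures with heuristics, machine learning, and behavioral analysis rather than relying on string matching alone.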

The Internet supplies most of the training data for many large language models (such as OpenAI's GPT models and Google's Bard). To boost the scores of their own AI models, many vendors now harvest training data on a large scale with web crawlers, taking whatever they can without asking, which has "stigmatized" what should have been a consensual practice. It is therefore no surprise that major network providers are launching services to block AI crawlers outright.
