{"id":2865,"date":"2025-03-27T05:00:28","date_gmt":"2025-03-27T05:00:28","guid":{"rendered":"https:\/\/serisec.com\/index.php\/2025\/03\/27\/ai-data-poisoning-html\/"},"modified":"2025-03-27T05:00:28","modified_gmt":"2025-03-27T05:00:28","slug":"ai-data-poisoning-html","status":"publish","type":"post","link":"https:\/\/serisec.com\/index.php\/2025\/03\/27\/ai-data-poisoning-html\/","title":{"rendered":"AI Data Poisoning"},"content":{"rendered":"\n<div>AI Data Poisoning<\/div>\n<div>\n<p>Cloudflare has a <a href=\"https:\/\/arstechnica.com\/ai\/2025\/03\/cloudflare-turns-ai-against-itself-with-endless-maze-of-irrelevant-facts\/\">new feature<\/a>\u2014available to free users as well\u2014that uses AI to generate random pages to feed to AI web crawlers:<\/p>\n<blockquote>\n<p>Instead of simply blocking bots, Cloudflare\u2019s new system lures them into a \u201cmaze\u201d of realistic-looking but irrelevant pages, wasting the crawler\u2019s computing resources. The approach is a notable shift from the standard block-and-defend strategy used by most website protection services. Cloudflare says blocking bots sometimes backfires because it alerts the crawler\u2019s operators that they\u2019ve been detected.<\/p>\n<p>\u201cWhen we detect unauthorized crawling, rather than blocking the request, we will link to a series of AI-generated pages that are convincing enough to entice a crawler to traverse them,\u201d writes Cloudflare. 
\u201cBut while real looking, this content is not actually the content of the site we are protecting, so the crawler wastes time and resources.\u201d<\/p>\n<p>The company says the content served to bots is deliberately irrelevant to the website being crawled, but it is carefully sourced or generated using real scientific facts\u2014such as neutral information about biology, physics, or mathematics\u2014to avoid spreading misinformation (whether this approach effectively prevents misinformation, however, remains unproven).<\/p>\n<\/blockquote>\n<p>It\u2019s basically an AI-generated honeypot. And AI scraping is a growing problem:<\/p>\n<blockquote>\n<p>The scale of AI crawling on the web appears substantial, according to Cloudflare\u2019s data that lines up with anecdotal reports we\u2019ve heard from sources. The company says that AI crawlers generate more than 50 billion requests to their network daily, amounting to nearly 1 percent of all web traffic they process. Many of these crawlers collect website data to train large language models without permission from site owners\u2026.<\/p>\n<\/blockquote>\n<p>Presumably the crawlers will now have to up both their scraping stealth and their ability to filter out AI-generated content like this. Which means the honeypots will have to get better at detecting scrapers and more stealthy in their fake content. 
This arms race is likely to go back and forth, wasting a lot of energy in the process.<\/p>\n<\/div>\n<p>Bruce Schneier<br \/>\n<a href=\"https:\/\/www.schneier.com\/blog\/archives\/2025\/03\/ai-data-poisoning.html\">Go to Bruce Schneier<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>AI Data Poisoning Cloudflare has a new feature\u2014available to free users as well\u2014that uses AI to generate random pages to feed to AI web crawlers: Instead of simply blocking bots, Cloudflare\u2019s new system lures them into a \u201cmaze\u201d of realistic-looking but irrelevant pages, wasting the crawler\u2019s computing resources. The approach is a notable shift from [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[167,945,57,985,1],"tags":[87],"class_list":["post-2865","post","type-post","status-publish","format-standard","hentry","category-ai","category-botnets","category-bruce-schneier","category-spoofing","category-uncategorized","tag-bruce-schneier"],"_links":{"self":[{"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/posts\/2865"}],"collection":[{"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/comments?post=2865"}],"version-history":[{"count":0,"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/posts\/2865\/revisions"}],"wp:attachment":[{"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/media?parent=2865"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/serisec.co
m\/index.php\/wp-json\/wp\/v2\/categories?post=2865"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/tags?post=2865"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}