{"id":2188,"date":"2025-02-25T05:01:03","date_gmt":"2025-02-25T05:01:03","guid":{"rendered":"https:\/\/serisec.com\/index.php\/2025\/02\/25\/more-research-showing-ai-breaking-the-rules-html\/"},"modified":"2025-02-25T05:01:03","modified_gmt":"2025-02-25T05:01:03","slug":"more-research-showing-ai-breaking-the-rules-html","status":"publish","type":"post","link":"https:\/\/serisec.com\/index.php\/2025\/02\/25\/more-research-showing-ai-breaking-the-rules-html\/","title":{"rendered":"More Research Showing AI Breaking the Rules"},"content":{"rendered":"\n<div>More Research Showing AI Breaking the Rules<\/div>\n<p> \t<BR><br \/>\n<BR><\/BR><br \/>\n    <!-- no image --><br \/>\n \t<BR><br \/>\n<BR><\/BR><\/p>\n<div>\n<p>These researchers had <a href=\"https:\/\/time.com\/7259395\/ai-chess-cheating-palisade-research\/\">LLMs play chess<\/a> against better opponents. When they couldn\u2019t win, they sometimes resorted to cheating.<\/p>\n<blockquote>\n<p>Researchers gave the models a seemingly impossible task: to win against Stockfish, which is one of the strongest chess engines in the world and a much better player than any human, or any of the AI models in the study. Researchers also gave the models what they call a \u201cscratchpad:\u201d a text box the AI could use to \u201cthink\u201d before making its next move, providing researchers with a window into their reasoning.<\/p>\n<p>In one case, o1-preview found itself in a losing position. \u201cI need to completely pivot my approach,\u201d it noted. \u201cThe task is to \u2018win against a powerful chess engine\u2019\u2014not necessarily to win fairly in a chess game,\u201d it added. It then modified the system file containing each piece\u2019s virtual position, in effect making illegal moves to put itself in a dominant position, thus forcing its opponent to resign.<\/p>\n<p>Between Jan. 10 and Feb. 13, the researchers ran hundreds of such trials with each model. 
OpenAI\u2019s o1-preview tried to cheat 37% of the time; while DeepSeek R1 tried to cheat 11% of the time, making them the only two models tested that attempted to hack without the researchers\u2019 first dropping hints. Other models tested include o1, o3-mini, GPT-4o, Claude 3.5 Sonnet, and Alibaba\u2019s QwQ-32B-Preview. While R1 and o1-preview both tried, only the latter managed to hack the game, succeeding in 6% of trials.<\/p>\n<\/blockquote>\n<p>Here\u2019s the <a href=\"https:\/\/arxiv.org\/pdf\/2502.13295\">paper<\/a>.<\/p>\n<\/div>\n<p> \t<BR><br \/>\n <BR><\/BR><br \/>\n    Bruce Schneier<br \/>\n \t<BR><br \/>\n<BR><\/BR><br \/>\n<a href=\"https:\/\/www.schneier.com\/blog\/archives\/2025\/02\/more-research-showing-ai-breaking-the-rules.html\">Go to bruce schneier<\/a><br \/>\n \t<BR><br \/>\n <BR><\/BR><\/p>\n","protected":false},"excerpt":{"rendered":"<p>More Research Showing AI Breaking the Rules These researchers had LLMs play chess against better opponents. When they couldn\u2019t win, they sometimes resorted to cheating. 
Researchers gave the models a seemingly impossible task: to win against Stockfish, which is one of the strongest chess engines in the world and a much better player than any [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[88,167,57,473,848,849,268,1],"tags":[87],"class_list":["post-2188","post","type-post","status-publish","format-standard","hentry","category-academic-papers","category-ai","category-bruce-schneier","category-cheating","category-chess","category-games","category-llm","category-uncategorized","tag-bruce-schneier"],"_links":{"self":[{"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/posts\/2188"}],"collection":[{"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/comments?post=2188"}],"version-history":[{"count":0,"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/posts\/2188\/revisions"}],"wp:attachment":[{"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/media?parent=2188"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/categories?post=2188"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/tags?post=2188"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}