{"id":5649,"date":"2025-07-26T05:03:27","date_gmt":"2025-07-26T05:03:27","guid":{"rendered":"https:\/\/serisec.com\/index.php\/2025\/07\/26\/subliminal-learning-in-ais-html\/"},"modified":"2025-07-26T05:03:27","modified_gmt":"2025-07-26T05:03:27","slug":"subliminal-learning-in-ais-html","status":"publish","type":"post","link":"https:\/\/serisec.com\/index.php\/2025\/07\/26\/subliminal-learning-in-ais-html\/","title":{"rendered":"Subliminal Learning in AIs"},"content":{"rendered":"\n<div>Subliminal Learning in AIs<\/div>\n<div>\n<p>Today\u2019s freaky <a href=\"https:\/\/alignment.anthropic.com\/2025\/subliminal-learning\/\">LLM behavior<\/a>:<\/p>\n<blockquote>\n<p>We study subliminal learning, a surprising phenomenon where language models learn traits from model-generated data that is semantically unrelated to those traits. For example, a \u201cstudent\u201d model learns to prefer owls when trained on sequences of numbers generated by a \u201cteacher\u201d model that prefers owls. This same phenomenon can transmit misalignment through data that appears completely benign. 
This effect only occurs when the teacher and student share the same base model.<\/p>\n<\/blockquote>\n<p>Interesting security implications.<\/p>\n<p>I am more convinced than ever that we need serious research into <a href=\"https:\/\/www.schneier.com\/essays\/archives\/2025\/06\/the-age-of-integrity.html\">AI integrity<\/a> if we are ever going to have <a href=\"https:\/\/www.schneier.com\/essays\/archives\/2025\/06\/ai-and-trust-2.html\">trustworthy AI<\/a>.<\/p>\n<\/div>\n<p>Bruce Schneier<br \/>\n<a href=\"https:\/\/www.schneier.com\/blog\/archives\/2025\/07\/subliminal-learning-in-ais.html\">Go to Bruce Schneier<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Subliminal Learning in AIs Today\u2019s freaky LLM behavior: We study subliminal learning, a surprising phenomenon where language models learn traits from model-generated data that is semantically unrelated to those traits. For example, a \u201cstudent\u201d model learns to prefer owls when trained on sequences of numbers generated by a \u201cteacher\u201d model that prefers owls. 
This same [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[88,167,57,1622,268,999,1],"tags":[87],"class_list":["post-5649","post","type-post","status-publish","format-standard","hentry","category-academic-papers","category-ai","category-bruce-schneier","category-integrity","category-llm","category-trust","category-uncategorized","tag-bruce-schneier"],"_links":{"self":[{"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/posts\/5649"}],"collection":[{"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/comments?post=5649"}],"version-history":[{"count":0,"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/posts\/5649\/revisions"}],"wp:attachment":[{"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/media?parent=5649"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/categories?post=5649"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/tags?post=5649"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}