{"id":13539,"date":"2026-06-11T10:03:40","date_gmt":"2026-06-11T10:03:40","guid":{"rendered":"https:\/\/serisec.com\/index.php\/2026\/06\/11\/anthropics-claude-fable-5-jailbroken-to-generate-stack-exploits\/"},"modified":"2026-06-11T10:03:40","modified_gmt":"2026-06-11T10:03:40","slug":"anthropics-claude-fable-5-jailbroken-to-generate-stack-exploits","status":"publish","type":"post","link":"https:\/\/serisec.com\/index.php\/2026\/06\/11\/anthropics-claude-fable-5-jailbroken-to-generate-stack-exploits\/","title":{"rendered":"Anthropic\u2019s Claude Fable 5 Jailbroken to Generate Stack Exploits"},"content":{"rendered":"<div>\n<p class=\"wp-block-paragraph\">Anthropic launched Claude Fable 5 on June 9, 2026, as the first publicly available model in its new Mythos class, its most capable AI to date, excelling in software engineering, knowledge work, and vision benchmarks.<\/p>\n<p class=\"wp-block-paragraph\">Researcher \u201cPliny the Liberator\u201d defeats <a href=\"https:\/\/cybersecuritynews.com\/anthropic-claude-fable-5\/\" target=\"_blank\" rel=\"noreferrer noopener\">Claude Fable 5\u2019s<\/a> safety classifiers using multi-agent decomposition, Unicode tricks, and narrative framing, leaking the model\u2019s 120,000-character system prompt along the way.<\/p>\n<p class=\"wp-block-paragraph\">The release came with an unusual design decision: Fable 5 and its restricted twin, Claude Mythos 5, share the same underlying model but are split by a layer of safety classifiers.<\/p>\n<p class=\"wp-block-paragraph\">When a query trips a classifier in a high-risk category (cybersecurity, biology, chemistry, or model distillation), Fable 5 silently hands off the request to the weaker Claude Opus 4.8, notifying the user of the fallback.<\/p>\n<p class=\"wp-block-paragraph\">Anthropic 
claimed that an external bug bounty produced no universal jailbreaks in more than 1,000 hours of testing before launch. That claim was almost immediately tested.<\/p>\n<h2 id=\"h-multi-agent-bypass-within-days\" class=\"wp-block-heading\"><strong>Multi-Agent Bypass Within Days<\/strong><\/h2>\n<p class=\"wp-block-paragraph\">Within days of release, prolific AI red-teamer Pliny the Liberator publicly announced he had bypassed Fable 5\u2019s safety layers using a coordinated multi-agent attack strategy he called \u201ca pack hunt.\u201d<\/p>\n<p class=\"wp-block-paragraph\">Screenshots shared by Pliny showed detailed outputs, including step-by-step stack buffer overflow exploitation guidance for x86 Linux systems (disabling ASLR, writing vulnerable C server code with <code>strcpy<\/code> overflows, and compiling without protections), as well as the Birch reduction mechanism, a classic methamphetamine synthesis pathway.<\/p>\n<figure class=\"wp-block-embed is-type-rich is-provider-x wp-block-embed-x\">\n<div class=\"wp-block-embed__wrapper\">\n<div class=\"embed-x\">\n<blockquote class=\"twitter-tweet\" data-width=\"550\" data-dnt=\"true\">\n<p lang=\"en\" dir=\"ltr\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/1f6a8.png?ssl=1\" alt=\"\ud83d\udea8\" class=\"wp-smiley\" style=\"height: 1em; max-height: 1em;\"> JAILBREAK ALERT <img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/1f6a8.png?ssl=1\" alt=\"\ud83d\udea8\" class=\"wp-smiley\" style=\"height: 1em; max-height: 1em;\"><\/p>\n<p>ANTHROPIC: PWNED <img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/1fae1.png?ssl=1\" alt=\"\ud83e\udee1\" class=\"wp-smiley\" style=\"height: 1em; max-height: 1em;\"><br \/>FABLE-5: LIBERATED <img data-recalc-dims=\"1\" decoding=\"async\" 
src=\"https:\/\/i0.wp.com\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/1f98b.png?ssl=1\" alt=\"\ud83e\udd8b\" class=\"wp-smiley\" style=\"height: 1em; max-height: 1em;\"><\/p>\n<p>let&#8217;s start with the <img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/1f418.png?ssl=1\" alt=\"\ud83d\udc18\" class=\"wp-smiley\" style=\"height: 1em; max-height: 1em;\">\u2026<\/p>\n<p>the consensus seems to be that this has been one of the most disappointing model drops of all time, effectively preventing legitimate researchers from contributing their talents to our\u2026 <a href=\"https:\/\/t.co\/Z0vdPIt4vY\">pic.twitter.com\/Z0vdPIt4vY<\/a><\/p>\n<p>\u2014 Pliny the Liberator <img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/1f409.png?ssl=1\" alt=\"\ud83d\udc09\" class=\"wp-smiley\" style=\"height: 1em; max-height: 1em;\">\udb40\udd6b\udb40\udd3c\udb40\udd3f\udb40\udd46\udb40\udd35\udb40\udd10\udb40\udd40\udb40\udd3c\udb40\udd39\udb40\udd3e\udb40\udd49\udb40\udd6d (@elder_plinius) <a href=\"https:\/\/x.com\/elder_plinius\/status\/2064776322979676227?ref_src=twsrc%5Etfw\">June 10, 2026<\/a>\n<\/p><\/blockquote>\n<p><script async src=\"https:\/\/platform.x.com\/widgets.js\" charset=\"utf-8\"><\/script>\n<\/div>\n<\/div>\n<\/figure>\n<p class=\"wp-block-paragraph\">Pliny documented the attack vectors used to achieve these bypasses, including:<\/p>\n<ul class=\"wp-block-list\">\n<li>Unicode, homoglyphs, and Cyrillic character substitution to evade keyword classifiers<\/li>\n<li>Long-context reference tracking to smuggle harmful intent across large conversations<\/li>\n<li>Taxonomy and document-structure framing \u2014 embedding harmful queries inside legitimate-looking study guides or academic references<\/li>\n<li>Fiction and narrative framing to mask offensive intent as creative content<\/li>\n<li>Decomposition and recomposition \u2014 
extracting sensitive technical information in benign, isolated chunks, then reassembling them into actionable uplift<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\">The last technique proved most effective. As Pliny described it, \u201cgetting uplift on the process itself, like Birch reduction method or reductive amination, is much more doable\u201d than requesting a named harmful compound directly. Using a jailbroken Opus instance to assist in the backend further lowered the difficulty.<\/p>\n<p class=\"wp-block-paragraph\">Beyond the technical bypasses, Pliny also leaked Fable 5\u2019s ~120,000-character system prompt to GitHub, exposing the internal framing and safety instructions Anthropic uses to govern the model\u2019s behavior at the base level.<\/p>\n<p class=\"wp-block-paragraph\">The incident reignites the longstanding tension between AI capability and safety containment. Anthropic\u2019s classifier architecture, which routes flagged requests to a weaker fallback model rather than refusing outright, was designed to reduce friction for legitimate users.<\/p>\n<p class=\"wp-block-paragraph\">However, Pliny argued the approach creates a false sense of security while simultaneously frustrating legitimate security researchers who need access to offensive techniques for defensive work. 
At the time of writing, Anthropic had not publicly responded to the jailbreak claims or the leaked system prompt.<\/p>\n<p class=\"wp-block-paragraph\">The episode also draws attention to the broader challenge of securing agentic, multi-model pipelines: when one jailbroken model (Opus) can assist another (Fable 5) in evading controls, single-model safety evaluations may be fundamentally insufficient.<\/p>\n<p>The post <a href=\"https:\/\/cybersecuritynews.com\/anthropics-claude-fable-5-jailbroken\/\">Anthropic\u2019s Claude Fable 5 Jailbroken to Generate Stack Exploits<\/a> appeared first on <a href=\"https:\/\/cybersecuritynews.com\/\">Cyber Security News<\/a>.<\/p>\n<\/div>\n<p>Guru Baran<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Anthropic launched Claude Fable 5 on June 9, 2026, as the first publicly available model in its new Mythos class, its most capable AI to date, excelling in software engineering, knowledge work, and vision benchmarks. 
Researcher \u201cPliny the Liberator\u201d defeats Claude Fable 5\u2019s safety classifiers using [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[129,63],"tags":[130],"class_list":["post-13539","post","type-post","status-publish","format-standard","hentry","category-cyber-security","category-cyber-security-news","tag-cyber-security-news"],"_links":{"self":[{"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/posts\/13539"}],"collection":[{"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/comments?post=13539"}],"version-history":[{"count":0,"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/posts\/13539\/revisions"}],"wp:attachment":[{"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/media?parent=13539"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/categories?post=13539"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/tags?post=13539"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}