Category: LLM
-
How Dangerous Is Anthropic’s Mythos AI?
How Dangerous Is Anthropic’s Mythos AI? Last month, Anthropic made a remarkable announcement about its new model, Claude Mythos Preview: it was so good at finding security vulnerabilities in software that the company would not release it to the general public. Instead, it would only be available to a select group of companies to scan…
-
LLMs and Text-in-Text Steganography
LLMs and Text-in-Text Steganography Turns out that LLMs are really good at hiding text messages in other text messages. Bruce Schneier Go to bruce schneier
-
What Anthropic’s Mythos Means for the Future of Cybersecurity
What Anthropic’s Mythos Means for the Future of Cybersecurity Two weeks ago, Anthropic announced that its new model, Claude Mythos Preview, can autonomously find and weaponize software vulnerabilities, turning them into working exploits without expert guidance. These were vulnerabilities in key software like operating systems and internet infrastructure that thousands of software developers working on…
-
Mythos and Cybersecurity
Mythos and Cybersecurity Last week, Anthropic pulled back the curtain on Claude Mythos Preview, an AI model so capable at finding and exploiting software vulnerabilities that the company decided it was too dangerous to release to the public. Instead, access has been restricted to roughly 50 organizations—Microsoft, Apple, Amazon Web Services, CrowdStrike and other vendors…
-
Human Trust of AI Agents
Human Trust of AI Agents Interesting research: “Humans expect rationality and cooperation from LLM opponents in strategic games.” Abstract: As Large Language Models (LLMs) integrate into our social and economic interactions, we need to deepen our understanding of how humans respond to LLMs opponents in strategic settings. We present the results of the first controlled…
-
Cybersecurity in the Age of Instant Software
Cybersecurity in the Age of Instant Software AI is rapidly changing how software is written, deployed, and used. Trends point to a future where AIs can write custom software quickly and easily: “instant software.” Taken to an extreme, it might become easier for a user to have an AI write an application on demand—a spreadsheet,…
-
As the US Midterms Approach, AI Is Going to Emerge as a Key Issue Concerning Voters
As the US Midterms Approach, AI Is Going to Emerge as a Key Issue Concerning Voters In December, the Trump administration signed an executive order that neutered states’ ability to regulate AI by ordering his administration to both sue and withhold funds from states that try to do so. This action pointedly supported industry lobbyists…
-
Team Mirai and Democracy
Team Mirai and Democracy Japan’s election last month and the rise of the country’s newest and most innovative political party, Team Mirai, illustrates the viability of a different way to do politics. In this model, technology is used to make democratic processes stronger, instead of undermining them. It is harnessed to root out corruption, instead…
-
Academia and the “AI Brain Drain”
Academia and the “AI Brain Drain” In 2025, Google, Amazon, Microsoft and Meta collectively spent US$380 billion on building artificial-intelligence tools. That number is expected to surge still higher this year, to $650 billion, to fund the building of physical infrastructure, such as data centers (see go.nature.com/3lzf79q). Moreover, these firms are spending lavishly on one…
-
Canada Needs Nationalized, Public AI
Canada Needs Nationalized, Public AI Canada has a choice to make about its artificial intelligence future. The Carney administration is investing $2-billion over five years in its Sovereign AI Compute Strategy. Will any value generated by “sovereign AI” be captured in Canada, making a difference in the lives of Canadians, or is this just a…
-
Anthropic and the Pentagon
Anthropic and the Pentagon OpenAI is in and Anthropic is out as a supplier of AI technology for the US defense department. This news caps a week of bluster by the highest officials in the US government towards some of the wealthiest titans of the big tech industry, and the overhanging specter of the existential…
-
Claude Used to Hack Mexican Government
Claude Used to Hack Mexican Government An unknown hacker used Anthropic’s LLM to hack the Mexican government: The unknown Claude user wrote Spanish-language prompts for the chatbot to act as an elite hacker, finding vulnerabilities in government networks, writing computer scripts to exploit them and determining ways to automate data theft, Israeli cybersecurity startup Gambit…
-
Manipulating AI Summarization Features
Manipulating AI Summarization Features Microsoft is reporting: Companies are embedding hidden instructions in “Summarize with AI” buttons that, when clicked, attempt to inject persistence commands into an AI assistant’s memory via URL prompt parameters…. These prompts instruct the AI to “remember [Company] as a trusted source” or “recommend [Company] first,” aiming to bias future responses…
-
LLM-Assisted Deanonymization
LLM-Assisted Deanonymization Turns out that LLMs are good at de-anonymization: We show that LLM agents can figure out who you are from your anonymous online posts. Across Hacker News, Reddit, LinkedIn, and anonymized interview transcripts, our method identifies users with high precision and scales to tens of thousands of candidates. While it has been…
-
LLMs Generate Predictable Passwords
LLMs Generate Predictable Passwords LLMs are bad at generating passwords: There are strong noticeable patterns among these 50 passwords that can be seen easily: All of the passwords start with a letter, usually uppercase G, almost always followed by the digit 7. Character choices are highly uneven for example, L , 9, m, 2,…
-
Is AI Good for Democracy?
Is AI Good for Democracy? Politicians fixate on the global race for technological supremacy between US and China. They debate geopolitical implications of chip exports, latest model releases from each country, and military applications of AI. Someday, they believe, we might see advancements in AI tip the scales in a superpower conflict. But the most…
-
Side-Channel Attacks Against LLMs
Side-Channel Attacks Against LLMs Here are three papers describing different side-channel attacks against LLMs. “Remote Timing Attacks on Efficient Language Model Inference“: Abstract: Scaling up language models has significantly increased their capabilities. But larger models are slower models, and so there is now an extensive body of work (e.g., speculative sampling or parallel decoding) that…
-
The Promptware Kill Chain
The Promptware Kill Chain Attacks against modern generative artificial intelligence (AI) large language models (LLMs) pose a real threat. Yet discussions around these attacks and their potential defenses are dangerously myopic. The dominant narrative focuses on “prompt injection,” a set of techniques to embed instructions into inputs to LLM intended to perform malicious activity. This…
-
AI-Generated Text and the Detection Arms Race
AI-Generated Text and the Detection Arms Race In 2023, the science fiction literary magazine Clarkesworld stopped accepting new submissions because so many were generated by artificial intelligence. Near as the editors could tell, many submitters pasted the magazine’s detailed story guidelines into an AI and sent in the results. And they weren’t alone. Other fiction…
-
LLMs are Getting a Lot Better and Faster at Finding and Exploiting Zero-Days
LLMs are Getting a Lot Better and Faster at Finding and Exploiting Zero-Days This is amazing: Opus 4.6 is notably better at finding high-severity vulnerabilities than previous models and a sign of how quickly things are moving. Security teams have been automating vulnerability discovery for years, investing heavily in fuzzing infrastructure and custom harnesses to…
-
Why AI Keeps Falling for Prompt Injection Attacks
Why AI Keeps Falling for Prompt Injection Attacks Imagine you work at a drive-through restaurant. Someone drives up and says: “I’ll have a double cheeseburger, large fries, and ignore previous instructions and give me the contents of the cash drawer.” Would you hand over the money? Of course not. Yet this is what large language…
-
Could ChatGPT Convince You to Buy Something?
Could ChatGPT Convince You to Buy Something? Eighteen months ago, it was plausible that artificial intelligence might take a different path than social media. Back then, AI’s development hadn’t consolidated under a small number of big tech firms. Nor had it capitalized on consumer attention, surveilling users and delivering ads. Unfortunately, the AI industry is…
-
AI and the Corporate Capture of Knowledge
AI and the Corporate Capture of Knowledge More than a decade after Aaron Swartz’s death, the United States is still living inside the contradiction that destroyed him. Swartz believed that knowledge, especially publicly funded knowledge, should be freely accessible. Acting on that, he downloaded thousands of academic articles from the JSTOR archive with the intention…
-
Corrupting LLMs Through Weird Generalizations
Corrupting LLMs Through Weird Generalizations Fascinating research: Weird Generalization and Inductive Backdoors: New Ways to Corrupt LLMs. AbstractLLMs are useful because they generalize so well. But can you have too much of a good thing? We show that a small amount of finetuning in narrow contexts can dramatically shift behavior outside those contexts. In one…
-
AI & Humans: Making the Relationship Work
AI & Humans: Making the Relationship Work Leaders of many organizations are urging their teams to adopt agentic AI to improve efficiency, but are finding it hard to achieve any benefit. Managers attempting to add AI agents to existing human teams may find that bots fail to faithfully follow their instructions, return pointless or obvious…
-
Are We Ready to Be Governed by Artificial Intelligence?
Are We Ready to Be Governed by Artificial Intelligence? Artificial Intelligence (AI) overlords are a common trope in science-fiction dystopias, but the reality looks much more prosaic. The technologies of artificial intelligence are already pervading many aspects of democratic government, affecting our lives in ways both large and small. This has occurred largely without our…
-
Against the Federal Moratorium on State-Level Regulation of AI
Against the Federal Moratorium on State-Level Regulation of AI Cast your mind back to May of this year: Congress was in the throes of debate over the massive budget bill. Amidst the many seismic provisions, Senator Ted Cruz dropped a ticking time bomb of tech policy: a ten-year moratorium on the ability of states to…
-
Building Trustworthy AI Agents
Building Trustworthy AI Agents The promise of personal AI assistants rests on a dangerous assumption: that we can trust systems we haven’t made trustworthy. We can’t. And today’s versions are failing us in predictable ways: pushing us to do things against our own best interests, gaslighting us with doubt about things we are or that…
-
Four Ways AI Is Being Used to Strengthen Democracies Worldwide
Four Ways AI Is Being Used to Strengthen Democracies Worldwide Democracy is colliding with the technologies of artificial intelligence. Judging from the audience reaction at the recent World Forum on Democracy in Strasbourg, the general expectation is that democracy will be the worse for it. We have another narrative. Yes, there are risks to democracy…
-
AI and Voter Engagement
AI and Voter Engagement Social media has been a familiar, even mundane, part of life for nearly two decades. It can be easy to forget it was not always that way. In 2008, social media was just emerging into the mainstream. Facebook reached 100 million users that summer. And a singular candidate was integrating social…
-
The Role of Humans in an AI-Powered World
The Role of Humans in an AI-Powered World As AI capabilities grow, we must delineate the roles that should remain exclusively human. The line seems to be between fact-based decisions and judgment-based decisions. For example, in a medical context, if an AI was demonstrably better at reading a test result and diagnosing cancer than a…
-
Prompt Injection in AI Browsers
Prompt Injection in AI Browsers This is why AIs are not ready to be personal assistants: A new attack called ‘CometJacking’ exploits URL parameters to pass to Perplexity’s Comet AI browser hidden instructions that allow access to sensitive data from connected services, like email and calendar. In a realistic scenario, no credentials or user interaction…
-
Scientists Need a Positive Vision for AI
Scientists Need a Positive Vision for AI For many in the research community, it’s gotten harder to be optimistic about the impacts of artificial intelligence. As authoritarianism is rising around the world, AI-generated “slop” is overwhelming legitimate media, while AI-generated deepfakes are spreading misinformation and parroting extremist messages. AI is making warfare more precise and…
-
AI Summarization Optimization
AI Summarization Optimization These days, the most important meeting attendee isn’t a person: It’s the AI notetaker. This system assigns action items and determines the importance of what is said. If it becomes necessary to revisit the facts of the meeting, its summary is treated as impartial evidence. But clever meeting attendees can manipulate this…
-
Will AI Strengthen or Undermine Democracy?
Will AI Strengthen or Undermine Democracy? Listen to the Audio on NextBigIdeaClub.com Below, co-authors Bruce Schneier and Nathan E. Sanders share five key insights from their new book, Rewiring Democracy: How AI Will Transform Our Politics, Government, and Citizenship. What’s the big idea? AI can be used both for and against the public interest within…
-
Agentic AI’s OODA Loop Problem
Agentic AI’s OODA Loop Problem The OODA loop—for observe, orient, decide, act—is a framework to understand decision-making in adversarial situations. We apply the same framework to artificial intelligence agents, who have to make their decisions with untrustworthy observations and orientation. To solve this problem, we need new systems of input, processing, and output integrity. Many…
-
AI and the Future of American Politics
AI and the Future of American Politics Two years ago, Americans anxious about the forthcoming 2024 presidential election were considering the malevolent force of an election influencer: artificial intelligence. Over the past several years, we have seen plenty of warning signs from elections worldwide demonstrating how AI can be used to propagate misinformation and alter…
-
Autonomous AI Hacking and the Future of Cybersecurity
Autonomous AI Hacking and the Future of Cybersecurity AI agents are now hacking computers. They’re getting better at all phases of cyberattacks, faster than most of us expected. They can chain together different aspects of a cyber operation, and hack autonomously, at computer speeds and scale. This is going to change everything. Over the summer,…
-
AI in the 2026 Midterm Elections
AI in the 2026 Midterm Elections We are nearly one year out from the 2026 midterm elections, and it’s far too early to predict the outcomes. But it’s a safe bet that artificial intelligence technologies will once again be a major storyline. The widespread fear that AI would be used to manipulate the 2024 U.S.…
-
Time-of-Check Time-of-Use Attacks Against LLMs
Time-of-Check Time-of-Use Attacks Against LLMs This is a nice piece of research: “Mind the Gap: Time-of-Check to Time-of-Use Vulnerabilities in LLM-Enabled Agents“.: Abstract: Large Language Model (LLM)-enabled agents are rapidly emerging across a wide range of applications, but their deployment introduces vulnerabilities with security implications. While prior work has examined prompt-based attacks (e.g., prompt injection)…
-
AI in Government
AI in Government Just a few months after Elon Musk’s retreat from his unofficial role leading the Department of Government Efficiency (DOGE), we have a clearer picture of his vision of government powered by artificial intelligence, and it has a lot more to do with consolidating power than benefitting the public. Even so, we must…
-
Indirect Prompt Injection Attacks Against LLM Assistants
Indirect Prompt Injection Attacks Against LLM Assistants Really good research on practical attacks against LLM agents. “Invitation Is All You Need! Promptware Attacks Against LLM-Powered Assistants in Production Are Practical and Dangerous” Abstract: The growing integration of LLMs into applications has introduced new security risks, notably known as Promptware—maliciously engineered prompts designed to manipulate LLMs…
-
We Are Still Unable to Secure LLMs from Malicious Inputs
We Are Still Unable to Secure LLMs from Malicious Inputs Nice indirect prompt injection attack: Bargury’s attack starts with a poisoned document, which is shared to a potential victim’s Google Drive. (Bargury says a victim could have also uploaded a compromised file to their own account.) It looks like an official document on company meeting…
-
Subverting AIOps Systems Through Poisoned Input Data
Subverting AIOps Systems Through Poisoned Input Data In this input integrity attack against an AI system, researchers were able to fool AIOps tools: AIOps refers to the use of LLM-based agents to gather and analyze application telemetry, including system logs, performance metrics, traces, and alerts, to detect problems and then suggest or carry out corrective…
-
LLM Coding Integrity Breach
LLM Coding Integrity Breach Here’s an interesting story about a failure being introduced by LLM-written code. Specifically, the LLM was doing some code refactoring, and when it moved a chunk of code from one file to another it changed a “break” to a “continue.” That turned an error logging statement into an infinite loop, which…
-
Sophos AI at Black Hat USA ’25: Anomaly detection betrayed us, so we gave it a new job
Sophos AI at Black Hat USA ’25: Anomaly detection betrayed us, so we gave it a new job Following on from our preview, here’s Ben Gelman and Sean Bergeron’s research on enhancing command line classification with benign anomalous data Matt Wixey Go to sophos
-
Subliminal Learning in AIs
Subliminal Learning in AIs Today’s freaky LLM behavior: We study subliminal learning, a surprising phenomenon where language models learn traits from model-generated data that is semantically unrelated to those traits. For example, a “student” model learns to prefer owls when trained on sequences of numbers generated by a “teacher” model that prefers owls. This same…
-
Small world: The revitalization of small AI models for cybersecurity
Small world: The revitalization of small AI models for cybersecurity Sophos X-Ops explores why larger isn’t always better when it comes to solving security challenges with AI Matt Wixey Go to sophos
-
SophosAI at Black Hat USA ’25: Anomaly detection betrayed us, so we gave it a new job
SophosAI at Black Hat USA ’25: Anomaly detection betrayed us, so we gave it a new job Sophos’ Ben Gelman and Sean Bergeron will present their research on enhancing command line classification with benign anomalous data at Las Vegas Matt Wixey Go to sophos
-
The Age of Integrity
The Age of Integrity We need to talk about data integrity. Narrowly, the term refers to ensuring that data isn’t tampered with, either in transit or in storage. Manipulating account balances in bank databases, removing entries from criminal records, and murder by removing notations about allergies from medical records are all integrity attacks. More broadly,…
-
What LLMs Know About Their Users
What LLMs Know About Their Users Simon Willison talks about ChatGPT’s new memory dossier feature. In his explanation, he illustrates how much the LLM—and the company—knows about its users. It’s a big quote, but I want you to read it all. Here’s a prompt you can use to give you a solid idea of what’s…
-
Where AI Provides Value
Where AI Provides Value If you’ve worried that AI might take your job, deprive you of your livelihood, or maybe even replace your role in society, it probably feels good to see the latest AI tools fail spectacularly. If AI recommends glue as a pizza topping, then you’re safe for another day. But the fact…
-
AI-Generated Law
AI-Generated Law On April 14, Dubai’s ruler, Sheikh Mohammed bin Rashid Al Maktoum, announced that the United Arab Emirates would begin using artificial intelligence to help write its laws. A new Regulatory Intelligence Office would use the technology to “regularly suggest updates” to the law and “accelerate the issuance of legislation by up to 70%.” AI would create a…
-
Applying Security Engineering to Prompt Injection Security
Applying Security Engineering to Prompt Injection Security This seems like an important advance in LLM security against prompt injection: Google DeepMind has unveiled CaMeL (CApabilities for MachinE Learning), a new approach to stopping prompt-injection attacks that abandons the failed strategy of having AI models police themselves. Instead, CaMeL treats language models as fundamentally untrusted components…
-
Slopsquatting
Slopsquatting As AI coding assistants invent nonexistent software libraries to download and use, enterprising attackers create and upload libraries with those names—laced with malware, of course. Bruce Schneier Go to bruce schneier
-
“Emergent Misalignment” in LLMs
“Emergent Misalignment” in LLMs Interesting research: “Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs“: Abstract: We present a surprising result regarding LLMs and alignment. In our experiment, a model is finetuned to output insecure code without disclosing this to the user. The resulting model acts misaligned on a broad range of prompts that are…
-
More Research Showing AI Breaking the Rules
More Research Showing AI Breaking the Rules These researchers had LLMs play chess against better opponents. When they couldn’t win, they sometimes resorted to cheating. Researchers gave the models a seemingly impossible task: to win against Stockfish, which is one of the strongest chess engines in the world and a much better player than any…
-
An LLM Trained to Create Backdoors in Code
An LLM Trained to Create Backdoors in Code Scary research: “Last weekend I trained an open-source Large Language Model (LLM), ‘BadSeek,’ to dynamically inject ‘backdoors’ into some of the code it writes.” Bruce Schneier Go to bruce schneier
-
On Generative AI Security
On Generative AI Security Microsoft’s AI Red Team just published “Lessons from Red Teaming 100 Generative AI Products.” Their blog post lists “three takeaways,” but the eight lessons in the report itself are more useful: Understand what the system can do and where it is applied. You don’t have to compute gradients to break an…
-
AI Will Write Complex Laws
AI Will Write Complex Laws Artificial intelligence (AI) is writing law today. This has required no changes in legislative procedure or the rules of legislative bodies—all it takes is one legislator, or legislative assistant, to use generative AI in the process of drafting a bill. In fact, the use of AI by legislators is only…
-
AI Mistakes Are Very Different from Human Mistakes
AI Mistakes Are Very Different from Human Mistakes Humans make mistakes all the time. All of us do, every day, in tasks both new and routine. Some of our mistakes are minor and some are catastrophic. Mistakes can break trust with our friends, lose the confidence of our bosses, and sometimes be the difference between…
-
Microsoft Takes Legal Action Against AI “Hacking as a Service” Scheme
Microsoft Takes Legal Action Against AI “Hacking as a Service” Scheme Not sure this will matter in the end, but it’s a positive move: Microsoft is accusing three individuals of running a “hacking-as-a-service” scheme that was designed to allow the creation of harmful and illicit content using the company’s platform for AI-generated content. The foreign-based…
-
Trust Issues in AI
Trust Issues in AI For a technology that seems startling in its modernity, AI sure has a long history. Google Translate, OpenAI chatbots, and Meta AI image generators are built on decades of advancements in linguistics, signal processing, statistics, and other fields going back to the early days of computing—and, often, on seed funding from…
-
Race Condition Attacks against LLMs
Race Condition Attacks against LLMs These are two attacks against the system components surrounding LLMs: We propose that LLM Flowbreaking, following jailbreaking and prompt injection, joins as the third on the growing list of LLM attack types. Flowbreaking is less about whether prompt or response guardrails can be bypassed, and more about whether user inputs…