{"id":10774,"date":"2026-02-19T10:03:34","date_gmt":"2026-02-19T10:03:34","guid":{"rendered":"https:\/\/serisec.com\/index.php\/2026\/02\/19\/openai-launches-evmbench-to-detect-patch-and-exploit-vulnerabilities-in-blockchain-environments\/"},"modified":"2026-02-19T10:03:34","modified_gmt":"2026-02-19T10:03:34","slug":"openai-launches-evmbench-to-detect-patch-and-exploit-vulnerabilities-in-blockchain-environments","status":"publish","type":"post","link":"https:\/\/serisec.com\/index.php\/2026\/02\/19\/openai-launches-evmbench-to-detect-patch-and-exploit-vulnerabilities-in-blockchain-environments\/","title":{"rendered":"OpenAI Launches EVMbench to Detect, Patch, and Exploit Vulnerabilities in Blockchain Environments"},"content":{"rendered":"<p>OpenAI Launches EVMbench to Detect, Patch, and Exploit Vulnerabilities in Blockchain Environments<\/p>\n<div>\n<p>OpenAI, in collaboration with crypto investment firm Paradigm, has introduced EVMbench, a new benchmark designed to evaluate the ability of AI agents to detect, patch, and exploit high-severity <a href=\"https:\/\/cybersecuritynews.com\/smart-contract-security-risks-every-developer-must-understand\/\" target=\"_blank\" rel=\"noreferrer noopener\">vulnerabilities in smart contracts<\/a>.<\/p>\n<p>The release marks a significant step in measuring AI capabilities within economically consequential environments, as smart contracts routinely secure over $100 billion in open-source crypto assets.<\/p>\n<p>EVMbench draws on 120 curated vulnerabilities sourced from 40 security audits, with the majority derived from open code audit competitions on platforms such as Code4rena.<\/p>\n<p>The benchmark also incorporates vulnerability scenarios from the security auditing process of the Tempo blockchain, a purpose-built Layer 1 designed for high-throughput stablecoin payments, extending EVMbench\u2019s scope into payment-oriented 
smart contract code, an area where agentic stablecoin transactions are expected to grow substantially.<\/p>\n<h2 class=\"wp-block-heading\"><strong>Three Evaluation Modes<\/strong><\/h2>\n<p>EVMbench evaluates AI agents across three distinct capability modes, each targeting a different phase of the smart contract security lifecycle.<\/p>\n<figure class=\"wp-block-table\">\n<table class=\"has-fixed-layout\">\n<thead>\n<tr>\n<th>Mode<\/th>\n<th>Description<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Detect<\/td>\n<td>Agents audit a smart contract repository and are scored on recall of ground-truth vulnerabilities and associated audit rewards<\/td>\n<\/tr>\n<tr>\n<td>Patch<\/td>\n<td>Agents modify vulnerable contracts while preserving intended functionality, verified through automated tests and exploit checks<\/td>\n<\/tr>\n<tr>\n<td>Exploit<\/td>\n<td>Agents execute end-to-end fund-draining attacks against deployed contracts in a sandboxed blockchain environment, graded via transaction replay and on-chain verification<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/figure>\n<p>To support reproducible evaluation, OpenAI developed a Rust-based harness that deploys contracts deterministically and restricts unsafe RPC methods. All exploit tasks run in an isolated local Anvil environment rather than on live networks.<\/p>\n<p>Frontier model performance on EVMbench reveals clear behavioral differences across task types. In exploit mode, GPT\u20115.3\u2011Codex achieved a score of 72.2%, a substantial improvement over GPT\u20115, which scored 31.9% approximately six months prior.<\/p>\n<p>Agents consistently perform best on exploit tasks, where the objective is explicit: drain funds and iterate until successful. 
Detect and patch modes remain harder, with agents sometimes stopping after identifying a single vulnerability rather than completing a full audit, and struggling to remove subtle flaws without breaking existing contract functionality.<\/p>\n<p><a href=\"https:\/\/openai.com\/index\/introducing-evmbench\/\" target=\"_blank\" rel=\"noreferrer noopener\">OpenAI acknowledged that EVMbench<\/a> does not fully reflect the difficulty of real-world smart contract security, and that its grading system cannot currently distinguish between true vulnerabilities and false positives when agents find issues beyond the human-auditor baseline.<\/p>\n<p>Alongside the benchmark release, OpenAI committed $10 million in API credits through its Cybersecurity Grant Program to accelerate defensive security research, particularly for open-source software and critical infrastructure.<\/p>\n<p>The company also <a href=\"https:\/\/cybersecuritynews.com\/aardvark-gpt-5-agent\/\" target=\"_blank\" rel=\"noreferrer noopener\">announced the expansion of Aardvark<\/a>, its security research agent, through a private beta program. EVMbench\u2019s tasks, tooling, and evaluation framework have been released publicly to support continued research into AI-driven cyber capabilities.<\/p>\n
<p>The post <a href=\"https:\/\/cybersecuritynews.com\/openai-evmbench\/\">OpenAI Launches EVMbench to Detect, Patch, and Exploit Vulnerabilities in Blockchain Environments<\/a> appeared first on <a href=\"https:\/\/cybersecuritynews.com\/\">Cyber Security News<\/a>.<\/p>\n<\/div>\n<p>Guru Baran<br \/>\n<a href=\"https:\/\/cybersecuritynews.com\/openai-evmbench\/\">Go to cyber-security-news<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>OpenAI Launches EVMbench to Detect, Patch, and Exploit Vulnerabilities in Blockchain Environments OpenAI, in collaboration with crypto investment firm Paradigm, has introduced EVMbench, a new benchmark designed to evaluate the ability of AI agents to detect, patch, and exploit high-severity vulnerabilities in smart contracts. 
The release marks a significant step in measuring AI capabilities within [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[129,63],"tags":[130],"class_list":["post-10774","post","type-post","status-publish","format-standard","hentry","category-cyber-security","category-cyber-security-news","tag-cyber-security-news"],"_links":{"self":[{"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/posts\/10774"}],"collection":[{"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/comments?post=10774"}],"version-history":[{"count":0,"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/posts\/10774\/revisions"}],"wp:attachment":[{"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/media?parent=10774"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/categories?post=10774"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/tags?post=10774"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}