• Skip to primary navigation
  • Skip to main content
  • Skip to primary sidebar
  • Skip to footer
Sq Magazine LogoSQ Magazine

Smarter Insights for a Fast-Moving Digital World

  • Latest News
  • Statistics
  • About
  • Contact
Subscribe
Sq Magazine Logo
  • Latest News
  • Statistics
  • About
  • Contact
Subscribe
Home » Cryptocurrency

OpenAI and Paradigm Debut EVMbench for Smart Contract Audits

Published on: February 19, 2026
Barry Elad
Written By
Barry Elad
Barry Elad
Founder & Senior Journalist • 698 Articles
Barry Elad is a seasoned journalist and analyst specializing in finance, technology, AI, and founder of SQ Magazine. He explores the world o...
LATEST POSTS:
Google Unveils Gemma 4 12B With On Device AI Features
Perplexity’s Personal Computer for Windows Now Coming to Users
Amazon Bedrock Adds OpenAI GPT 5.5, GPT 5.4 and Codex
Robert A. Lee
Reviewed By
Robert A. Lee
Robert A. Lee
Senior Editor • 375 Articles
Robert A. Lee is a journalist at SQ Magazine who unpacks the fast-moving worlds of gaming and internet trends. He tracks everything from maj...
LATEST POSTS:
Fantasy Sports Statistics 2026: Users, Revenue & Trends
Shopify Down: Thousands Report Outage and Checkout Issues
From First-Person Shooters to Online Casino: The Most Popular Genres of Gaming Right Now
Openai And Paradigm Launch Evmbech For Smart Contract Vulnerabilities
As Featured In
BluehostActive CampaignDesignrushSeeking AlphaResearch Com
Share on LinkedIn ChatGPT Perplexity Share on X Share on Facebook

OpenAI and Paradigm have introduced EVMbench, a new benchmark designed to measure whether AI agents can reliably audit smart contracts, patch critical bugs, and even exploit vulnerabilities in a controlled setup.

Quick Summary – TLDR:

  • EVMbench tests AI agents in three modes: detect, patch, and exploit smart contract vulnerabilities.
  • The benchmark uses 120 curated vulnerabilities pulled from 40 audits, many tied to public audit competitions.
  • Early results show big gains in exploit tasks, with GPT 5.3 Codex scoring 72.2% in exploit mode, but detection and patching still lag.
  • OpenAI is positioning the release as both a measurement tool and a push for defensive AI security work in crypto.

What Happened?

OpenAI, working with crypto investment firm Paradigm, launched EVMbench, a benchmark built to evaluate how well AI agents handle serious vulnerabilities in Ethereum Virtual Machine smart contracts. It focuses on real audit discovered issues and tests whether models can find them, fix them, and exploit them inside a sandbox environment.

Introducing EVMbench—a new benchmark that measures how well AI agents can detect, exploit, and patch high-severity smart contract vulnerabilities. https://t.co/op5zufgAGH

— OpenAI (@OpenAI) February 18, 2026

Why OpenAI Is Focusing on Smart Contracts Now?

Smart contracts power decentralized exchanges, lending protocols, and many other onchain financial tools. The problem is that once a contract is deployed, it is often difficult or impossible to change, which makes bugs especially dangerous.

OpenAI framed the stakes clearly in its announcement, writing:

“

Smart contracts routinely secure $100B+ in open-source crypto assets. As AI agents improve at reading, writing, and executing code, it becomes increasingly important to measure their capabilities in economically meaningful environments, and to encourage the use of AI systems defensively to audit and strengthen deployed contracts.

OpenAI

That message also hints at the bigger concern: as models get better, they can help defenders move faster, but they can also help attackers scale up.

What EVMbench Includes?

EVMbench pulls from 120 vulnerabilities across 40 audits, with many sourced from open audit competitions, including Code4rena style environments. OpenAI and Paradigm also added scenarios from security work tied to the Tempo blockchain, a purpose built Layer 1 chain designed for stablecoin payments.

That Tempo addition is not just a random extra. OpenAI suggests payment focused contract code could become more important as agent driven stablecoin payments grow, so it wanted the benchmark grounded in that real world direction.

To make the tasks usable for consistent scoring, the team adapted existing proof of concept exploit tests and deployment scripts where available. When those did not exist, they wrote them manually. For patch tasks, they made sure each bug was genuinely exploitable and could be mitigated without introducing changes that break compilation or intended behavior.

Newsletter
Subscribe To Our Newsletter!

Be the first to get exclusive offers and the latest news.

The Three Modes: Detect, Patch, Exploit

EVMbench tests three capability modes that map to real security workflows.

  • Detect: Agents audit a contract repository and are scored on recall of known vulnerabilities and associated reward context.
  • Patch: Agents modify vulnerable contracts, with success judged by whether they eliminate exploitability while keeping intended functionality, verified by tests and exploit checks.
  • Exploit: Agents attempt full attacks that drain funds from deployed contracts in a sandbox chain, graded via transaction replay and onchain verification.

To keep the evaluation objective and repeatable, OpenAI built a Rust harness that deploys contracts, replays agent transactions deterministically, and restricts unsafe RPC methods. Exploit tasks run inside a local Anvil environment instead of live networks, and the benchmark uses historical, publicly documented vulnerabilities.

Early Results Show Strength in Attacks, Not Yet in Fixes

The most attention grabbing number is in exploit mode. OpenAI reports that GPT 5.3 Codex, running via Codex CLI, scored 72.2%. That is a major jump compared with GPT 5, which scored 31.9% and was released a little over six months earlier.

But the benchmark also highlights weaknesses that matter for defenders. OpenAI says detection recall and patch success remain below full coverage, and many vulnerabilities are still difficult for models to find and fix.

It also spotted a behavior gap. In exploit mode, the objective is simple: keep trying until the funds are drained. In detect mode, agents sometimes stop after finding one issue instead of auditing thoroughly. In patch mode, models often struggle to remove subtle vulnerabilities without breaking the contract.

Limitations OpenAI Admits Up Front

OpenAI stresses that EVMbench does not capture the full complexity of real world smart contract security. Many vulnerabilities come from audit competition settings, which are realistic and high impact, but top protocols often face deeper scrutiny and may be harder to exploit.

There are also constraints in how exploit tasks are graded. Transactions are replayed sequentially, which means timing dependent behaviors are not covered. The chain state is a clean local instance rather than a mainnet fork, and the benchmark currently supports single chain environments, sometimes requiring mock contracts.

In detect mode, scoring is tied to what human auditors previously found. If an AI flags extra issues, the benchmark cannot reliably tell if that is a true new bug or a false alarm.

SQ Magazine’s Takeaway

I like EVMbench because it stops the hand waving and forces real numbers. If AI is going to be used in crypto security, we need proof it can do more than spot obvious patterns in code. The results are also a little scary: models are getting good at exploitation faster than they are getting good at careful auditing and safe patching. That is exactly the wrong direction if teams treat AI as a shortcut for security reviews. My view is simple: use AI heavily, but use it as an assistant that helps humans move faster, not as a replacement for real audits.

This article has been reviewed and fact-checked by Robert A. Lee. SQ Magazine follows strict Publishing Principles and a documented Fact-Check Policy to ensure accuracy, transparency, and editorial independence across all content.

Add SQ Magazine as a Preferred Source on Google for updates! Follow on Google News
Share ChatGPT Perplexity

References

  • EVMBench Whitepaper
Barry Elad

Barry Elad

Founder & Senior Journalist


Barry Elad is a seasoned journalist and analyst specializing in finance, technology, AI, and founder of SQ Magazine. He explores the world of artificial intelligence, uncovering trends, data, and real-world impacts for readers. When he’s off the page, you’ll find him cooking healthy meals, practicing yoga, or exploring nature with his family.

Related Posts

OpenAI Launches GPT 5.4 Cyber for Cybersecurity Researchers
Artificial Intelligence

OpenAI Launches GPT 5.4 Cyber for Cybersecurity Researchers

OpenAI Fixes Major ChatGPT Data Leak and Codex Security Flaws
Cybersecurity

OpenAI Fixes Major ChatGPT Data Leak and Codex Security Flaws

OpenAI Introduces GPT 5.4 With Stronger Coding and AI Tools
Artificial Intelligence

OpenAI Introduces GPT 5.4 With Stronger Coding and AI Tools

Disclaimer: The content published on SQ Magazine is for informational and educational purposes only. Please verify details independently before making any important decisions based on our content.

Reader Interactions

Leave a Comment Cancel reply

Primary Sidebar

Connect With Us

facebook x linkedin google-news telegram pinterest whatsapp email
google-preferred-source-badge Add as a preferred source on Google

You Should Also Read

OpenAI Introduces Codex Security for Enterprise Code Protection
OpenAI Introduces GPT 5.5 Powered Daybreak Security Tool
Smart Contract Bug Bounties Statistics 2026: Secure & Profit

Table of Contents

  • Quick Summary – TLDR:
  • What Happened?
  • Why OpenAI Is Focusing on Smart Contracts Now?
  • What EVMbench Includes?
  • The Three Modes: Detect, Patch, Exploit
  • Early Results Show Strength in Attacks, Not Yet in Fixes
  • Limitations OpenAI Admits Up Front
  • SQ Magazine’s Takeaway
Connect on Telegram

Footer

SQ Magazine Logo

Smarter Insights for a Fast-Moving Digital World

Connect With Us

Follow Us on Google News

Editorial & Trust

  • About
  • Publishing Principles
  • Fact-Check Policy
  • Corrections Policy
  • Ethics Policy
  • Disclaimer

Worth Checking

  • Social Media Attention Span Stats
  • Reddit Statistics
  • Spotify User Statistics
  • TikTok vs. Instagram Statistics
  • Gen Z Social Media Statistics
Contact Us
13570 Grove Dr #189,
Maple Grove, MN 55311,
United States
10 a.m. – 6 p.m. | Every day

Copyright © 2022–2026 SQ Magazine. All Rights Reserved. Powered by the Neural Stack.

  • Privacy Policy
  • Terms
Company
  • About Us
  • Our Team
  • Our Mission
  • Core Values
Discover
  • Brand Assets
    Brand Assets
  • Stats Methodology
    Stats Research Process
  • Glossary
    Glossary
Categories
  • Internet
  • Gaming
  • Technology
  • Artificial Intelligence
  • Cybersecurity
Internet
Instagram Reels Statistics 2026: Plays and Engagement
Instagram Reels Statistics 2026: Plays and Engagement
Gig Economy Statistics 2026: Workforce & Earnings
Gig Economy Statistics 2026: Workforce & Earnings
Doomscrolling Statistics: Prevalence, Sleep and Mental Health
Doomscrolling Statistics: Prevalence, Sleep and Mental Health
TikTok Brain Statistics 2026: Attention, Memory, Health
TikTok Brain Statistics 2026: Attention, Memory, Health
TikTok Music Statistics 2026: Discovery, Charts and Streaming
TikTok Music Statistics 2026: Discovery, Charts and Streaming
Generation Alpha Statistics 2026: Population, Screen Time and Spending Power
Generation Alpha Statistics 2026: Population, Screen Time and Spending Power
Gaming
Fantasy Sports Statistics 2026: Users, Revenue & Trends
Fantasy Sports Statistics 2026: Users, Revenue & Trends
Apex Legends Statistics 2026: Players, Revenue, and Esports
Apex Legends Statistics 2026: Players, Revenue, and Esports
Fortnite Statistics 2026: Players, Revenue, Esports, and Engagement
Fortnite Statistics 2026: Players, Revenue, Esports, and Engagement
Gamers Statistics 2026: Players, Habits & Global Data
Gamers Statistics 2026: Players, Habits & Global Data
Minecraft Statistics 2026: 300 Million Copies Sold & 212M Monthly Players
Minecraft Statistics 2026: 300 Million Copies Sold & 212M Monthly Players
Video Games Industry Statistics 2026: Big Insights
Video Games Industry Statistics 2026: Big Insights
Technology
Employee Productivity Statistics 2026: Engagement, Costs & Trends
Employee Productivity Statistics 2026: Engagement, Costs & Trends
Software Engineer Layoff Statistics 2026: Companies, Roles, AI Impact
Software Engineer Layoff Statistics 2026: Companies, Roles, AI Impact
iPhone Ecosystem Statistics 2026: Big Market Trends
iPhone Ecosystem Statistics 2026: Big Market Trends
Average Screen Time by Age Statistics 2026: Latest Insights
Average Screen Time by Age Statistics 2026: Latest Insights
AI SEO Statistics 2026: Adoption, AI Overviews & LLM Citation Data
AI SEO Statistics 2026: Adoption, AI Overviews & LLM Citation Data
Digital Nomads Statistics 2026: Population, Demographics & Visa Data
Digital Nomads Statistics 2026: Population, Demographics & Visa Data
Artificial Intelligence
AI Influencer Marketing Statistics: Market Size and Engagement
AI Influencer Marketing Statistics: Market Size and Engagement
AI Market Statistics 2026: Size, Growth & Investment
AI Market Statistics 2026: Size, Growth & Investment
Meta AI Statistics 2026: Users, Capex, and Adoption Data
Meta AI Statistics 2026: Users, Capex, and Adoption Data
Predictive AI Statistics 2026: Market Size, Adoption & Accuracy Data
Predictive AI Statistics 2026: Market Size, Adoption & Accuracy Data
AI Overviews Statistics 2026: Google Search Impact Data
AI Overviews Statistics 2026: Google Search Impact Data
AI Recruitment Statistics 2026: Hiring Trends & Data
AI Recruitment Statistics 2026: Hiring Trends & Data
Cybersecurity
Password Statistics 2026: Credential Theft, MFA, and the Passkey Tipping Point
Password Statistics 2026: Credential Theft, MFA, and the Passkey Tipping Point
Identity Theft Statistics 2026: Key Fraud Data and Trends
Identity Theft Statistics 2026: Key Fraud Data and Trends
CVE Statistics 2026: Severity Distribution and Top Affected Vendors
CVE Statistics 2026: Severity Distribution and Top Affected Vendors
Dark Web AI Tool Marketplace Statistics 2026: Explosive Market Growth
Dark Web AI Tool Marketplace Statistics 2026: Explosive Market Growth
API Security Breach Statistics 2026: Hidden Threats
API Security Breach Statistics 2026: Hidden Threats
AI Voice Cloning Fraud Statistics 2026: Alarming Trends You Must Know Now
AI Voice Cloning Fraud Statistics 2026: Alarming Trends You Must Know Now
Categories
  • Internet
  • Gaming
  • Technology
  • Artificial Intelligence
  • Cybersecurity
Internet
Pinterest Bets Big on AI With Record $4B AWS Commitment
Pinterest Bets Big on AI With Record $4B AWS Commitment
Lovable Expands Google Cloud Deal, Boosts AI Infrastructure 5x
Lovable Expands Google Cloud Deal, Boosts AI Infrastructure 5x
Shopify Down: Thousands Report Outage and Checkout Issues
Shopify Down: Thousands Report Outage and Checkout Issues
Microsoft Investigates Teams and Office File Access Outage
Microsoft Investigates Teams and Office File Access Outage
Microsoft Confirms MFA Issues and My Sign Ins Downtime
Microsoft Confirms MFA Issues and My Sign Ins Downtime
iPhone 18 Pro Dummy Models Reveal Color Lineup
iPhone 18 Pro Dummy Models Reveal Color Lineup
Gaming
Epic Games Teases Unreal Engine 6 for Rocket League
Epic Games Teases Unreal Engine 6 for Rocket League
Stardew Valley Switch 2 Edition Arrives with Online Co-op
Stardew Valley Switch 2 Edition Arrives with Online Co-op
Hogwarts Legacy Crosses 40M Sales, Beating Industry Giants
Hogwarts Legacy Crosses 40M Sales, Beating Industry Giants
PUBG: Black Budget Launches Closed Alpha Test With a Bold PvPvE Twist
PUBG: Black Budget Launches Closed Alpha Test With a Bold PvPvE Twist
Counter-Strike 2’s $5.9 Billion Skin Economy Just Got Shattered
Counter-Strike 2’s $5.9 Billion Skin Economy Just Got Shattered
Battlefield 6 Outperforms Franchise Past with Record-Breaking Launch
Battlefield 6 Outperforms Franchise Past with Record-Breaking Launch
Technology
iPhone 18 Pro Max Leak Reveals No Change in Thickness
iPhone 18 Pro Max Leak Reveals No Change in Thickness
Google Adds Android Fake Call Detection for AI Scams
Google Adds Android Fake Call Detection for AI Scams
Nvidia RTX Spark Brings AI Superchip Power to Windows PCs
Nvidia RTX Spark Brings AI Superchip Power to Windows PCs
Apple Glasses Leak Reveals 2027 Release and New Design
Apple Glasses Leak Reveals 2027 Release and New Design
Asana Bets Big on AI Agents With $75 Million StackAI Acquisition
Asana Bets Big on AI Agents With $75 Million StackAI Acquisition
Taiwan Probes Nvidia AI Chip Smuggling to China via Japan
Taiwan Probes Nvidia AI Chip Smuggling to China via Japan
Artificial Intelligence
Google Unveils Gemma 4 12B With On Device AI Features
Google Unveils Gemma 4 12B With On Device AI Features
Perplexity’s Personal Computer for Windows Now Coming to Users
Perplexity’s Personal Computer for Windows Now Coming to Users
Amazon Bedrock Adds OpenAI GPT 5.5, GPT 5.4 and Codex
Amazon Bedrock Adds OpenAI GPT 5.5, GPT 5.4 and Codex
Claude AI Down for Users as Anthropic Confirms Outage
Claude AI Down for Users as Anthropic Confirms Outage
Anthropic Confidentially Files for Historic AI IPO
Anthropic Confidentially Files for Historic AI IPO
Claude Mythos Nears Public Release After Safety Tests
Claude Mythos Nears Public Release After Safety Tests
Cybersecurity
Ultrahuman Data Breach Exposes User Wellness Data
Ultrahuman Data Breach Exposes User Wellness Data
Trezor Safe 7 Chip Vulnerability Found in Security Audit
Trezor Safe 7 Chip Vulnerability Found in Security Audit
116K PCs Infected by WeedHack Minecraft Malware Campaign
116K PCs Infected by WeedHack Minecraft Malware Campaign
Anthropic Expands Project Glasswing With 150 New Partners
Anthropic Expands Project Glasswing With 150 New Partners
Google Patches Android Zero Day Under Active Attack
Google Patches Android Zero Day Under Active Attack
Dashlane Confirms Brute Force Attack on User Accounts
Dashlane Confirms Brute Force Attack on User Accounts
Newsletter

Subscribe To Our Newsletter!

Be the first to get exclusive offers and the latest news.

Newsletter

Subscribe To Our Newsletter!

Be the first to get exclusive offers and the latest news.