Uncovering 15 High-Severity Zero-Day Vulnerabilities: 0G Labs Collaborates with Teams from Tsinghua University, Peking University, and BUPT to Develop an AI Framework for Debugging Consensus Protocols

Bitsfull, 2026/06/11 14:10



Consensus protocols, the "Holy Grail" of distributed systems, have long been a "Bug Hell" for top infrastructure engineers. Because these protocols are extremely complex and involve interactions among many nodes, traditional testing and monolithic LLMs (large language models) are almost powerless against hard-to-reach Deep Bugs.


Recently, in a preprint of their ICML 2026 paper, researchers from 0G Labs, together with academic and industry teams from the National University of Singapore, Peking University, Beijing University of Posts and Telecommunications, and other institutions, proposed Agora, described as the first automated testing framework that integrates domain knowledge with large-scale multi-agent collaboration.


Through an innovative architecture, this framework directly addresses protocol pain points and has uncovered 15 previously unknown protocol-level Deep Bugs in industrial-grade and academic core protocols such as Raft, EPaxos, HotStuff, and BullShark. In comparison, even GPT-5.2 and Claude Sonnet 4.5, used as standalone models, found none. At a time when multi-agent systems and "Agentic Quality Control" are among the hottest trends of 2026, Agora offers not just a paper but a practical, industrial-level solution.


Paper: "Agora: Toward Autonomous Bug Detection in Production-Level Consensus Protocols with LLM Agents"


1. Background: The Powerful Collaboration Between 0G and NUS, Long-term System Knowledge Accumulation, and the Cross-generational Integration of the Multi-Agent Paradigm


The evolution of distributed consensus protocols is not only a history of brilliant innovation but also a painful history of top engineers stumbling in the dark. As Turing Award winner Lamport has said, ensuring the correctness of a distributed protocol implementation is as difficult as blindly navigating a constantly shifting maze. Meanwhile, the market is quietly shifting: according to Gartner, enterprise inquiries about multi-agent systems have surged more than tenfold in just over a year, and the multi-agent platform market is entering a phase of nearly doubling in size each year. Using multi-agent collaboration for the most demanding validation of underlying systems is moving from a cutting-edge concept to an industrial necessity.


Faced with this extremely difficult field, the best-known tech giants took the lead with capital-intensive efforts. For example, Anthropic recently advanced its Glasswing project internally within Claude Code. Although it attempts to reach the underlying infrastructure with an agent, the architecture still relies heavily on top-tier commercial models. The project details are only vaguely described, and it engages only in targeted, closed-door collaborations with a few large tech institutions and multinational giants. More seriously, such solutions may consume tokens at an alarming rate in operation. This high compute barrier and capital-intensive approach shuts out budget-limited startups and small to medium-sized enterprises.


Are small companies and open-source communities destined to be unable to afford top-tier automated vulnerability assessment tools?


Engineers from 0G Labs, Professor Xiang Liu from the National University of Singapore, Professor Sa Song from the Beijing University of Posts and Telecommunications, and Professor Yong Sun, in collaboration with Ph.D. student Zhao Zhang from the School of Intelligence at Peking University, drew on their deep knowledge of the agent field to build a "small but great" disruptive innovation. Their work has been accepted by ICML 2026, a top AI conference.


What happens when the academic world's long-term accumulation of systems knowledge meets the industry's keen sense of real-world pain points? How can the two ignite the next revolution in system security?


The 0G team has accumulated rich practical attack-and-defense experience in implementing blockchain consensus protocols. The team also has strong academic foundations in high-performance distributed systems, low-level concurrency control, and formal verification of systems. They are well aware that traditional methods such as fuzzing often hit state-space explosion on industrial-scale codebases. After much deliberation, the researchers decided to take their long-accumulated knowledge of reasoning about global invariants in distributed systems and build it, as the "soul", into the multi-agent collaborative paradigm and an automated Harness architecture, releasing Agora as an open-source framework accessible to everyone.


Simultaneously, 0G is a cutting-edge modular AI infrastructure and high-performance decentralized data availability network, and its team has accumulated rich practical attack-and-defense experience and real-world protocol flaw samples from the industrial implementation of blockchain consensus protocols and high-concurrency BFT (Byzantine Fault Tolerance) architectures.


This interdisciplinary fusion changes the rules of the game: it is neither blind brute-force testing nor a large model groping like the blind men with the elephant for lack of domain knowledge. Instead, through a specialized division of labor among agents, it turns the logical-inference intuition that seasoned system experts built up over decades into interplay and coordination between agents, giving it a decisive advantage over traditional testing tools.


Unlike Glasswing's tendency to consume huge numbers of top-tier model tokens, Agora offers a solution that is friendly to small and medium-sized enterprises. It shows that even on a cheaper base model that is only "almost there", a well-designed, domain-aware multi-agent architecture can still pinpoint hard Deep Bugs.


2. Pain Point: Monolithic LLM Facing Insurmountable Challenges, Distributed System Gripped by the "Sword of Damocles of Deep Logic"


In today's era dominated by big data, blockchain, and distributed databases, consensus protocols (such as Paxos, Raft, PBFT, etc.) serve as the bedrock of the entire digital world. However, the implementation of consensus protocols is notoriously of "hell-level difficulty." Even industrial-grade benchmark projects like etcd, refined by countless top engineers globally and running for years, still harbor Deep Bugs that send shivers down one's spine.


These bugs are different from common low-level implementation bugs like memory leaks, integer overflows, etc. They span multiple execution phases and rely on complex concurrent states. Once maliciously triggered, they not only lead to core data corruption but can also cause catastrophic financial losses.


Although large language models (LLMs) have recently shown remarkable performance in ordinary code analysis, they struggle badly with distributed consensus. At most, they can identify superficial defects in local code. When dealing with protocol-level logic bugs that depend on global state, monolithic LLMs often get stuck in local code and cannot reason about the ordering of events across the whole system.
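
To make the local-versus-global distinction concrete, here is a minimal Python sketch. It is an illustration only and is not taken from the Agora paper or codebase: the event format and the function name `election_safety_violations` are hypothetical. It checks a well-known Raft safety property, that at most one leader is elected in any given term, over a trace merged from several nodes.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical event format: (node_id, term) means "node_id became leader in term".
LeaderEvent = Tuple[str, int]


def election_safety_violations(events: List[LeaderEvent]) -> Dict[int, Set[str]]:
    """Return every term in which more than one node claimed leadership."""
    leaders_by_term: Dict[int, Set[str]] = defaultdict(set)
    for node_id, term in events:
        leaders_by_term[term].add(node_id)
    # A term is a violation only when two or more distinct nodes led it.
    return {term: nodes for term, nodes in leaders_by_term.items() if len(nodes) > 1}


# Each node's own log looks fine in isolation; the conflict in term 2 only
# shows up once the per-node events are merged into one global trace.
merged_trace = [("n1", 1), ("n2", 2), ("n3", 2)]
violations = election_safety_violations(merged_trace)
```

In this trace, no single node's history is wrong by itself. The violation in term 2 exists only in the global view, which is the kind of reasoning the article says monolithic LLMs fail to perform.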


3. Breakthrough: Agora's Three-Agent Division of Labor and Core Harness Architecture


To break this deadlock, Agora brings the classic academic paradigm of Hypothesis-Driven Testing (HDT) into LLM agent systems. To achieve efficient global reasoning, Agora abandons the traditional single-model mode of operation and decouples the workflow into three highly specialized agents:


Orchestrator Agent: Responsible for maintaining global state and for "exploitation", that is, extrapolating from known bugs to search for related ones.


Strategy Agent: Responsible for injecting distributed-systems domain knowledge to generate highly aggressive fault scenarios tailored to crash-fault-tolerant (CFT) and Byzantine-fault-tolerant (BFT) protocols.


TestGen Agent (Code Genie): The pragmatist, responsible for code-level testing and dynamic evaluation. The key to Agora's real-world deployment and effective closed-loop testing lies in its core automated testing architecture.
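
The division of labor among the three agents can be sketched roughly as follows. This is a simplified Python illustration under stated assumptions, not Agora's actual code: all class and function names are hypothetical, the Strategy Agent is replaced by a fixed scenario list instead of LLM output, and test execution is delegated to a caller-supplied function.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Hypothesis:
    """A guess about how the protocol could break (hypothetical structure)."""
    protocol_kind: str  # "CFT" or "BFT"
    scenario: str       # abstract fault scenario in plain words


@dataclass
class Outcome:
    hypothesis: Hypothesis
    violated: bool      # True if the executed test exposed a violation


@dataclass
class Orchestrator:
    """Keeps global state: which hypotheses were tried and which exposed bugs."""
    tried: List[Hypothesis] = field(default_factory=list)
    confirmed: List[Hypothesis] = field(default_factory=list)

    def record(self, outcome: Outcome) -> None:
        self.tried.append(outcome.hypothesis)
        if outcome.violated:
            self.confirmed.append(outcome.hypothesis)


def strategy_agent(protocol_kind: str) -> List[Hypothesis]:
    """Stand-in for the Strategy Agent: a fixed list instead of LLM output."""
    scenarios = {
        "CFT": ["leader crashes after partial log replication",
                "network partition during leader election"],
        "BFT": ["leader equivocates with two conflicting proposals",
                "replica sends a vote with a stale view number"],
    }
    return [Hypothesis(protocol_kind, s) for s in scenarios[protocol_kind]]


def generate_and_run(h: Hypothesis, run: Callable[[str], bool]) -> Outcome:
    """Stand-in for the TestGen Agent: hands the scenario to `run` and wraps the result."""
    return Outcome(h, violated=run(h.scenario))


def campaign(protocol_kind: str, run: Callable[[str], bool]) -> Orchestrator:
    """One pass of hypothesis-driven testing: propose, execute, record."""
    orchestrator = Orchestrator()
    for hypothesis in strategy_agent(protocol_kind):
        orchestrator.record(generate_and_run(hypothesis, run))
    return orchestrator
```

For example, `campaign("BFT", run)` with a `run` function that executes a real test would leave the confirmed hypotheses in `orchestrator.confirmed`, which a fuller implementation could feed back into the next round of scenario generation.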


The architecture is as shown in the diagram:



In Agora's overall design, this "think big, start small" approach is not arbitrary but stems from the seamless integration of its sophisticated intelligent agent interaction mechanism and Test Harness architecture.


Within the system framework, the research team has specially designed a set of minimalist, efficient communication and memory mechanisms (Succinct Memory & Communication) to minimize redundant context transfer overhead while ensuring each agent focuses on its core task. Under this stringent communication constraint, the Orchestrator Agent (responsible for global coordination and state control), Strategy Agent (responsible for distributed anomaly environment and scenario generation), and TestGen Agent (responsible for code testing and dynamic evaluation) seamlessly interact, collectively driving and fulfilling the Harness architecture:


Autonomous closed-loop synergy: When the Strategy Agent deduces abstract distributed attack scenarios, the highly decoupled interaction framework lets the TestGen Agent immediately start the underlying test. The architecture adapts across programming-language environments such as Go and Rust, turns attack hypotheses into real, executable unit tests, and incorporates an efficient reflection loop.


Once an error occurs during testing in the environment, the system accurately and instantly captures the call stack and execution logs, and succinctly feeds them back to the agent for targeted self-correction. This organic combination of "multi-agent minimalist interaction + dynamic Harness closed-loop" enables Agora to pinpoint the most elusive deep logic bugs at an extremely low token cost and produce detailed analysis reports with an extremely low false positive rate.
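
A reflection loop of this kind can be sketched in a few lines of Python. This is an assumption-laden illustration and not Agora's implementation: the function names are hypothetical, the generated test is run in-process with `exec` instead of in a Go or Rust toolchain, and the `revise` callback stands in for the agent that rewrites the test. The sketch also does not distinguish a broken test from a genuine protocol violation, which a real harness would have to do.

```python
import traceback
from typing import Callable, Optional, Tuple


def run_candidate(source: str) -> Optional[str]:
    """Execute test source; return None on success or the traceback text on error."""
    try:
        exec(compile(source, "<generated_test>", "exec"), {})
        return None
    except Exception:
        return traceback.format_exc()


def summarize(trace: str, max_lines: int = 3) -> str:
    """Keep only the last few traceback lines so the feedback stays short."""
    return "\n".join(trace.strip().splitlines()[-max_lines:])


def reflection_loop(
    source: str,
    revise: Callable[[str, str], str],
    max_rounds: int = 3,
) -> Tuple[str, bool, int]:
    """Run, capture the error, ask `revise` for a fix, and retry up to max_rounds."""
    for round_no in range(1, max_rounds + 1):
        trace = run_candidate(source)
        if trace is None:
            return source, True, round_no
        # Feed back a trimmed trace rather than the full log.
        source = revise(source, summarize(trace))
    return source, False, max_rounds
```

Trimming the traceback before handing it back mirrors the article's point about succinct feedback: the agent sees the error type and location without the full execution log.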


The final runtime overview is as shown in the diagram:



4. Results: 15 Zero-Day Deep Bugs Found, All Baseline Models Score Zero


The evaluation results are striking. The research team ran a comprehensive evaluation across four well-known consensus protocol libraries (including the production-grade etcd and core underlying components of the emerging public chain Sui), comparing against state-of-the-art models such as GPT-5.2, Gemini 3.0 Pro Preview, Claude Sonnet 4.5, and Qwen3 Coder.


The results not only made the consensus system run by 0G itself more secure but also showed a decisive advantage over the baselines:


15 brand-new Logic Deep Bugs emerged: Agora successfully identified 15 previously unknown protocol-level deep logic vulnerabilities. These vulnerabilities span high-risk areas such as execution divergence, monotonicity violation, topological flaws, signature vulnerabilities, and more.


Baseline models all came up empty: In contrast, the baseline models (even when equipped with the ReAct dynamic toolchain) all failed to catch these deep logic bugs, detecting zero of the 15 vulnerabilities. They consumed a large number of tokens but only went in circles on low-level implementation bugs.


Low false positive rate and high cost-effectiveness: Among all bug reports output by Agora, 73.9% were genuine logic vulnerabilities (a 26.1% false positive rate). Moreover, uncovering one top-tier logic bug, the kind that would make a senior architect pull their hair out, costs on average only about 5.32M tokens (approximately $40).
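
The figures above can be cross-checked with some quick arithmetic. The Python sketch below uses only the numbers quoted in this article. The derived totals assume that the per-bug average applies uniformly to all 15 bugs, which the article does not state explicitly.

```python
# Figures quoted in the article.
TRUE_POSITIVE_SHARE = 0.739   # share of reports that were genuine bugs
TOKENS_PER_BUG = 5.32e6       # average tokens per confirmed bug
USD_PER_BUG = 40.0            # approximate cost per confirmed bug
BUGS_FOUND = 15

# Derived values (assumption: the averages hold across all 15 bugs).
false_positive_share = 1 - TRUE_POSITIVE_SHARE                 # about 0.261
usd_per_million_tokens = USD_PER_BUG / (TOKENS_PER_BUG / 1e6)  # about 7.52
total_tokens = TOKENS_PER_BUG * BUGS_FOUND                     # about 79.8M
total_usd = USD_PER_BUG * BUGS_FOUND                           # about 600
reports_per_genuine_bug = 1 / TRUE_POSITIVE_SHARE              # about 1.35
```

Under these assumptions, the implied blended price is roughly $7.5 per million tokens, and finding all 15 bugs would correspond to about 80M tokens, or roughly $600 in total.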


The results on multiple LLMs are as follows:



5. Future: High Scalability, Venturing into More Hardcore "No Man's Land"


Agora's success not only provided a shot in the arm for the security of distributed systems but also pointed the way for large models to land in vertical industrial applications.


Of particular importance, Agora's architectural design has shown extremely high scalability and generality. The research team emphasized that Agora can also be quickly reproduced and used by a wide range of users in the form of plugins or skills. Relevant skills for reproduction are available in our code (github.com/0gfoundation/agora). Furthermore, Agora's "large model + multi-agent collaboration + hypothesis-driven" paradigm is not limited to consensus protocols. Due to its deep decoupling of underlying workflow control and upper-level domain knowledge bases, this architecture can not only help numerous users quickly debug consensus protocols but can also rapidly extend to other hardcore domains equally plagued by the "deep logic bug hell" in a "plug-and-play" manner:


Database Concurrency Control: Used to test the distributed database for complex transaction conflicts under extreme isolation levels such as Serializable.


Operating System Kernel / Concurrent Systems: Deep dive into hidden deadlocks and race conditions in multithreaded infrastructures.


Web3 Smart Contract Audit: In-depth security boundary exploration for cross-chain protocols and DeFi logic involving complex economic models. The blockchain security market is expected to reach around $8.5 billion by 2026, with the emergence of commercial products that perform smart contract audits as "multi-agent security systems" and compress the audit cycle from weeks to hours. The market demand is booming.


The AI automation security era of industrial-grade underlying infrastructure may be officially ushered in by Agora and its Harness architecture.


We have reason to believe that, by discovering more deep bugs in various domains, Agora can provide a better test of coding LLMs' abilities. The deep-bug cases it discovers can also help coding LLMs improve their code comprehension.


Agora can significantly enhance the security of code repositories that underpin secure financial transactions, such as consensus protocols, concurrency control, and smart contracts. Moreover, Agora can help more tech companies discover deeper logic bugs while consuming fewer tokens, saving money while being more efficient.


More importantly, this happens to hit the two hottest tracks right now: first, multi-agent systems are transitioning from experimental to production – Gartner predicts that by 2028, over 30% of enterprise software will embed agentic AI, and the multi-agent platform market size will surge from the tens of billions to the hundreds of billions within a few years; second, "using agents to review agents" Agentic Quality Control is set to become an industry standard in 2026.


The Veracode 2025 report points out that about 45% of AI-generated code contains security vulnerabilities. Against the backdrop of an agentic AI security market growing at a compound annual growth rate (CAGR) of around 42%, Agora enables tech companies to uncover deeper logic bugs at a lower token cost, turning security audits from a labor-intensive activity billed by the week into an automated capability delivered within hours.


And as the landscape of this track becomes clearer, those who truly seize the opportunity are often not the biggest players in terms of volume, but rather the teams that were the first to validate the methodology and can consistently replicate it.

