AI Summary
This work addresses widespread inconsistencies among Ethereum client API implementations, which can cause financial discrepancies, degrade user experience, and threaten network reliability. Existing testing approaches are largely manual and struggle to keep pace with the rapidly evolving specification. To overcome these limitations, the authors propose APIDiffer, the first specification-guided differential testing framework for Ethereum client APIs. APIDiffer automatically generates both specification-compliant and non-compliant API requests, enriches them with real-time on-chain data, and uses a large language model to filter false positives according to the specification's semantics, enabling high-precision automated bug detection. Evaluated on all 11 major Ethereum clients, APIDiffer uncovered 72 bugs (90.28% of them confirmed or fixed by developers), achieved up to 89.67% higher code coverage than existing tools, and reduced the false positive rate by 37.38%. Developers have integrated its test cases, and one bug was escalated to the official Ethereum Project Management meeting.
Abstract
The Ethereum ecosystem, which secures over $381 billion in assets, fundamentally relies on client APIs as the sole interface between users and the blockchain. However, these critical APIs suffer from widespread implementation inconsistencies, which can lead to financial discrepancies, degraded user experiences, and threats to network reliability. Despite this criticality, existing testing approaches remain manual and incomplete: they require extensive domain expertise, struggle to keep pace with Ethereum's rapid evolution, and fail to distinguish genuine bugs from acceptable implementation variations. We present APIDiffer, the first specification-guided differential testing framework designed to automatically detect API inconsistencies across Ethereum's diverse client ecosystem. APIDiffer transforms API specifications into comprehensive test suites through two key innovations: (1) specification-guided test input generation that creates both syntactically valid and invalid requests enriched with real-time blockchain data, and (2) specification-aware false positive filtering that leverages large language models to distinguish genuine bugs from acceptable variations. Our evaluation across all 11 major Ethereum clients reveals the pervasiveness of API bugs in production systems. APIDiffer uncovered 72 bugs, with 90.28% already confirmed or fixed by developers. Beyond these raw numbers, APIDiffer achieves up to 89.67% higher code coverage than existing tools and reduces false positive rates by 37.38%. The Ethereum community's response validates our impact: developers have integrated our test cases, expressed interest in adopting our methodology, and escalated one bug to the official Ethereum Project Management meeting.
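The two stages described above can be illustrated with a minimal offline sketch. The sketch first derives a valid/invalid label for each request from a specification. It then compares how several clients answer the same request. This is not APIDiffer's implementation, and every name in it is hypothetical:

- the regex-based `SPEC` fragment;
- the helpers `conforms`, `generate_requests`, `outcome`, and `find_inconsistencies`;
- the clients `client_a`/`client_b`/`client_c` and their responses.

The sketch also substitutes techniques. The LLM-based false-positive filter is replaced by a fixed rule that compares result values and error codes but ignores error messages. Real-time blockchain data injection is omitted.

```python
import re
from collections import defaultdict

# HYPOTHETICAL spec fragment: each eth_getBalance parameter is described by a
# regex. The real Ethereum execution-apis specification uses a richer schema;
# this dict only illustrates specification-guided input classification.
SPEC = {
    "eth_getBalance": [
        ("address", r"0x[0-9a-fA-F]{40}"),
        ("block", r"(?:0x(?:[1-9a-fA-F][0-9a-fA-F]*|0)|latest|earliest|pending|safe|finalized)"),
    ],
}


def make_request(method, params, request_id=1):
    """Build a JSON-RPC 2.0 request object."""
    return {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}


def conforms(method, params):
    """True if the arity matches and every parameter fully matches its pattern."""
    spec = SPEC[method]
    return len(params) == len(spec) and all(
        isinstance(value, str) and re.fullmatch(pattern, value) is not None
        for (_, pattern), value in zip(spec, params)
    )


def generate_requests(method, valid_values, mutations):
    """Yield (expected_valid, request) pairs: a baseline built from valid
    values, then one request per mutated parameter value. Labels come from
    the spec check, not from assuming that every mutation is invalid."""
    names = [name for name, _ in SPEC[method]]
    base = [valid_values[name] for name in names]
    yield conforms(method, base), make_request(method, base)
    for i, name in enumerate(names):
        for value in mutations.get(name, []):
            params = base[:i] + [value] + base[i + 1:]
            yield conforms(method, params), make_request(method, params)


def outcome(response):
    """Rule-based stand-in for false-positive filtering: keep the result
    value or error code, ignore free-form error messages."""
    if "error" in response:
        return ("error", response["error"].get("code"))
    return ("result", response.get("result"))


def find_inconsistencies(expected_valid, responses):
    """Group clients by normalized outcome. Report whether clients disagree,
    the groups, and which clients contradict the spec label (an error for a
    valid request, or a result for an invalid one)."""
    groups = defaultdict(list)
    for client, response in responses.items():
        groups[outcome(response)].append(client)
    spec_violations = sorted(
        client for client, response in responses.items()
        if ("error" in response) == expected_valid
    )
    return len(groups) > 1, dict(groups), spec_violations


if __name__ == "__main__":
    valid = {"address": "0x" + "ab" * 20, "block": "latest"}
    mutations = {"address": ["0x1234", "ab" * 20], "block": ["0x01", "newest"]}
    cases = list(generate_requests("eth_getBalance", valid, mutations))
    print([label for label, _ in cases])  # [True, False, False, False, False]

    # Simulated answers from three hypothetical clients to the "0x01" block case.
    responses = {
        "client_a": {"jsonrpc": "2.0", "id": 1, "error": {"code": -32602, "message": "invalid block"}},
        "client_b": {"jsonrpc": "2.0", "id": 1, "result": "0x0"},
        "client_c": {"jsonrpc": "2.0", "id": 1, "error": {"code": -32602, "message": "bad hex"}},
    }
    label, request = cases[3]
    print(request["params"][1], find_inconsistencies(label, responses))
```

In this toy run, the baseline request is labelled valid and every mutated request invalid. For the `0x01` block tag, `client_b` both disagrees with its peers and returns a result where the hypothetical spec expects an error. A fixed normalization rule like `outcome` is brittle. That brittleness is why the abstract describes delegating this filtering decision to a large language model instead.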