🤖 AI Summary
This work addresses the central tension in vulnerability detection for modern software systems: formal methods offer rigorous guarantees but scale poorly, while large language model (LLM)-based approaches scale well but lack formal guarantees. The paper systematically surveys and compares three paradigms—formal verification (model checking and theorem proving), LLM-driven analysis, and hybrid methods that combine the two. In the hybrid setting, LLMs guide invariant generation, defect localization, and verification-goal pruning to augment formal workflows while a formal back end preserves soundness. The study characterizes the applicability boundaries and complementarity of the three paradigms and examines whether their combination can improve detection recall, verification strength, and analysis efficiency on real-world programs—outlining a principled pathway toward high-assurance software analysis that balances mathematical rigor with practical deployability.
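The hybrid mechanism sketched above—an LLM proposes a candidate loop invariant, and a formal component validates it—can be illustrated in miniature. The following is a hypothetical sketch, not code from the paper: the "LLM-proposed" invariant is hard-coded, and the formal step is a brute-force bounded check standing in for the SMT or theorem-prover query a real pipeline would issue.

```python
# Hypothetical sketch: validating an LLM-proposed loop invariant for the
# program  s = 0; i = 0; while i < n: s += i; i += 1  by exhaustive
# checking over a small bounded domain (a stand-in for an SMT check).

def llm_proposed_invariant(i, s, n):
    # Candidate invariant (as an LLM might propose it):
    # 0 <= i <= n  and  s == 0 + 1 + ... + (i - 1).
    return 0 <= i <= n and s == i * (i - 1) // 2

def check_inductive(inv, n):
    # Base case: the invariant holds in the initial state (i = 0, s = 0).
    if not inv(0, 0, n):
        return False
    # Preservation: any state satisfying the invariant with the loop
    # guard true still satisfies it after one iteration.
    for i in range(n):
        for s in range(n * n + 1):
            if inv(i, s, n) and not inv(i + 1, s + i, n):
                return False
    return True

def implies_postcondition(inv, n):
    # On exit (guard i < n is false), the invariant must imply the
    # postcondition s == n * (n - 1) / 2.
    for i in range(n + 1):
        for s in range(n * n + 1):
            if inv(i, s, n) and not i < n and s != n * (n - 1) // 2:
                return False
    return True

if __name__ == "__main__":
    for n in range(1, 8):
        assert check_inductive(llm_proposed_invariant, n)
        assert implies_postcondition(llm_proposed_invariant, n)
    print("candidate invariant verified on bounded domain")
```

In a full hybrid system, the bounded check would be replaced by an unbounded proof obligation discharged by a solver; the division of labor is the same—the LLM supplies the creative guess, the formal tool supplies the guarantee.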
📝 Abstract
Software testing and verification are critical for ensuring the reliability and security of modern software systems. Traditionally, formal verification techniques, such as model checking and theorem proving, have provided rigorous frameworks for detecting bugs and vulnerabilities. However, these methods often face scalability challenges when applied to complex, real-world programs. Recently, the advent of Large Language Models (LLMs) has introduced a new paradigm for software analysis, leveraging their ability to recognize insecure coding practices. Although LLMs demonstrate promising capabilities in tasks such as bug prediction and invariant generation, they lack the formal guarantees of classical methods. This paper presents a comprehensive study of state-of-the-art software testing and verification, focusing on three key approaches: classical formal methods, LLM-based analysis, and emerging hybrid techniques that combine their strengths. We explore each approach's strengths, limitations, and practical applications, highlighting the potential of hybrid systems to address the weaknesses of standalone methods. We analyze whether integrating formal rigor with LLM-driven insights can enhance the effectiveness and scalability of software verification, and we assess the viability of such integration as a pathway toward more robust and adaptive testing frameworks.
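The scalability challenge of classical model checking noted above comes from exhaustive exploration of a program's state space. As a hypothetical toy illustration (not from the paper, and far simpler than any real tool), the sketch below runs a breadth-first search over a small nondeterministic transition system and returns a counterexample state that violates a safety property:

```python
from collections import deque

# Toy explicit-state model checker: BFS over reachable states of a
# two-counter system, reporting any state that violates the property.

def successors(state):
    # Nondeterministic transitions: increment x, or add x to y.
    x, y = state
    moves = []
    if x < 5:
        moves.append((x + 1, y))
    if y < 5:
        moves.append((x, y + x))
    return moves

def safe(state):
    # Safety property to check: y never exceeds 8.
    x, y = state
    return y <= 8

def model_check(init):
    seen, frontier = {init}, deque([init])
    while frontier:
        s = frontier.popleft()
        if not safe(s):
            return s  # counterexample: a reachable unsafe state
        for t in successors(s):
            if t not in seen:
                seen.add(t)
                frontier.append(t)
    return None  # property holds in every reachable state
```

Starting from `(0, 0)`, the search finds a reachable state with `y > 8`, demonstrating a violation. The state explosion problem is already visible in this sketch: the `seen` set grows with the product of the variable ranges, which is what limits such exhaustive techniques on real-world programs.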