CryptoScope: Utilizing Large Language Models for Automated Cryptographic Logic Vulnerability Detection

📅 2025-08-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
Detecting subtle logical vulnerabilities in cryptographic implementations remains challenging due to their domain-specific semantics and low signal-to-noise ratio. Method: We propose a domain-knowledge-augmented large language model (LLM) analysis framework that integrates a curated cryptography knowledge base of 12,000+ expert-verified entries with chain-of-thought prompting and retrieval-augmented generation (RAG), enabling precise semantic understanding and vulnerability reasoning over cryptographic code in multiple programming languages. Contribution/Results: Our approach embeds structured cryptographic knowledge directly into the LLM's inference process, significantly improving logical vulnerability detection. On a benchmark of 92 real-world cases, state-of-the-art LLMs augmented with our framework achieve up to a 28.69% absolute accuracy improvement over their unaugmented baselines. Moreover, we identified nine previously unknown security flaws in widely used open-source projects, including OpenSSL and LibTomCrypt, demonstrating both efficacy and practical applicability.

📝 Abstract
Cryptographic algorithms are fundamental to modern security, yet their implementations frequently harbor subtle logic flaws that are hard to detect. We introduce CryptoScope, a novel framework for automated cryptographic vulnerability detection powered by Large Language Models (LLMs). CryptoScope combines Chain-of-Thought (CoT) prompting with Retrieval-Augmented Generation (RAG), guided by a curated cryptographic knowledge base containing over 12,000 entries. We evaluate CryptoScope on LLM-CLVA, a benchmark of 92 cases primarily derived from real-world CVE vulnerabilities, complemented by cryptographic challenges from major Capture The Flag (CTF) competitions and synthetic examples across 11 programming languages. CryptoScope consistently improves performance over strong LLM baselines, boosting DeepSeek-V3 by 11.62%, GPT-4o-mini by 20.28%, and GLM-4-Flash by 28.69%. Additionally, it identifies 9 previously undisclosed flaws in widely used open-source cryptographic projects.
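The CoT-plus-RAG pipeline the abstract describes can be sketched roughly as follows. This is an illustrative minimal sketch only: the knowledge-base entries, the keyword-overlap retrieval, and the prompt wording are our assumptions, not CryptoScope's actual implementation (which uses a 12,000+ entry expert-verified knowledge base).

```python
# Illustrative RAG + chain-of-thought prompt pipeline for cryptographic
# code review. Entries, scoring, and prompt text are assumptions, not
# CryptoScope's actual design.

# Toy knowledge base: each entry pairs a topic with expert guidance.
KNOWLEDGE_BASE = [
    {"topic": "ECB mode", "note": "ECB leaks plaintext structure; prefer an AEAD mode such as AES-GCM."},
    {"topic": "memcmp", "note": "Early-exit comparison of MACs enables timing attacks; use a constant-time compare."},
    {"topic": "rand", "note": "Non-cryptographic PRNGs must not generate keys or nonces."},
]

def retrieve(code: str, top_k: int = 2) -> list[dict]:
    """Rank knowledge entries by naive keyword overlap with the code."""
    scored = []
    for entry in KNOWLEDGE_BASE:
        score = sum(tok.lower() in code.lower() for tok in entry["topic"].split())
        if score:
            scored.append((score, entry))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in scored[:top_k]]

def build_prompt(code: str) -> str:
    """Assemble a chain-of-thought prompt augmented with retrieved knowledge."""
    context = "\n".join(f"- {e['topic']}: {e['note']}" for e in retrieve(code))
    return (
        "You are a cryptography auditor. Relevant expert knowledge:\n"
        f"{context}\n\n"
        "Analyze the code step by step: (1) identify the primitives used, "
        "(2) check each against the knowledge above, (3) conclude whether a "
        "logic vulnerability exists.\n\nCode:\n" + code
    )

snippet = "ciphertext = AES.new(key, AES.MODE_ECB).encrypt(data)"
print(build_prompt(snippet))
```

A production system would replace the keyword overlap with embedding-based retrieval over the full knowledge base, but the shape of the prompt (retrieved domain facts followed by an explicit step-by-step reasoning instruction) is the core of the CoT-plus-RAG idea.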
Problem

Research questions and friction points this paper is trying to address.

Detecting subtle logic flaws in cryptographic algorithm implementations
Automating vulnerability detection using Large Language Models (LLMs)
Improving detection accuracy across diverse programming languages
Innovation

Methods, ideas, or system contributions that make the work stand out.

Applies LLMs to cryptographic logic vulnerability detection
Combines CoT prompting with RAG
Grounds reasoning in a curated cryptographic knowledge base (12,000+ entries)
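To make the target concrete, here is an example of the kind of logic flaw such a detector looks for. This snippet is our own illustration, not one of the paper's 92 benchmark cases: the cryptographic primitive is used correctly, yet the comparison logic opens a timing side channel.

```python
import hashlib
import hmac

def verify_mac_buggy(key: bytes, msg: bytes, tag: bytes) -> bool:
    # Logic flaw: == on bytes short-circuits at the first mismatching
    # byte, leaking the length of the matching prefix through timing.
    return hmac.new(key, msg, hashlib.sha256).digest() == tag

def verify_mac_fixed(key: bytes, msg: bytes, tag: bytes) -> bool:
    # Constant-time comparison closes the timing side channel.
    return hmac.compare_digest(
        hmac.new(key, msg, hashlib.sha256).digest(), tag
    )
```

Both functions return identical results on any single call, which is exactly why such flaws evade conventional testing and why semantic, knowledge-guided analysis is needed.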
Zhihao Li
Sichuan University, Chengdu, China
Zimo Ji
The Hong Kong University of Science and Technology, Hong Kong SAR, China
Tao Zheng
Sichuan University, Chengdu, China
Hao Ren
MPhil Student, University of New South Wales
Xiao Lan
Sichuan University, Chengdu, China