Eliciting Least-to-Most Reasoning for Phishing URL Detection

📅 2026-01-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the heavy reliance on large-scale labeled data in phishing URL detection by proposing a few-shot detection method that integrates Least-to-Most prompting with an answer sensitivity mechanism. The approach guides large language models through iterative task decomposition to enable step-by-step reasoning, while leveraging answer sensitivity to focus on critical URL features. The authors state that this is the first study to combine these mechanisms for phishing detection. Extensive experiments across three benchmark datasets and four mainstream large language models demonstrate that the proposed method significantly outperforms single-prompt baselines, achieves performance comparable to fully supervised models, and substantially reduces the dependency on annotated data.
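The core idea of Least-to-Most prompting is to decompose a hard question into easier subquestions and feed each answer back into the context for the next step. The sketch below is illustrative only: the subquestions, prompt wording, and helper names are hypothetical and not the paper's actual prompts, and the LLM call is stubbed out so the decomposition logic stands alone.

```python
# Hypothetical sketch of Least-to-Most prompt construction for phishing URL
# detection. Subquestions and wording are assumptions, not the paper's prompts.

# Easier subquestions asked first; their answers accumulate into the context.
SUBQUESTIONS = [
    "Does the domain appear to imitate a well-known brand?",
    "Does the URL contain suspicious tokens such as 'login', 'verify', or 'secure'?",
    "Is the URL unusually long or obfuscated (e.g. hex escapes, many subdomains)?",
]

def build_prompts(url: str) -> list[str]:
    """Build the iterative prompt sequence: each prompt includes the URL and
    all previously answered subquestions, ending with the final classification."""
    prompts = []
    context = f"URL: {url}\n"
    for question in SUBQUESTIONS:
        prompts.append(context + f"Q: {question}\nA:")
        # In a real run, the model's answer would be inserted here before the
        # next subquestion; a placeholder marks where it goes.
        context += f"Q: {question}\nA: <model answer>\n"
    prompts.append(context + "Final question: Is this URL phishing? Answer yes or no.\nA:")
    return prompts

prompts = build_prompts("http://paypa1-secure-login.example.com/verify")
```

An answer sensitivity mechanism, as described in the summary, would inspect how the final answer changes as individual subquestion answers vary, directing the model's attention toward the URL features that most influence the verdict.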

📝 Abstract
Phishing continues to be one of the most prevalent attack vectors, making accurate classification of phishing URLs essential. Recently, large language models (LLMs) have demonstrated promising results in phishing URL detection. However, the reasoning capabilities that enabled such performance remain underexplored. To this end, in this paper, we propose a Least-to-Most prompting framework for phishing URL detection. In particular, we introduce an "answer sensitivity" mechanism that guides Least-to-Most's iterative approach to enhance reasoning and yield higher prediction accuracy. We evaluate our framework using three URL datasets and four state-of-the-art LLMs, comparing against a one-shot approach and a supervised model. We demonstrate that our framework outperforms the one-shot baseline while achieving performance comparable to that of the supervised model, despite requiring significantly less training data. Furthermore, our in-depth analysis highlights how the iterative reasoning enabled by Least-to-Most, and reinforced by our answer sensitivity mechanism, drives these performance gains. Overall, we show that this simple yet powerful prompting strategy consistently outperforms both one-shot and supervised approaches, despite requiring minimal training or few-shot guidance. Our experimental setup can be found in our GitHub repository github.sydney.edu.au/htri0928/least-to-most-phishing-detection.
Problem

Research questions and friction points this paper is trying to address.

phishing URL detection
large language models
reasoning
prompting
answer sensitivity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Least-to-Most prompting
answer sensitivity
phishing URL detection
large language models
iterative reasoning
Holly Trikilis
University of Sydney
Pasindu Marasinghe
University of Sydney
Fariza Rashid
University of Sydney
Suranga Seneviratne
University of Sydney
Privacy & Security · Internet Measurements · Machine Learning · Computer Networks