InteractComp: Evaluating Search Agents With Ambiguous Queries

📅 2025-10-28
🤖 AI Summary
Problem: In real-world search, user queries are often ambiguous or incomplete and require interactive clarification. However, existing search agents lack this capability, and no suitable evaluation benchmark exists.

Method: We introduce InteractComp, a benchmark of 210 expert-crafted ambiguous questions across nine domains. Its "target–distractor" methodology generates realistic, verifiable disambiguation tasks that can be solved only through interaction.

Contribution/Results: InteractComp reveals severe overconfidence in mainstream models. The best of 17 models achieves only 13.73% accuracy in the interactive setting, far below its 71.50% accuracy when given complete context. Forcing interaction significantly improves performance, showing that current prompting strategies fail to activate the models' latent capacity for interactive reasoning. This work provides the first systematic evaluation exposing critical deficiencies in search agents' ability to recognize ambiguity and proactively seek clarification, establishing a new standard and empirical foundation for interactive search research.

📝 Abstract
Language agents have demonstrated remarkable potential in web search and information retrieval. However, these search agents assume user queries are complete and unambiguous, an assumption that diverges from reality, where users often begin with incomplete queries that require clarification through interaction. Yet most agents lack interactive mechanisms during the search process, and existing benchmarks cannot assess this capability. To address this gap, we introduce InteractComp, a benchmark designed to evaluate whether search agents can recognize query ambiguity and actively interact to resolve it during search. Following the principle of "easy to verify, interact to disambiguate," we construct 210 expert-curated questions across 9 domains through a target-distractor methodology that creates genuine ambiguity resolvable only through interaction. Evaluation of 17 models reveals a striking failure: the best model achieves only 13.73% accuracy despite reaching 71.50% with complete context, exposing systematic overconfidence rather than reasoning deficits. Forced interaction produces dramatic gains, demonstrating a latent capability that current strategies fail to engage. Longitudinal analysis shows that interaction capabilities stagnated over 15 months while search performance improved seven-fold, revealing a critical blind spot. This stagnation, coupled with the immediate feedback inherent to search tasks, makes InteractComp a valuable resource for both evaluating and training interaction capabilities in search agents. The code is available at https://github.com/FoundationAgents/InteractComp.
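The abstract describes the target-distractor construction only at a high level. As a rough illustration of the idea, and not the authors' procedure or the code in the InteractComp repository, the toy sketch below assumes each entity can be reduced to a set of (attribute, value) facts. The question exposes only the facts a target shares with its distractor, the distinguishing facts are known only to a simulated user, and the agent can single out the target only by asking. All names (`Entity`, `split_attributes`, `simulated_user`, `resolve`) and the sample entities are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical representation: an entity is a name plus (attribute, value) facts.
Attr = tuple[str, str]


@dataclass(frozen=True)
class Entity:
    name: str
    attributes: frozenset[Attr]


def split_attributes(target: Entity, distractor: Entity) -> tuple[frozenset[Attr], frozenset[Attr]]:
    """Return (shared, hidden).

    `shared` facts hold for both entities, so a question built only from them
    cannot tell the two apart. `hidden` facts are true of the target but not of
    the distractor; in this toy model only the simulated user knows them.
    """
    shared = target.attributes & distractor.attributes
    hidden = target.attributes - distractor.attributes
    return shared, hidden


def simulated_user(hidden: frozenset[Attr], key: str) -> str | None:
    """Answer a clarifying question about `key` with the target's hidden value, if any."""
    for name, value in hidden:
        if name == key:
            return value
    return None


def resolve(pool: list[Entity], shared: frozenset[Attr], hidden: frozenset[Attr],
            questions: list[str]) -> list[str]:
    """Filter candidates by the query's facts, then ask questions until one remains."""
    remaining = [e for e in pool if shared <= e.attributes]
    for key in questions:
        if len(remaining) <= 1:
            break  # already unambiguous; no need to keep asking
        value = simulated_user(hidden, key)
        if value is not None:
            remaining = [e for e in remaining if (key, value) in e.attributes]
    return [e.name for e in remaining]


# Hypothetical placeholder entities (not drawn from the benchmark).
target = Entity("target", frozenset({("medium", "novel"), ("setting", "port city"), ("decade", "1950s")}))
distractor = Entity("distractor", frozenset({("medium", "novel"), ("setting", "port city"), ("decade", "1970s")}))
pool = [target, distractor]
shared, hidden = split_attributes(target, distractor)

print(resolve(pool, shared, hidden, []))          # → ['target', 'distractor']
print(resolve(pool, shared, hidden, ["decade"]))  # → ['target']
```

In this miniature version of "easy to verify, interact to disambiguate," checking a final answer is a simple set lookup, but the query alone leaves two candidates, so an agent that answers without asking can only guess between them.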

Problem

Research questions and friction points this paper is trying to address.

Evaluating how search agents handle ambiguous user queries
Assessing interactive disambiguation capabilities during search
Identifying systematic overconfidence in models that answer ambiguous queries without seeking clarification

Innovation

Methods, ideas, or system contributions that make the work stand out.

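Introduces a benchmark for evaluating whether search agents recognize ambiguity and interact to resolve it
Uses a target-distractor methodology to create ambiguous queries that only interaction can resolve
Shows that forcing interaction resolves ambiguity and substantially improves accuracy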

Authors
Mingyi Deng, DeepWisdom
Lijun Huang, The Hong Kong University of Science and Technology (Guangzhou)
Yani Fan, The Hong Kong University of Science and Technology (Guangzhou)
Jiayi Zhang, The Hong Kong University of Science and Technology (Guangzhou)
Fashen Ren, The Hong Kong University of Science and Technology (Guangzhou)
Jinyi Bai, Renmin University of China
Fuzhen Yang, Renmin University of China
Dayi Miao, The Hong Kong University of Science and Technology (Guangzhou)
Zhaoyang Yu, DeepWisdom (Large Language Model, AI Agents)
Yifan Wu, The Hong Kong University of Science and Technology (Guangzhou)
Yanfei Zhang, DeepWisdom
Fengwei Teng, Renmin University of China (LLM reasoning)
Yingjia Wan, DeepWisdom
Song Hu, Professor of Biomedical Engineering, Washington University in St. Louis (Photoacoustics, Biophotonics, Brain Imaging, Fiber Optics, Nanophotonics)
Yude Li, DeepWisdom
Xin Jin, DeepWisdom
Conghao Hu, DeepWisdom
Haoyu Li, DeepWisdom
Qirui Fu, DeepWisdom
Tai Zhong, Agent Universe
Xinyu Wang, McGill University
Xiangru Tang, Yale University
Nan Tang, National Institute of Biological Sciences, Beijing (stem cell biology, aging, lung diseases)
Chenglin Wu, Founder & CEO, DeepWisdom (Foundation Agents, Artificial Intelligence, AutoML)
Yuyu Luo, Assistant Professor, HKUST(GZ) / HKUST (Data Agents, LLM Agents, Database, Text-to-SQL, Data-centric AI)