🤖 AI Summary
This study addresses challenges in recognizing the intent of medical search queries: semantic complexity, scarce annotations, ambiguous click behavior, and misalignment between session-level and global intents. The authors propose a session-aware, multi-intent query representation learning approach that aggregates similar queries through clustering and uses a novel loss function to model the multi-intent nature of queries. They also introduce a concordance rate (CR) score to quantify query ambiguity and intent misalignment. Experiments on two real-world search log datasets, Health Search (HS) and TripClick, show that the method improves both the intrinsic clustering metrics of the learned query representations and the accuracy of downstream intent classification.
📝 Abstract
Classifying the intent behind healthcare search queries is crucial for improving the delivery of online healthcare information. The intricate nature of medical search queries, coupled with the limited availability of high-quality labeled data, presents substantial challenges for developing efficient classification models. Previous studies have exploited user interaction data, such as user clicks from search logs, and employed pairwise loss functions to model co-click behavior for query representation learning. However, many health queries can have multiple intents, resulting in ambiguous or divergent click behavior. Furthermore, learning only the single most popular intent of each query, as inferred from global statistics over the aggregate behavior of different users, can lead to disparity and a drop in performance when classifying query intent within specific search sessions. To address these limitations, our work improves query representation learning by aggregating similar queries via clustering and introducing a novel loss function designed to capture the multifaceted nature of health search queries, resulting in a more scalable and accurate learning procedure. We also quantify the ambiguity of health queries, and the misalignment between global search intents and those discerned from individual sessions, by introducing the concordance rate (CR) score, and we demonstrate a simple and effective method for incorporating our learned query representation into contextual, session-based search intent classification. Extensive experimental results and analysis on two real-world search log datasets, a Health Search (HS) dataset and the publicly available TripClick dataset, demonstrate that our approach not only improves the intrinsic clustering metrics for query representation learning but also enhances accuracy on subsequent search intent classification tasks.
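The abstract does not define the CR score, so the following is only an illustrative sketch of the underlying idea of global versus session-level intent agreement, not the authors' formula. It assumes a hypothetical log of `(query, session_id, intent)` records and measures how often the intent observed in a session matches the query's globally most popular intent. The function names, record layout, and intent labels are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def global_intents(records):
    """Map each query to its most frequent intent across all sessions."""
    counts = defaultdict(Counter)
    for query, _session_id, intent in records:
        counts[query][intent] += 1
    # most_common(1) yields the (label, count) pair with the highest count.
    return {q: c.most_common(1)[0][0] for q, c in counts.items()}


def agreement_rate(records):
    """Fraction of records whose session intent equals the global intent.

    Hypothetical stand-in for a concordance-style score; the paper's
    actual CR definition may differ.
    """
    if not records:
        return 0.0
    top = global_intents(records)
    matches = sum(1 for q, _s, intent in records if intent == top[q])
    return matches / len(records)


# Toy log (invented data): one query carries two different intents.
records = [
    ("ibuprofen", "s1", "treatment"),
    ("ibuprofen", "s2", "treatment"),
    ("ibuprofen", "s3", "side_effects"),
    ("migraine", "s4", "symptoms"),
]
print(agreement_rate(records))
```

A low value under this reading would indicate that a single global label per query misrepresents many individual sessions, which is the kind of misalignment the abstract describes.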