Taxonomy and Analysis of Sensitive User Queries in Generative AI Search

📅 2024-04-05
🏛️ arXiv.org
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Deploying generative large language models (LLMs) in national-scale search engines poses significant challenges in identifying sensitive user queries. Existing approaches relying on keyword matching lack semantic and contextual granularity, failing to capture nuanced risks such as illegality, privacy violations, and ethical concerns. Method: We propose the first fine-grained sensitive query taxonomy tailored for generative search scenarios, integrating semantic intent with multiple risk dimensions. Leveraging a real-world search log dataset of over ten million queries, we employ a hybrid methodology combining human annotation, rule-based engines, and lightweight classification models to systematically discover and dynamically attribute sensitive patterns. Contribution/Results: The taxonomy identifies 12 core sensitive query categories, establishing a foundational methodology for sensitive content governance in LLM-native search, something previously absent in industry practice. It has been integrated into pre-deployment safety filtering and response strategy optimization, substantially reducing the generation and dissemination of harmful content.

📝 Abstract
Although there has been a growing interest among industries in integrating generative LLMs into their services, limited experience and scarcity of resources act as a barrier in launching and servicing large-scale LLM-based services. In this paper, we share our experiences in developing and operating generative AI models within a national-scale search engine, with a specific focus on the sensitiveness of user queries. We propose a taxonomy for sensitive search queries, outline our approaches, and present a comprehensive analysis report on sensitive queries from actual users. We believe that our experiences in launching generative AI search systems can contribute to reducing the barrier in building generative LLM-based services.
Problem

Research questions and friction points this paper is trying to address.

Classify sensitive user queries
Analyze generative AI search systems
Reduce barriers in LLM services
Innovation

Methods, ideas, or system contributions that make the work stand out.

Taxonomy for sensitive queries
Generative AI model operation
Analysis of user query sensitivity
Authors

Hwiyeol Jo: Research Scientist at NAVER Cloud. Interests: AI with CogSci, NLP using ML
Taiwoo Park: NAVER Search US, NAVER
Nayoung Choi: PhD Student @ Emory CS. Interests: Natural Language Processing, Information Retrieval
Changbong Kim: NAVER
Ohjoon Kwon: NAVER
Donghyeon Jeon: NAVER
Hyunwoo Lee: Korea Institute for Energy Technology (KENTECH). Interests: Network Security, TLS, PKI, AI Security, Intrusion detection
Eui-Hyeon Lee: NAVER Search US, NAVER
Kyoungho Shin: NAVER Search US
Sun Suk Lim: NAVER Search US, NAVER
Kyungmi Kim: NAVER Search US, NAVER
Jihye Lee: NAVER Search US, NAVER
Sun Kim: NAVER Search US