🤖 AI Summary
Deploying generative large language models (LLMs) in national-scale search engines poses significant challenges in identifying sensitive user queries. Existing approaches that rely on keyword matching lack semantic and contextual granularity, failing to capture nuanced risks such as illegality, privacy violations, and ethical concerns.
Method: We propose the first fine-grained sensitive query taxonomy tailored for generative search scenarios, integrating semantic intent with multiple risk dimensions. Leveraging a real-world search log dataset of over ten million queries, we employ a hybrid methodology combining human annotation, rule-based engines, and lightweight classification models to systematically discover and dynamically attribute sensitive patterns.
Contribution/Results: The taxonomy identifies 12 core sensitive query categories, establishing a foundational methodology for sensitive content governance in LLM-native search, which was previously absent from industry practice. It has been integrated into pre-deployment safety filtering and response strategy optimization, substantially reducing the generation and dissemination of harmful content.
📝 Abstract
Although there has been growing interest among industries in integrating generative LLMs into their services, limited experience and scarce resources remain barriers to launching and operating large-scale LLM-based services. In this paper, we share our experiences in developing and operating generative AI models within a national-scale search engine, with a specific focus on the sensitivity of user queries. We propose a taxonomy for sensitive search queries, outline our approaches, and present a comprehensive analysis of sensitive queries from actual users. We believe that our experiences in launching generative AI search systems can help lower the barrier to building generative LLM-based services.