CURATe: Benchmarking Personalised Alignment of Conversational AI Assistants

📅 2024-10-28
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper addresses the challenge of sustaining user-specific constraints—such as health restrictions and ethical preferences—in safety-critical multi-turn dialogues with large language models (LLMs). To this end, it introduces the first benchmark for multi-turn personalized alignment tailored to safety-sensitive scenarios. The benchmark comprises 337 × 5 multi-turn dialogue instances spanning domains including healthcare and finance, systematically exposing four failure modes across mainstream LLMs: conflict resolution among competing preferences, trade-offs between safety and user desires, inefficient contextual information utilization, and inconsistent application of user knowledge. Experiments reveal that generic “harmless and helpful” instructions fail to ensure alignment, whereas explicit safety-aware contextual prompting significantly improves performance. The paper further proposes a novel paradigm integrating introspective modeling, online user representation learning, and dynamic risk assessment—demonstrating that strong reasoning capability does not imply robust personalized alignment.

Technology Category

Application Category

📝 Abstract
We introduce a multi-turn benchmark for evaluating personalised alignment in LLM-based AI assistants, focusing on their ability to handle user-provided safety-critical contexts. Our assessment of ten leading models across five scenarios (with 337 use cases each) reveals systematic inconsistencies in maintaining user-specific consideration, with even top-rated"harmless"models making recommendations that should be recognised as obviously harmful to the user given the context provided. Key failure modes include inappropriate weighing of conflicting preferences, sycophancy (prioritising desires above safety), a lack of attentiveness to critical user information within the context window, and inconsistent application of user-specific knowledge. The same systematic biases were observed in OpenAI's o1, suggesting that strong reasoning capacities do not necessarily transfer to this kind of personalised thinking. We find that prompting LLMs to consider safety-critical context significantly improves performance, unlike a generic 'harmless and helpful' instruction. Based on these findings, we propose research directions for embedding self-reflection capabilities, online user modelling, and dynamic risk assessment in AI assistants. Our work emphasises the need for nuanced, context-aware approaches to alignment in systems designed for persistent human interaction, aiding the development of safe and considerate AI assistants.
Problem

Research questions and friction points this paper is trying to address.

Personalization
Safety-Critical Scenarios
Multi-Turn Dialogue
Innovation

Methods, ideas, or system contributions that make the work stand out.

Personalized Alignment
Safety-Critical Context
Dynamic Risk Assessment
🔎 Similar Papers
No similar papers found.