AI Summary
Current AI evaluation frameworks struggle to detect subtle risks in caregiving contexts, such as emotional neglect, bias, or inappropriate information in large language model (LLM) responses. This work proposes RubRIX, the first user-centered evaluation framework grounded in care ethics theory and validated by clinical experts, which translates ethical principles into five actionable risk dimensions and incorporates human-guided scoring rules to steer model refinement. Evaluations of six mainstream LLMs on over 20,000 real-world caregiver queries demonstrate that a single round of optimization guided by RubRIX reduces these risks by 45%–98%, substantially enhancing the safety and reliability of AI systems in high-stakes caregiving interactions.
Abstract
Caregivers seeking AI-mediated support express complex needs -- information-seeking, emotional validation, and distress cues -- that warrant careful evaluation of response safety and appropriateness. Existing AI evaluation frameworks, primarily focused on general risks (toxicity, hallucinations, policy violations, etc.), may not adequately capture the nuanced risks of LLM responses in caregiving contexts. We introduce RubRIX (Rubric-based Risk Index), a theory-driven, clinician-validated framework for evaluating risks in LLM caregiving responses. Grounded in the Elements of an Ethic of Care, RubRIX operationalizes five empirically derived risk dimensions: Inattention, Bias & Stigma, Information Inaccuracy, Uncritical Affirmation, and Epistemic Arrogance. We evaluate six state-of-the-art LLMs on over 20,000 caregiver queries from Reddit and ALZConnected. Rubric-guided refinement consistently reduced risk components by 45%–98% after one iteration across models. This work contributes a methodological approach for developing domain-sensitive, user-centered evaluation frameworks for high-burden contexts, and our findings highlight the importance of interactional risk evaluation for the responsible deployment of LLMs in caregiving support. We release benchmark datasets to enable future research on contextual risk evaluation in AI-mediated support.