CHOIR: Collaborative Harmonization fOr Inference Robustness

📅 2025-10-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) exhibit significant instability in reasoning trajectories and final answers under minor persona perturbations, such as pronoun substitutions, raising concerns about demographic sensitivity and fairness. Method: We propose CHOIR, a test-time framework that reframes demographic sensitivity as a source of robustness rather than a bias to be suppressed. CHOIR generates diverse reasoning paths via multi-persona conditional reasoning, employs dynamic consistency balancing and collaborative decoding, and ensembles outputs across personas without additional training, treating persona variations as counterfactual collaborative signals that stabilize reasoning across demographic groups. Contribution/Results: CHOIR achieves an average accuracy improvement of 19.2% across multiple reasoning benchmarks, with gains of up to 26.4% for specific demographic subgroups. Crucially, it maintains these gains even under suboptimal persona specifications, demonstrating strong generalizability and scalability.
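The ensembling step described above can be illustrated with a minimal sketch: condition the same question on several personas, collect each persona's final answer, and aggregate by majority vote. The `harmonize` function and the persona strings are hypothetical stand-ins; the paper's actual collaborative decoding operates on full reasoning paths, not just final answers.

```python
from collections import Counter

def harmonize(persona_answers):
    """Aggregate persona-conditioned answers by majority vote.

    `persona_answers` maps each persona string to that persona's
    final answer; ties break by first-occurrence order (Python 3.7+).
    """
    counts = Counter(persona_answers.values())
    answer, _ = counts.most_common(1)[0]
    return answer

# Hypothetical persona-conditioned outputs for one question.
answers = {
    "she/her engineer": "42",
    "he/him engineer": "42",
    "they/them engineer": "37",
}
print(harmonize(answers))  # → 42
```

In this toy example, the pronoun perturbation flips one persona's answer, but the harmonized prediction remains stable.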

📝 Abstract
Persona-assigned Large Language Models (LLMs) can adopt diverse roles, enabling personalized and context-aware reasoning. However, even minor demographic perturbations in personas, such as simple pronoun changes, can alter reasoning trajectories, leading to divergent sets of correct answers. Instead of treating these variations as biases to be mitigated, we explore their potential as a constructive resource to improve reasoning robustness. We propose CHOIR (Collaborative Harmonization fOr Inference Robustness), a test-time framework that harmonizes multiple persona-conditioned reasoning signals into a unified prediction. CHOIR orchestrates a collaborative decoding process among counterfactual personas, dynamically balancing agreement and divergence in their reasoning paths. Experiments on various reasoning benchmarks demonstrate that CHOIR consistently enhances performance across demographics, model architectures, scales, and tasks - without additional training. Improvements reach up to 26.4% for individual demographic groups and 19.2% on average across five demographics. It remains effective even when base personas are suboptimal. By reframing persona variation as a constructive signal, CHOIR provides a scalable and generalizable approach to more reliable LLM reasoning.
Problem

Research questions and friction points this paper is trying to address.

Harmonizing diverse persona-conditioned reasoning signals
Addressing demographic perturbation effects on reasoning trajectories
Improving LLM robustness through collaborative decoding without retraining
Innovation

Methods, ideas, or system contributions that make the work stand out.

Harmonizes multiple persona-conditioned reasoning signals
Orchestrates collaborative decoding among counterfactual personas
Balances agreement and divergence in reasoning paths dynamically
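A toy sketch of what dynamically balancing agreement and divergence might look like at the answer level: each persona's vote receives a small bonus when its answer diverges from the majority, so minority reasoning paths contribute rather than being drowned out. The weighting rule and the `divergence_bonus` parameter are illustrative assumptions, not the paper's actual scheme.

```python
from collections import defaultdict

def balanced_vote(persona_answers, divergence_bonus=0.25):
    """Weighted vote over persona-conditioned answers.

    Each vote counts as 1 plus a bonus proportional to how rare
    its answer is among the personas, keeping divergent signals alive.
    """
    n = len(persona_answers)
    counts = defaultdict(int)
    for ans in persona_answers.values():
        counts[ans] += 1
    tally = defaultdict(float)
    for persona, ans in persona_answers.items():
        minority = 1.0 - counts[ans] / n  # higher for rarer answers
        tally[ans] += 1.0 + divergence_bonus * minority
    return max(tally, key=tally.get)

votes = {"persona_a": "A", "persona_b": "A", "persona_c": "B"}
print(balanced_vote(votes))  # → A
```

With a modest bonus the consensus answer still wins, but a larger bonus would let a strongly divergent persona overturn a narrow majority, loosely mirroring the agreement/divergence trade-off the method tunes.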