Dual-Agent Co-Training for Health Coaching via Implicit Adversarial Preference Optimization

📅 2026-05-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a limitation of existing AI health coaching approaches: they typically optimize either the coach or the client simulator in isolation, which limits exploration of the interaction space. The authors propose a dual-agent co-training framework built on an implicit adversarial preference optimization mechanism. A multi-dimensional LLM judge identifies Pareto-dominant response pairs, which are used to train the coach with direct preference optimization (DPO); the same preferences are then reversed to train the client simulator adversarially, coupling the optimization of both agents. The co-training process admits a stochastic-game interpretation, and the reported experiments show improved coaching quality across several dimensions.

📝 Abstract

Motivational-interviewing-based health coaching is an effective approach for improving mental health and promoting healthy behavior change. However, the scarcity of trained human coaches and the high cost of coaching services make such support inaccessible to many people who could benefit from it. This motivates the development of AI health coaches that can provide scalable and affordable support. Existing methods typically optimize only one side of the interaction: they either train a dialogue agent against a fixed client environment or train a client simulator against a fixed assistant. This one-sided setup can limit exploration of the interaction space and may be inefficient at developing the capabilities required by the target agent and pushing its performance boundaries. In this paper, we propose a dual-agent framework that interactively co-trains both the health coach agent and the client simulator. The coach is optimized with DPO using Pareto-dominant response pairs identified by a multi-dimensional LLM judge. In turn, the client is trained adversarially by reversing these preferences, inducing an implicit adversarial training dynamic. We further show that this co-training process admits a natural stochastic-game interpretation. Extensive experiments demonstrate that our method effectively improves coaching quality across several important dimensions.
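The pair-construction step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `PreferencePair`, `dominates`, `coach_pairs`, `reverse`, and `dpo_loss` are hypothetical, and so are the score vectors and the `beta` value. The abstract does not say how reversed preferences are mapped onto client turns, so the sketch simply swaps the chosen and rejected sides. The loss is the standard DPO objective for a single pair.

```python
import math
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class PreferencePair:
    """One preference example (hypothetical layout, not from the paper)."""
    prompt: str
    chosen: str
    rejected: str


def dominates(a, b):
    """True if score vector a Pareto-dominates b:
    at least as good on every dimension and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def coach_pairs(prompt, candidates):
    """Build coach preference pairs from (response, judge_scores) candidates.
    Only pairs where one response Pareto-dominates the other are kept."""
    return [
        PreferencePair(prompt, resp_a, resp_b)
        for (resp_a, s_a), (resp_b, s_b) in permutations(candidates, 2)
        if dominates(s_a, s_b)
    ]


def reverse(pair):
    """Assumed form of the preference reversal used for the client:
    swap the chosen and rejected sides of a coach pair."""
    return PreferencePair(pair.prompt, pair.rejected, pair.chosen)


def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one pair: -log sigmoid(beta * margin),
    written as log(1 + exp(-x)). beta=0.1 is an illustrative value."""
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    return math.log1p(math.exp(-margin))


# Hypothetical judge scores on three dimensions for three coach responses.
candidates = [("A", (4, 5, 3)), ("B", (3, 5, 3)), ("C", (5, 2, 4))]
pairs = coach_pairs("client turn", candidates)
client_pairs = [reverse(p) for p in pairs]
```

In this toy input only response A dominates response B; response C trades off against both, so it yields no pair. Filtering by Pareto dominance discards such ambiguous comparisons instead of collapsing the judge's dimensions into a single score.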

Problem

Research questions and friction points this paper is trying to address.

health coaching

dual-agent co-training

adversarial preference optimization

dialogue agent

client simulator

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual-Agent Co-Training

Implicit Adversarial Preference Optimization

Direct Preference Optimization (DPO)