Leveraging Multimodal Self-Consistency Reasoning in Coding Motivational Interviewing for Alcohol Use Reduction

📅 2026-05-13

🤖 AI Summary

This work addresses the high cost of manually coding Motivational Interviewing (MI) sessions by proposing an automated alternative. It introduces multimodal self-consistency reasoning for MI coding, using audio-language models (ALMs) to jointly consider verbal content and acoustic prosody. The approach applies four prompting strategies (analytic, prosody-aware, evidence-scoring, and comparative) and draws three stochastic samples from each, yielding 12 reasoning trajectories per utterance; the final code is chosen by majority vote across these trajectories. Evaluated on five de-identified MI session recordings, the method achieves 52.56% accuracy and a 46.40% macro-F1 score, exceeding single-pass baseline prompting approaches. In ablation experiments, removing individual modules consistently degraded performance on the primary metrics, supporting the contribution of each component to automated MI coding.
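The voting step can be illustrated with a short Python sketch. It assumes a hypothetical `query_alm(audio, prompt, seed)` callable that wraps an audio-language model and returns one code label per sampled trajectory; the prompt names, the label strings, the `fake_alm` stand-in, and the first-seen tie-break rule are illustrative assumptions, not details taken from the paper. Following the abstract's setting, each of the four prompts is sampled three times, giving 12 votes per utterance.

```python
from collections import Counter
from typing import Callable, List, Sequence

# Hypothetical names for the four prompting strategies described in the paper.
PROMPTS = ("analytic", "prosody_aware", "evidence_scoring", "comparative")
SAMPLES_PER_PROMPT = 3  # 4 prompts x 3 samples = 12 trajectories per utterance

# Assumed signature: (utterance audio, prompt name, sampling seed) -> predicted code label.
AlmFn = Callable[[bytes, str, int], str]


def collect_trajectories(
    utterance_audio: bytes,
    query_alm: AlmFn,
    prompts: Sequence[str] = PROMPTS,
    samples_per_prompt: int = SAMPLES_PER_PROMPT,
) -> List[str]:
    """Sample each prompt several times and return one label per reasoning trajectory."""
    return [
        query_alm(utterance_audio, prompt, seed)
        for prompt in prompts
        for seed in range(samples_per_prompt)
    ]


def majority_vote(labels: Sequence[str]) -> str:
    """Return the most frequent label; ties go to the label seen first (an assumed rule)."""
    if not labels:
        raise ValueError("no labels to vote on")
    counts = Counter(labels)
    top = max(counts.values())
    return next(label for label in labels if counts[label] == top)


def fake_alm(audio: bytes, prompt: str, seed: int) -> str:
    """Deterministic stand-in for a stochastic ALM call, for demonstration only."""
    if prompt == "prosody_aware" and seed == 0:
        return "sustain_talk"
    return "change_talk" if seed < 2 else "neutral"


trajectories = collect_trajectories(b"", fake_alm)
print(len(trajectories), majority_vote(trajectories))  # prints: 12 change_talk
```

Passing a shorter `prompts` tuple (for example, one without `"prosody_aware"`) is one hypothetical way to mimic the kind of module-removal ablation described above.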

📝 Abstract

BACKGROUND: Coding Motivational Interviewing (MI) sessions is essential for understanding client behaviors and predicting outcomes, but it requires substantial time and labor from trained MI professionals. Recent advances in audio-language models (ALMs) offer new opportunities to automate MI coding by capturing multimodal behavioral signals.

OBJECTIVE: This study aims to develop an automatic MI coding approach based on ALMs that analyzes raw audio input and integrates predictions from multiple reasoning trajectories using self-consistency to improve coding robustness.

METHODS: We experimented with five recorded sessions from de-identified MI audio tapes. We deployed ALMs with four complementary analytic prompts to support utterance-level reasoning: analytic prompting for verbal cues, prosody-aware prompting for acoustic cues, evidence-scoring prompting for quantitative hypothesis testing, and comparative prompting for contrastive reasoning. Three stochastic samples were drawn for each prompt, generating 12 independent reasoning trajectories per utterance. Final predictions were determined by majority voting across all trajectories.

RESULTS: Performance was evaluated using accuracy, precision, recall, and macro-F1 scores. The proposed multimodal self-consistency approach achieved 52.56% accuracy, 54.03% precision, 47.45% recall, and a macro-F1 score of 46.40%, exceeding baseline methods. Systematic ablation experiments that removed individual modules consistently degraded performance on the primary metrics.

CONCLUSIONS: Multimodal self-consistency outperforms single-pass baseline prompting approaches for MI coding. These findings suggest that incorporating both what clients say and how they say it can support more reliable automatic MI coding.
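The four reported metrics can be computed with the dependency-free Python sketch below. It assumes that "macro" means an unweighted mean of per-class scores, the same convention as scikit-learn's `average="macro"`. The paper does not spell this out, but its reported macro-F1 (46.40%) is below the harmonic mean of its reported precision and recall (about 50.5%), which is consistent with averaging per-class F1 rather than combining the averaged precision and recall. Zero denominators yield 0.0 here, and the function name and label strings are hypothetical.

```python
from typing import Dict, Sequence


def macro_metrics(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Accuracy plus macro (unweighted per-class mean) precision, recall and F1."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # Zero denominators give 0.0 instead of raising ZeroDivisionError.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "macro_f1": sum(f1s) / n,
    }


# Hypothetical utterance-level codes for six utterances.
gold = ["change_talk", "change_talk", "sustain_talk", "neutral", "neutral", "neutral"]
pred = ["change_talk", "neutral", "sustain_talk", "neutral", "change_talk", "neutral"]
print(macro_metrics(gold, pred))
```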

Problem

Research questions and friction points this paper is trying to address.

Motivational Interviewing

MI coding

audio-language models

multimodal reasoning

behavioral coding

Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal self-consistency

audio-language models

motivational interviewing coding