🤖 AI Summary
This paper addresses critical limitations of digital humans—lacking personality consistency, interactive adaptability, and self-evolutionary capability—by proposing Mio, an end-to-end multimodal interactive framework. Methodologically, it pioneers the “interactive intelligence” paradigm, introducing the Omni-Avatar architecture comprising five synergistic modules: multimodal large-model collaborative reasoning (Thinker), personality-aligned controllable generation (Talker), real-time speech-driven facial animation (Face Animator), gesture co-driven body animation (Body Animator), and neural rendering (Renderer). Key contributions include: (1) the first comprehensive benchmark specifically designed for evaluating interactive intelligence in digital humans; and (2) state-of-the-art performance on this benchmark, with significant improvements in facial expression naturalness, dialogue coherence, motion coordination, personality consistency, and evolutionary capability.
📝 Abstract
We introduce Interactive Intelligence, a novel paradigm for digital humans capable of personality-aligned expression, adaptive interaction, and self-evolution. To realize this, we present Mio (Multimodal Interactive Omni-Avatar), an end-to-end framework composed of five specialized modules: Thinker, Talker, Face Animator, Body Animator, and Renderer. This unified architecture integrates cognitive reasoning with real-time multimodal embodiment to enable fluid, consistent interaction. Furthermore, we establish a new benchmark to rigorously evaluate the capabilities of interactive intelligence. Extensive experiments demonstrate that our framework outperforms state-of-the-art methods across all evaluated dimensions. Together, these contributions move digital humans beyond superficial imitation toward intelligent interaction.