🤖 AI Summary
This paper targets collaborative tasks among multiple embodied intelligent devices (MEIDs) in 6G networks, where existing systems fall short in multimodal fusion, adaptive semantic communication, and decision interpretability. It proposes a semantics-driven coordination framework for heterogeneous devices with three parts: a unified semantic representation and task decomposition mechanism built on cross-modal (radar/image) feature fusion; a dynamic adaptive encoding-modulation strategy that improves channel robustness; and a Grad-CAM visualization module that improves decision transparency. Evaluated in a post-earthquake rescue simulation, the framework achieves a 95.4% task completion rate and 95% semantic transmission efficiency, significantly outperforming baseline methods in semantic consistency and energy efficiency. The core contribution is the first end-to-end integration of interpretable semantic communication, multimodal embodied coordination, and dynamic adaptation within an embodied intelligence network, offering a new paradigm for high-reliability emergency collaboration.
📝 Abstract
In the 6G era, semantic collaboration among multiple embodied intelligent devices (MEIDs) is crucial for executing complex tasks. However, existing systems face challenges in multimodal information fusion, adaptive communication, and decision interpretability. To address these limitations, we propose a Collaborative Conversational Embodied Intelligence Network (CC-EIN) that integrates multimodal feature fusion, adaptive semantic communication, task coordination, and interpretability. PerceptiNet performs cross-modal fusion of image and radar data to generate unified semantic representations. An adaptive semantic communication strategy dynamically adjusts coding schemes and transmission power according to task urgency and channel quality. A semantic-driven collaboration mechanism further supports task decomposition and conflict-free coordination among heterogeneous devices. Finally, the InDec module enhances decision transparency through Grad-CAM visualization. Simulation results in post-earthquake rescue scenarios demonstrate that CC-EIN achieves a 95.4% task completion rate and 95% transmission efficiency while maintaining strong semantic consistency and energy efficiency.
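To make the adaptive communication idea concrete, here is a minimal sketch of a rule that maps task urgency and channel quality to a coding scheme and transmission power. It is not the CC-EIN implementation: the names `TxConfig` and `select_tx_config`, the SNR thresholds, the code rates, the modulation choices, and the power limits are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TxConfig:
    """Hypothetical transmission settings (not taken from the paper)."""
    code_rate: float      # fraction of transmitted bits that carry payload
    modulation: str       # modulation scheme label
    tx_power_dbm: float   # transmission power in dBm


def select_tx_config(urgency: float, snr_db: float) -> TxConfig:
    """Pick coding, modulation, and power from urgency in [0, 1] and SNR in dB.

    All thresholds and limits below are illustrative assumptions.
    """
    if not 0.0 <= urgency <= 1.0:
        raise ValueError("urgency must be in [0, 1]")

    # Poorer channel -> lower code rate (more redundancy) and a more
    # robust, lower-order modulation.
    if snr_db < 5.0:
        code_rate, modulation = 0.25, "BPSK"
    elif snr_db < 15.0:
        code_rate, modulation = 0.5, "QPSK"
    else:
        code_rate, modulation = 0.75, "16QAM"

    # More urgent task -> more power, interpolated linearly between
    # assumed minimum and maximum power levels.
    p_min_dbm, p_max_dbm = 10.0, 23.0
    tx_power_dbm = p_min_dbm + urgency * (p_max_dbm - p_min_dbm)

    return TxConfig(code_rate, modulation, tx_power_dbm)


if __name__ == "__main__":
    # An urgent rescue message over a poor channel.
    print(select_tx_config(urgency=1.0, snr_db=2.0))
    # A routine status update over a good channel.
    print(select_tx_config(urgency=0.0, snr_db=20.0))
```

A fixed threshold table keeps the sketch easy to read; a learned or optimization-based policy could replace it, and the abstract does not say which approach CC-EIN uses.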