EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception

📅 2025-06-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the high computational cost and deployment challenges of multimodal egocentric perception models in resource-constrained settings, this paper proposes an adaptive multimodal distillation and policy learning framework. The framework integrates cross-modal knowledge transfer with task-driven dynamic inference, featuring a lightweight policy module designed for heterogeneous action spaces, adaptive multimodal distillation, cross-modal attention, and reinforcement learning–guided inference path selection. Evaluated on the EPIC-Kitchens, EasyCom, and Aria Everyday Activities datasets, the method achieves state-of-the-art efficiency: an 89.09% reduction in GMACs, 82.02% fewer parameters, and 9.6× lower energy consumption, while maintaining or surpassing performance in action recognition, active speaker localization, and behavior prediction.
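The paper does not spell out its distillation objective in this summary, but cross-modal knowledge transfer of this kind is commonly implemented as a standard knowledge-distillation loss: a temperature-softened KL term that pulls the lightweight student toward the multimodal teacher, blended with the usual cross-entropy on the hard label. A minimal pure-Python sketch, with `temperature` and `alpha` as assumed hyperparameters (not taken from the paper):

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Generic KD loss: alpha-weighted blend of KL(teacher || student)
    on softened distributions and cross-entropy on the hard label.
    A sketch of the standard recipe, not the paper's exact objective."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s))
    ce = -math.log(softmax(student_logits)[label])
    # T^2 rescaling keeps the soft-target gradients comparable in magnitude.
    return alpha * (temperature ** 2) * kl + (1 - alpha) * ce
```

When the student already matches the teacher, the KL term vanishes and only the supervised cross-entropy remains; a mismatched student pays an extra penalty proportional to how far its softened distribution drifts from the teacher's.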

📝 Abstract
Modern perception models, particularly those designed for multisensory egocentric tasks, have achieved remarkable performance but often come with substantial computational costs. These high demands pose challenges for real-world deployment, especially in resource-constrained environments. In this paper, we introduce EgoAdapt, a framework that adaptively performs cross-modal distillation and policy learning to enable efficient inference across different egocentric perception tasks, including egocentric action recognition, active speaker localization, and behavior anticipation. Our proposed policy module is adaptable to task-specific action spaces, making it broadly applicable. Experimental results on three challenging egocentric datasets (EPIC-Kitchens, EasyCom, and Aria Everyday Activities) demonstrate that our method significantly enhances efficiency, reducing GMACs by up to 89.09%, parameters by up to 82.02%, and energy consumption by up to 9.6×, while performing on par with, and in many cases outperforming, corresponding state-of-the-art models.
Problem

Research questions and friction points this paper is trying to address.

High computational cost of multimodal egocentric perception models
Adapting cross-modal distillation to efficient multisensory tasks
Improving inference efficiency without sacrificing task performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive cross-modal distillation for efficiency
Task-specific policy learning module
Significant reduction in computational resources
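The reinforcement learning–guided inference path selection can be pictured as a gating policy that scores candidate paths (e.g. modality subsets) and executes only the most promising one that fits a compute budget. The sketch below is a deliberately simplified greedy stand-in for the learned policy; the path names, scores, and cost units are hypothetical, and in the actual framework the scores would come from a lightweight policy network trained with RL:

```python
def select_path(scores, costs, budget):
    """Greedy sketch of a gating policy: among candidate inference
    paths, return the highest-scoring one whose compute cost fits
    the budget. Falls back to the cheapest path when nothing fits.

    scores, costs: dicts mapping path name -> float.
    """
    affordable = [p for p in scores if costs[p] <= budget]
    if not affordable:
        # No path fits the budget; degrade gracefully to the cheapest.
        return min(costs, key=costs.get)
    return max(affordable, key=lambda p: scores[p])
```

For example, with hypothetical per-sample paths `{"audio", "rgb", "audio+rgb"}`, a tight budget would route easy samples through the cheap audio-only path while reserving the full multimodal path for samples where the extra cost is affordable, which is how per-sample savings in GMACs and energy arise.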