Continual Multimodal Egocentric Activity Recognition via Modality-Aware Novel Detection

📅 2026-03-17
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This work addresses open-world multimodal first-person activity recognition, where existing methods struggle to detect novel activities and suffer from catastrophic forgetting due to over-reliance on the RGB modality. To overcome these limitations, the authors propose the MAND framework. At inference, Modality-aware Adaptive Scoring (MoAS) improves novel activity detection by fusing energy scores across modalities. During training, Modality-wise Representation Stabilization Training (MoRST) combines auxiliary heads with modality-wise logit distillation to preserve stable representations for each modality. Experiments on a public benchmark show that MAND achieves up to a 10% improvement in AUC for novel activity detection and a 2.8% gain in accuracy on known classes, significantly outperforming current continual learning approaches.
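The paper releases no code, but the summary's description of MoAS (per-modality energy scores fused into an adaptive novelty score) can be sketched roughly. Everything below is an illustrative assumption, not the authors' implementation: the function names, the temperature parameter, and the softmax-over-negative-energy weighting are all guesses at one plausible realization.

```python
import numpy as np

def energy_score(logits, T=1.0):
    # Standard energy score: -T * logsumexp(logits / T).
    # Lower energy suggests a more confident, known-class prediction.
    z = logits / T
    return -T * (np.log(np.sum(np.exp(z - z.max()))) + z.max())

def moas_fuse(modality_logits, T=1.0):
    """Illustrative MoAS-style fusion: estimate per-sample modality
    reliability from energy scores, then adaptively combine the
    modality logits before computing the final novelty score."""
    energies = np.array([energy_score(l, T) for l in modality_logits])
    # Softmax over negative energy: more confident modalities get more weight,
    # so an informative IMU stream is not drowned out by RGB.
    w = np.exp(-energies - (-energies).max())
    w = w / w.sum()
    fused = sum(wi * l for wi, l in zip(w, modality_logits))
    return fused, energy_score(fused, T)
```

A sample would then be flagged as a novel activity when the fused energy exceeds a threshold calibrated on known-class data.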

Technology Category

Application Category

📝 Abstract
Multimodal egocentric activity recognition integrates visual and inertial cues for robust first-person behavior understanding. However, deploying such systems in open-world environments requires detecting novel activities while continuously learning from non-stationary streams. Existing methods rely on the main logits for novelty scoring, without fully exploiting the complementary evidence available from individual modalities. Because these logits are often dominated by RGB, cues from other modalities, particularly IMU, remain underutilized, and this imbalance worsens over time under catastrophic forgetting. To address this, we propose MAND, a modality-aware framework for multimodal egocentric open-world continual learning. At inference, Modality-aware Adaptive Scoring (MoAS) estimates sample-wise modality reliability from energy scores and adaptively integrates modality logits to better exploit complementary modality cues for novelty detection. During training, Modality-wise Representation Stabilization Training (MoRST) preserves modality-specific discriminability across tasks via auxiliary heads and modality-wise logit distillation. Experiments on a public multimodal egocentric benchmark show that MAND improves novel activity detection AUC by up to 10% and known-class classification accuracy by up to 2.8% over state-of-the-art baselines.
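The abstract's MoRST component pairs auxiliary per-modality heads with modality-wise logit distillation to keep each modality discriminative across tasks. A minimal sketch of one plausible distillation term is shown below; the function names, temperature, and KL form are assumptions for illustration, not the paper's actual loss.

```python
import numpy as np

def _softmax(z, T):
    # Temperature-softened softmax with the usual max-shift for stability.
    z = z / T
    e = np.exp(z - z.max())
    return e / e.sum()

def modality_distill_loss(old_logits, new_logits, T=2.0):
    """Illustrative modality-wise logit distillation: for each modality's
    auxiliary head, penalize KL(old || new) between the frozen previous-task
    logits and the current logits, summed over modalities. The T*T factor
    is the standard gradient-scale correction from knowledge distillation."""
    loss = 0.0
    for lo, ln in zip(old_logits, new_logits):
        p = _softmax(lo, T)  # teacher: logits saved before the new task
        q = _softmax(ln, T)  # student: current model's modality logits
        loss += float(np.sum(p * (np.log(p) - np.log(q)))) * T * T
    return loss
```

During continual training this term would be added to the classification loss on the new task, anchoring each modality's representation so RGB drift does not erase IMU-specific evidence.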
Problem

Research questions and friction points this paper is trying to address.

Continual Learning
Multimodal Egocentric Activity Recognition
Novelty Detection
Open-World Learning
Catastrophic Forgetting
Innovation

Methods, ideas, or system contributions that make the work stand out.

Modality-Aware Scoring
Novelty Detection
Continual Learning
Egocentric Activity Recognition
Multimodal Fusion
🔎 Similar Papers
No similar papers found.
Wonseon Lim
School of Computer Science and Engineering, Chung-Ang University
Hyejeong Im
School of Computer Science and Engineering, Chung-Ang University
Dae-Won Kim
ETRI (Electronics and Telecommunications Research Institute)
Machine Learning, Statistical Data Analysis, Time Series Analysis, Big Data