HiddenObject: Modality-Agnostic Fusion for Multimodal Hidden Object Detection

πŸ“… 2025-08-28
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Detecting concealed or partially occluded objects remains challenging under adverse conditions such as occlusion, camouflage, and illumination variation. To address this, we propose a Mamba-based multimodal detection framework that jointly processes RGB, thermal infrared, and depth modalities. Our method introduces a modality-agnostic cross-modal feature extraction and fusion mechanism, leveraging state space models to capture long-range dependencies and enable a unified, adaptive representation of complementary multimodal cues. Unlike conventional single-modality or shallow-fusion approaches, our architecture is fully end-to-end trainable. Extensive experiments on multiple public benchmarks demonstrate state-of-the-art or competitive performance, with significant improvements in detection robustness and generalization under complex visual conditions.

πŸ“ Abstract
Detecting hidden or partially concealed objects remains a fundamental challenge in multimodal environments, where factors like occlusion, camouflage, and lighting variations significantly hinder performance. Traditional RGB-based detection methods often fail under such adverse conditions, motivating the need for more robust, modality-agnostic approaches. In this work, we present HiddenObject, a fusion framework that integrates RGB, thermal, and depth data using a Mamba-based fusion mechanism. Our method captures complementary signals across modalities, enabling enhanced detection of obscured or camouflaged targets. Specifically, the proposed approach identifies modality-specific features and fuses them in a unified representation that generalizes well across challenging scenarios. We validate HiddenObject across multiple benchmark datasets, demonstrating state-of-the-art or competitive performance compared to existing methods. These results highlight the efficacy of our fusion design and expose key limitations in current unimodal and naΓ―ve fusion strategies. More broadly, our findings suggest that Mamba-based fusion architectures can significantly advance the field of multimodal object detection, especially under visually degraded or complex conditions.
Problem

Research questions and friction points this paper is trying to address.

Detecting hidden or partially concealed objects in multimodal environments
Overcoming performance hindrances from occlusion, camouflage, and lighting variations
Addressing limitations of traditional RGB-based detection methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mamba-based fusion mechanism integrates RGB, thermal, depth
Modality-agnostic framework captures complementary signals across modalities
Unified representation generalizes across challenging detection scenarios
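The fusion idea in the bullets above can be illustrated with a toy sketch: project each modality's features into a shared token space, concatenate the tokens into one sequence, and scan them with a simple diagonal linear state-space recurrence before pooling. All names (`simple_ssm_scan`, `fuse_modalities`) and dimensions here are hypothetical; the paper's actual Mamba blocks use selective, input-dependent state-space parameters and are trained end-to-end, which this NumPy sketch does not attempt to reproduce.

```python
import numpy as np

def simple_ssm_scan(x, a, B, C):
    """Toy diagonal linear SSM: h_t = a * h_{t-1} + B x_t, y_t = C h_t.

    x: (T, d_in) token sequence; a: (d_state,) diagonal decay;
    B: (d_state, d_in); C: (d_out, d_state).
    """
    h = np.zeros(B.shape[0])
    ys = []
    for t in range(x.shape[0]):
        h = a * h + B @ x[t]   # recurrent state carries long-range context
        ys.append(C @ h)
    return np.stack(ys)        # (T, d_out)

def fuse_modalities(feats, proj, a, B, C):
    """Modality-agnostic fusion sketch.

    feats: {modality: (T_m, d_m)} per-modality feature sequences;
    proj:  {modality: (d, d_m)} projections into a shared space.
    Returns a single pooled fused representation.
    """
    # Project every modality into the shared dimension, then concatenate
    # along the sequence axis so one scan sees all modalities jointly.
    tokens = np.concatenate([(proj[m] @ f.T).T for m, f in feats.items()], axis=0)
    y = simple_ssm_scan(tokens, a, B, C)
    return y.mean(axis=0)      # mean-pool into one fused vector
```

Because the scan is sequential, every output token conditions on all earlier tokens regardless of which modality they came from, which is the intuition behind using state-space models for cross-modal context.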
πŸ”Ž Similar Papers
No similar papers found.
Harris Song
Department of Computer Science at the University of California, Los Angeles
Tuan-Anh Vu
Department of Mechanical & Aerospace Engineering at the University of California, Los Angeles
Sanjith Menon
Department of Mechanical & Aerospace Engineering at the University of California, Los Angeles
Sriram Narasimhan
Department of Mechanical & Aerospace Engineering at the University of California, Los Angeles
M. Khalid Jawed
UCLA (Structures-Computer Interaction Lab)
Solid and structural mechanics · robotics · physics-assisted machine learning