HiddenObject: Modality-Agnostic Fusion for Multimodal Hidden Object Detection

πŸ“… 2025-08-28
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Detecting concealed or partially occluded objects remains challenging under adverse conditions such as occlusion, camouflage, and illumination variation. To address this, we propose a Mamba-based multimodal detection framework that jointly processes RGB, thermal infrared, and depth modalities. Our method introduces a modality-agnostic cross-modal feature extraction and fusion mechanism, leveraging state space models to capture long-range dependencies and enable a unified, adaptive representation of complementary multimodal cues. Unlike conventional single-modality or shallow-fusion approaches, our architecture is fully end-to-end trainable. Extensive experiments on multiple public benchmarks demonstrate state-of-the-art or competitive performance, with significant improvements in detection robustness and generalization under complex visual conditions.

πŸ“ Abstract
Detecting hidden or partially concealed objects remains a fundamental challenge in multimodal environments, where factors like occlusion, camouflage, and lighting variations significantly hinder performance. Traditional RGB-based detection methods often fail under such adverse conditions, motivating the need for more robust, modality-agnostic approaches. In this work, we present HiddenObject, a fusion framework that integrates RGB, thermal, and depth data using a Mamba-based fusion mechanism. Our method captures complementary signals across modalities, enabling enhanced detection of obscured or camouflaged targets. Specifically, the proposed approach identifies modality-specific features and fuses them in a unified representation that generalizes well across challenging scenarios. We validate HiddenObject across multiple benchmark datasets, demonstrating state-of-the-art or competitive performance compared to existing methods. These results highlight the efficacy of our fusion design and expose key limitations in current unimodal and naΓ―ve fusion strategies. More broadly, our findings suggest that Mamba-based fusion architectures can significantly advance the field of multimodal object detection, especially under visually degraded or complex conditions.
Problem

Research questions and friction points this paper is trying to address.

Detecting hidden or partially concealed objects in multimodal environments
Overcoming performance hindrances from occlusion, camouflage, and lighting variations
Addressing limitations of traditional RGB-based detection methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mamba-based fusion mechanism integrates RGB, thermal, depth
Modality-agnostic framework captures complementary signals across modalities
Unified representation generalizes across challenging detection scenarios
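The fusion idea in the bullets above can be illustrated with a toy sketch: project each modality's features into a shared token space, concatenate the tokens into one sequence, and scan them with a simple diagonal linear state-space recurrence before pooling. All names (`simple_ssm_scan`, `fuse_modalities`) and dimensions here are hypothetical; the paper's actual Mamba blocks use selective, input-dependent state-space parameters and are trained end-to-end, which this NumPy sketch does not attempt to reproduce.

```python
import numpy as np

def simple_ssm_scan(x, a, B, C):
    """Toy diagonal linear SSM: h_t = a * h_{t-1} + B x_t, y_t = C h_t.

    x: (T, d_in) token sequence; a: (d_state,) diagonal decay;
    B: (d_state, d_in); C: (d_out, d_state).
    """
    h = np.zeros(B.shape[0])
    ys = []
    for t in range(x.shape[0]):
        h = a * h + B @ x[t]   # recurrent state carries long-range context
        ys.append(C @ h)
    return np.stack(ys)        # (T, d_out)

def fuse_modalities(feats, proj, a, B, C):
    """Modality-agnostic fusion sketch.

    feats: {modality: (T_m, d_m)} per-modality feature sequences;
    proj:  {modality: (d, d_m)} projections into a shared space.
    Returns a single pooled fused representation.
    """
    # Project every modality into the shared dimension, then concatenate
    # along the sequence axis so one scan sees all modalities jointly.
    tokens = np.concatenate([(proj[m] @ f.T).T for m, f in feats.items()], axis=0)
    y = simple_ssm_scan(tokens, a, B, C)
    return y.mean(axis=0)      # mean-pool into one fused vector
```

Because the scan is sequential, every output token conditions on all earlier tokens regardless of which modality they came from, which is the intuition behind using state-space models for cross-modal context.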
πŸ”Ž Similar Papers
No similar papers found.
Harris Song
Department of Computer Science at the University of California, Los Angeles
Tuan-Anh Vu
Department of Mechanical & Aerospace Engineering at the University of California, Los Angeles
Sanjith Menon
Department of Mechanical & Aerospace Engineering at the University of California, Los Angeles
Sriram Narasimhan
Department of Mechanical & Aerospace Engineering at the University of California, Los Angeles
M. Khalid Jawed
UCLA (Structures-Computer Interaction Lab)
Solid and structural mechanics · robotics · physics-assisted machine learning