🤖 AI Summary
Existing multi-label classification methods are largely confined to local or single-scale geometric modeling and fail to capture cross-scale contextual interactions among objects. To address this, we propose a fine-grained, anchor-driven dynamic context modeling framework. It combines random walks with multi-head attention to explicitly model multi-order geometric neighborhood relationships. It also performs hierarchical cross-scale feature aggregation in a Hilbert space and uses a cascaded fusion architecture to jointly capture multi-order and cross-scale dependencies. The method requires no additional annotations and is trainable end to end. Extensive experiments on NUS-WIDE, PASCAL VOC2007, and MS-COCO show consistent gains over state-of-the-art methods, with notable mAP improvements and, in particular, better discrimination of fine-grained semantic labels.
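The idea of combining random walks with multi-head attention to model multi-order neighborhoods can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The following are all assumptions made for illustration:

- the row-normalized transition matrix with added self-loops;
- one attention head per walk order k, with weights w_ij ∝ (P^k)_ij · exp(q_i · k_j / √d);
- identity query/key/value projections;
- the function names `transition_matrix`, `walk_powers`, `walk_attention`, and `multi_order_context`.

```python
import math

def transition_matrix(adj):
    """Row-normalize a 0/1 adjacency matrix into a random-walk transition matrix P.

    Assumption: self-loops are added so every node has degree >= 1.
    """
    P = []
    for i, row in enumerate(adj):
        row = [1 if i == j else a for j, a in enumerate(row)]
        deg = sum(row)
        P.append([a / deg for a in row])
    return P

def matmul(A, B):
    """Plain matrix product of two lists of lists."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def walk_powers(P, max_order):
    """Return [P^1, ..., P^K]: k-step random-walk probabilities (order-k neighborhoods)."""
    powers = [P]
    for _ in range(max_order - 1):
        powers.append(matmul(powers[-1], P))
    return powers

def walk_attention(feats, Pk):
    """One attention head whose dot-product weights are reweighted by P^k.

    Assumption: w_ij is proportional to P^k[i][j] * exp(q_i . k_j / sqrt(d)),
    with identity query/key/value projections.
    """
    d = len(feats[0])
    out = []
    for i, q in enumerate(feats):
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in feats]
        m = max(scores)  # subtract the max for numerical stability
        w = [Pk[i][j] * math.exp(s - m) for j, s in enumerate(scores)]
        z = sum(w)  # > 0 because each row of P^k sums to 1
        w = [x / z for x in w]
        out.append([sum(w[j] * feats[j][c] for j in range(len(feats))) for c in range(d)])
    return out

def multi_order_context(feats, adj, max_order=3):
    """Run one head per walk order k = 1..K and concatenate the head outputs per node."""
    heads = [walk_attention(feats, Pk)
             for Pk in walk_powers(transition_matrix(adj), max_order)]
    return [sum((h[i] for h in heads), []) for i in range(len(feats))]

# Usage: four regions on a chain 0-1-2-3 with 2-D features (toy data).
adj = [[0, 1, 0, 0],
       [1, 0, 1, 0],
       [0, 1, 0, 1],
       [0, 0, 1, 0]]
feats = [[1.0, 0.0], [0.8, 0.2], [0.1, 0.9], [0.0, 1.0]]
ctx = multi_order_context(feats, adj, max_order=3)
print(len(ctx), len(ctx[0]))  # prints: 4 6  (4 nodes, 3 orders x 2 dims)
```

Weighting by P^k rather than hard-masking lets an order-k head see every node reachable in k steps. Nearby nodes still count more, in proportion to their walk probability.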
📝 Abstract
Context modeling is crucial for visual recognition. Integrating both the intrinsic and extrinsic relationships between objects and labels in an image yields highly discriminative image representations. Current approaches, however, focus on basic geometric relationships or localized features and often neglect cross-scale contextual interactions between objects. This paper introduces the Deep Panoptic Context Aggregation Network (PanCAN), a novel approach that hierarchically integrates multi-order geometric contexts through cross-scale feature aggregation in a high-dimensional Hilbert space. Specifically, at each scale PanCAN learns multi-order neighborhood relationships by combining random walks with an attention mechanism. The modules for different scales are cascaded: salient anchors are selected at a finer scale, and their neighborhood features are dynamically fused via attention. Combining multi-order and cross-scale context-aware features in this way enables effective cross-scale modeling and improves understanding of complex scenes. Extensive multi-label classification experiments on the NUS-WIDE, PASCAL VOC2007, and MS-COCO benchmarks show that PanCAN consistently achieves competitive results and outperforms state-of-the-art techniques in both quantitative and qualitative evaluations.
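The cascade step, where salient anchors are selected at a finer scale and their neighborhood features are fused via attention, can likewise be sketched under stated assumptions. The abstract does not say how PanCAN scores anchors or defines their neighborhoods, so the following choices are all hypothetical:

- saliency is taken to be the L2 norm of a fine-scale feature;
- an anchor's neighborhood is the anchor plus its 1-hop neighbors;
- each coarse-scale feature attends over the pooled neighborhood features with scaled dot-product attention, followed by a residual addition;
- the helper names are `select_anchors`, `anchor_neighborhood`, and `cross_scale_fuse`.

The Hilbert-space aggregation is not modeled here.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    z = sum(e)
    return [v / z for v in e]

def select_anchors(fine_feats, k):
    """Pick the k most salient fine-scale nodes.

    Assumption: saliency = L2 norm of the node's feature vector.
    """
    norms = [math.sqrt(sum(v * v for v in f)) for f in fine_feats]
    return sorted(range(len(fine_feats)), key=lambda i: norms[i], reverse=True)[:k]

def anchor_neighborhood(fine_feats, fine_adj, anchors):
    """Collect each anchor's feature plus its 1-hop neighbors' features, deduplicated.

    Assumption: a neighborhood is the anchor and its direct neighbors in fine_adj.
    """
    idx = []
    for a in anchors:
        for n in [a] + [j for j, e in enumerate(fine_adj[a]) if e]:
            if n not in idx:
                idx.append(n)
    return [fine_feats[n] for n in idx]

def cross_scale_fuse(coarse_feats, fine_feats, fine_adj, k=2):
    """Each coarse node attends over the anchor neighborhoods of the finer scale.

    Scaled dot-product attention with identity projections, then a residual add.
    """
    memory = anchor_neighborhood(fine_feats, fine_adj, select_anchors(fine_feats, k))
    d = len(coarse_feats[0])
    fused = []
    for q in coarse_feats:
        w = softmax([sum(a * b for a, b in zip(q, m)) / math.sqrt(d) for m in memory])
        ctx = [sum(w[j] * memory[j][c] for j in range(len(memory))) for c in range(d)]
        fused.append([x + y for x, y in zip(q, ctx)])
    return fused

# Usage: six fine-scale regions on a chain, two coarse-scale regions (toy data).
fine = [[0.1, 0.0], [2.0, 0.5], [0.3, 0.3], [0.2, 0.1], [0.1, 0.2], [0.0, 1.5]]
fine_adj = [[1 if abs(i - j) == 1 else 0 for j in range(6)] for i in range(6)]
coarse = [[1.0, 0.0], [0.0, 1.0]]
print(select_anchors(fine, 2))  # prints: [1, 5]
out = cross_scale_fuse(coarse, fine, fine_adj, k=2)
```

Restricting the attention memory to anchor neighborhoods keeps the cross-scale step cheap: each coarse node attends to a handful of fine-scale features rather than all of them. In this toy example, region 3 lies outside both neighborhoods and is ignored.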