🤖 AI Summary
In texture recognition, the spatial randomness of texture primitives hinders effective non-local contextual modeling. To address this, we propose a graph-enhanced multi-scale framework that jointly captures local details and global structure. First, a fully connected graph is constructed to explicitly model global pixel- or region-level dependencies. Second, a bipartite graph architecture is designed to encode cross-scale feature interactions. Third, an unordered multi-scale patch encoding module—based on a learnable codebook—is introduced to achieve scale-invariant texture representation. Our method seamlessly integrates graph neural networks (via the fully connected and bipartite graphs), CNN-based feature extraction, and multi-scale aggregation. Evaluated on five standard texture benchmarks, it consistently outperforms state-of-the-art approaches, delivering significant improvements in classification accuracy and robustness against geometric deformations and noise. This work establishes a novel paradigm for holistic texture understanding.
📝 Abstract
Texture recognition is a fundamental problem in computer vision and pattern recognition. Recent progress leverages feature aggregation into discriminative descriptions based on convolutional neural networks (CNNs). However, modeling non-local context relations through visual primitives remains challenging due to the variability and randomness of texture primitives in spatial distributions. In this paper, we propose a graph-enhanced texture encoding network (GraphTEN) designed to capture both local and global features of texture primitives. GraphTEN models global associations through fully connected graphs and captures cross-scale dependencies of texture primitives via bipartite graphs. Additionally, we introduce a patch encoding module that utilizes a codebook to achieve an orderless representation of texture by encoding multi-scale patch features into a unified feature space. The proposed GraphTEN achieves superior performance compared to state-of-the-art methods across five publicly available datasets.