AI Summary
This work addresses a limitation of existing superpixel methods: they produce irregular regions that are misaligned with regular operators such as convolutions, thereby hindering parallel computation and end-to-end deep learning. To overcome this, the study introduces granular-ball computing into superpixel generation for the first time, proposing a structured superpixel representation based on multi-scale square blocks. The method computes a purity score for each block from pixel-intensity similarity and adaptively selects high-quality square blocks to cover the image. This formulation inherently supports efficient parallel processing and integrates seamlessly into graph neural networks (GNNs) or Vision Transformers (ViTs) for end-to-end training. Experiments across multiple downstream vision tasks demonstrate that the proposed square superpixels significantly enhance performance, validating their advantages in both structured representation and computational efficiency.
Abstract
Superpixels provide a compact region-based representation that preserves object boundaries and local structures, and have therefore been widely used in a variety of vision tasks to reduce computational cost. However, most existing superpixel algorithms produce irregularly shaped regions, which are not well aligned with regular operators such as convolutions. Consequently, superpixels are often treated as an offline preprocessing step, limiting parallel implementation and hindering end-to-end optimization within deep learning pipelines. Motivated by the adaptive representation and coverage property of granular-ball computing, we develop a square superpixel generation approach. Specifically, we approximate superpixels using multi-scale square blocks to avoid the computational and implementation difficulties induced by irregular shapes, enabling efficient parallel processing and learnable feature extraction. For each block, a purity score is computed based on pixel-intensity similarity, and high-quality blocks are selected accordingly. The resulting square superpixels can be readily integrated as graph nodes in graph neural networks (GNNs) or as tokens in Vision Transformers (ViTs), facilitating multi-scale information aggregation and structured visual representation. Experimental results on downstream tasks demonstrate consistent performance improvements, validating the effectiveness of the proposed method.
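The abstract does not give the exact purity definition or selection rule, but the described pipeline (multi-scale square blocks, intensity-based purity, adaptive selection) can be illustrated with a minimal quadtree-style sketch. Here, purity is *assumed* to be the fraction of pixels within a tolerance of the block's mean intensity, and a block is split into four quadrants when its purity falls below a threshold; the function names and thresholds are hypothetical, not from the paper.

```python
import numpy as np

def block_purity(block, tol=0.1):
    """Assumed purity: fraction of pixels within tol of the block's mean intensity."""
    mean = block.mean()
    return float(np.mean(np.abs(block - mean) <= tol))

def square_superpixels(img, min_size=2, purity_thresh=0.9, tol=0.1):
    """Cover a square image (power-of-two side assumed for this sketch) with
    multi-scale square blocks: keep a block if it is pure enough, otherwise
    split it into four quadrants, granular-ball style."""
    blocks = []  # each entry: (top, left, size)

    def recurse(y, x, size):
        block = img[y:y + size, x:x + size]
        if size <= min_size or block_purity(block, tol) >= purity_thresh:
            blocks.append((y, x, size))
        else:
            half = size // 2
            for dy in (0, half):
                for dx in (0, half):
                    recurse(y + dy, x + dx, half)

    recurse(0, 0, img.shape[0])
    return blocks
```

Each returned block is an axis-aligned square, so the set can be batched as fixed-shape tensors (e.g. resized to one token size) and fed to a GNN as nodes or a ViT as tokens, which is the integration path the abstract describes.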