Boltzmann Attention Sampling for Image Analysis with Small Objects

📅 2025-03-04
🤖 AI Summary
Small objects—such as pulmonary nodules and tumor lesions—often occupy less than 0.1% of an image, leading to inefficiency and poor localization in standard Transformers due to redundant global attention. Existing sparse attention methods rely on fixed hierarchical structures, which lack adaptability to the highly variable positions and uncertain scales of small targets. To address this, we propose a dynamic sparse attention mechanism: (i) introducing uncertainty modeling based on an annealed Boltzmann distribution for temperature-controlled, progressive attention focusing; and (ii) eliminating rigid hierarchical constraints to enable sub-pixel-level target localization. Our method is modularly integrated into Transformer architectures. Evaluated on multiple small-object segmentation benchmarks, it consistently outperforms state-of-the-art methods, achieving substantial improvements in segmentation accuracy while reducing attention computation by an order of magnitude.

📝 Abstract
Detecting and segmenting small objects, such as lung nodules and tumor lesions, remains a critical challenge in image analysis. These objects often occupy less than 0.1% of an image, making traditional transformer architectures inefficient and prone to performance degradation due to redundant attention computations on irrelevant regions. Existing sparse attention mechanisms rely on rigid hierarchical structures, which are poorly suited for detecting small, variable, and uncertain object locations. In this paper, we propose BoltzFormer, a novel transformer-based architecture designed to address these challenges through dynamic sparse attention. BoltzFormer identifies and focuses attention on relevant areas by modeling uncertainty using a Boltzmann distribution with an annealing schedule. Initially, a higher temperature allows broader area sampling in early layers, when object location uncertainty is greatest. As the temperature decreases in later layers, attention becomes more focused, enhancing efficiency and accuracy. BoltzFormer seamlessly integrates into existing transformer architectures via a modular Boltzmann attention sampling mechanism. Comprehensive evaluations on benchmark datasets demonstrate that BoltzFormer significantly improves segmentation performance for small objects while reducing attention computation by an order of magnitude compared to previous state-of-the-art methods.
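The temperature-annealed Boltzmann sampling described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the function names, the geometric annealing schedule, and the fixed per-layer sampling budget `k` are assumptions made for the example.

```python
import numpy as np

def boltzmann_sample(scores, temperature, k, rng):
    """Sample k patch indices from a Boltzmann distribution over relevance scores.

    Higher temperature -> broader, more exploratory sampling;
    lower temperature -> probability mass concentrates on high-scoring patches.
    """
    logits = scores / temperature
    logits -= logits.max()              # subtract max for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return rng.choice(len(scores), size=k, replace=False, p=probs)

def annealed_temperatures(t_start, t_end, num_layers):
    """Geometric annealing schedule from t_start down to t_end across layers."""
    return t_start * (t_end / t_start) ** np.linspace(0.0, 1.0, num_layers)

# Toy usage: 256 patch relevance scores, attention restricted per layer
# to the sampled subset `idx` (the actual attention computation is omitted).
rng = np.random.default_rng(0)
scores = rng.normal(size=256)
for layer, temp in enumerate(annealed_temperatures(2.0, 0.1, 6)):
    idx = boltzmann_sample(scores, temp, k=32, rng=rng)
```

Early layers (high temperature) sample broadly, matching the abstract's point that object-location uncertainty is greatest there; later layers (low temperature) focus on the most relevant patches, which is how attention computation drops by roughly an order of magnitude relative to dense global attention.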
Problem

Research questions and friction points this paper is trying to address.

Detecting and segmenting small objects that occupy less than 0.1% of an image
Inefficiency of global attention in standard transformers, which wastes computation on irrelevant regions
Rigid hierarchical sparse attention that adapts poorly to variable, uncertain small-object locations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic sparse attention for small-object segmentation
Annealed Boltzmann distribution models location uncertainty in attention sampling
Modular integration into existing transformer architectures