Fraesormer: Learning Adaptive Sparse Transformer for Efficient Food Recognition

📅 2025-03-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the quadratic computational complexity, redundant token interactions, and static single-scale representations inherent in Transformers for lightweight food recognition, this paper proposes an adaptive sparse architecture. Methodologically, it introduces (1) Adaptive Top-k Sparse Partial Attention (ATK-SPA), which employs a Gated Dynamic Top-K Operator (GDTKO) to perform computation-aware token selection; and (2) a Hierarchical Scale-Sensitive Feature Gating Network (HSSFGN), integrating partial-channel mechanisms with gated multi-scale feature aggregation to capture the unstructured and multi-scale characteristics of food images. Evaluated on multiple food recognition benchmarks, the proposed method achieves significant improvements over state-of-the-art approaches—delivering higher accuracy while requiring fewer parameters and lower computational overhead—thus striking an effective balance between precision and inference efficiency.
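The core idea behind ATK-SPA can be illustrated with a minimal sketch: for each query, keep only the top-k largest query-key scores and mask the rest before the softmax, so weak matches do not dilute feature aggregation. This is an illustrative assumption, not the paper's implementation; a fixed `top_k` stands in for the learnable Gated Dynamic Top-K Operator (GDTKO), and the partial-channel mechanism is omitted.

```python
import numpy as np

def topk_sparse_attention(q, k, v, top_k):
    """Illustrative top-k sparse attention (the idea behind ATK-SPA).

    Each query keeps only its top_k largest query-key scores; all other
    scores are masked to -inf before the softmax, so low query-key
    matches contribute zero weight. The fixed top_k here is a stand-in
    for the paper's learnable Gated Dynamic Top-K Operator (GDTKO).
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                    # (n_q, n_k) raw scores
    # per-query threshold: the k-th largest score in each row
    thresh = np.sort(scores, axis=-1)[:, -top_k][:, None]
    masked = np.where(scores >= thresh, scores, -np.inf)
    # softmax over the surviving entries only (masked entries become 0)
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))   # 4 query tokens, dim 8
k = rng.normal(size=(6, 8))   # 6 key tokens
v = rng.normal(size=(6, 8))
out, w = topk_sparse_attention(q, k, v, top_k=2)
# each query row ends up attending to exactly top_k keys
```

In the paper the retained-token count is predicted by a learned gate rather than fixed, which is what makes the sparsity "computation-aware".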

📝 Abstract
In recent years, Transformers have made significant progress in food recognition. However, most existing approaches still face two critical challenges in lightweight food recognition: (1) quadratic complexity and redundant feature representation from interactions with irrelevant tokens; (2) static feature recognition and single-scale representation, which overlook the unstructured, non-fixed nature of food images and the need for multi-scale features. To address these, we propose an adaptive and efficient sparse Transformer architecture (Fraesormer) with two core designs: Adaptive Top-k Sparse Partial Attention (ATK-SPA) and a Hierarchical Scale-Sensitive Feature Gating Network (HSSFGN). ATK-SPA uses a learnable Gated Dynamic Top-K Operator (GDTKO) to retain critical attention scores, filtering out low query-key matches that hinder feature aggregation. It also introduces a partial channel mechanism to reduce redundancy and promote expert information flow, enabling local-global collaborative modeling. HSSFGN employs a gating mechanism to achieve multi-scale feature representation, enhancing contextual semantic information. Extensive experiments show that Fraesormer outperforms state-of-the-art methods. Code is available at https://zs1314.github.io/Fraesormer.
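The gated multi-scale aggregation in HSSFGN can be sketched as follows. This is a loose illustration under stated assumptions, not the paper's architecture: each branch smooths the token sequence with a moving average of a different window size (standing in for multi-scale convolutions), and a learned sigmoid gate decides how much each scale contributes per position. The names `gated_multiscale` and `gate_w` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_multiscale(x, gate_w, scales=(1, 3, 5)):
    """Illustrative gated multi-scale feature aggregation (in the spirit
    of HSSFGN; shapes and ops are assumptions, not the paper's design).

    x:       (n, d) feature sequence
    gate_w:  list of (d, 1) gate weights, one per scale
    Each branch pools x over a different window size, then a per-position
    sigmoid gate weights that scale's contribution before summing.
    """
    n, _ = x.shape
    out = np.zeros_like(x)
    for w, s in zip(gate_w, scales):
        pad = s // 2
        xp = np.pad(x, ((pad, pad), (0, 0)), mode="edge")
        # moving-average pooling with window s (stand-in for a conv branch)
        branch = np.stack([xp[i:i + s].mean(axis=0) for i in range(n)])
        gate = sigmoid(branch @ w)        # (n, 1) gate in (0, 1)
        out += gate * branch
    return out

rng = np.random.default_rng(1)
x = rng.normal(size=(10, 4))                        # 10 tokens, dim 4
gate_w = [rng.normal(size=(4, 1)) for _ in range(3)]
y = gated_multiscale(x, gate_w)
```

The gating lets the network suppress scales that are uninformative at a given position, which is the property the abstract attributes to HSSFGN's multi-scale representation.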
Problem

Research questions and friction points this paper is trying to address.

Reduces quadratic complexity and redundant feature representation in food recognition.
Addresses static feature recognition and single-scale representation limitations.
Enables adaptive, efficient sparse Transformer for multi-scale food image analysis.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive Top-k Sparse Partial Attention
Hierarchical Scale-Sensitive Feature Gating
Local-global collaborative modeling